I’m trying to analyze UK Biobank data with Hail, and I’m getting output from hl.split_multi_hts(mt) that I can’t explain. Looking more carefully at the entries for the variant depicted below, I noticed that the mt.variant_qc.AC values for its two split rows look inconsistent with each other.
For the G:A allele, the allele counts are [939595, 5], and for the G:T allele, they are [817215, 122385] (see below for snapshot of the mt.rows().show() output). Shouldn’t the allele count for G be the same in the two rows, or am I missing something?
Thank you, Dan! I think I got confused because it seems that after splitting, the reference count in each row also absorbs the other alternate allele. In the G:A row, for example, the number of reference (G) alleles is actually the number of G alleles plus the number of the other alternate (T) alleles. This way the row keeps the same total number of alleles (AN) and the correct allele frequency (AF) for the A allele.
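To check this against the numbers in my question, here is a small plain-Python sketch (no Hail involved). The per-allele counts for G, A and T are my own back-calculation from the two split rows, and `split_ac` is just a hypothetical helper name; it assumes every non-missing call is counted in both rows and that all alleles other than the row's alternate are counted as reference.

```python
# Per-allele counts at the original multiallelic site.
# Assumption: inferred from the two split rows, not read from the data directly.
counts = {"G": 817210, "A": 5, "T": 122385}


def split_ac(counts, alt):
    """Return [ref_count, alt_count] for one biallelic row after splitting.

    Every allele that is not `alt` (the true reference and any other
    alternates) is counted as reference.
    """
    an = sum(counts.values())  # total called alleles, identical in every row
    return [an - counts[alt], counts[alt]]


ac_a = split_ac(counts, "A")  # G:A row
ac_t = split_ac(counts, "T")  # G:T row
print(ac_a, ac_t)
```

Under those assumptions, this reproduces both AC arrays from the rows output, and both rows sum to the same AN of 939600.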
The way you suggest to annotate the rows makes sense!