Split_multi_hts AD field

Hi, I’m trying to split my MT, generated by DRAGEN, into a biallelic MT. The problem is that the AD field contains only a single entry for hom-ref samples, while non-hom-ref samples contain N entries, where N is the number of alleles at the site. Example:

Locus       Alleles          Sample1   Sample2     Sample3
chr1:10000  ["A", "G", "C"]  0/0:30    0/1:0,30,0  1/2:0,15,15

So what I would need is the following:

Locus       Alleles          Sample1     Sample2     Sample3
chr1:10000  ["A", "G", "C"]  0/0:30,0,0  0/1:0,30,0  1/2:0,15,15

I’ve tried to create a new AD field in which every entry’s AD list contains exactly N elements:

mt_annot = mt.annotate_entries(AD = hl.if_else(mt.GT.is_hom_ref(), (mt.AD.append(0) for entry in range(mt.GT.n_alt_alleles())), mt.AD))

But I get the following error:

TypeError: 'Int32Expression' object cannot be interpreted as an integer

I’m struggling with the part where I need to add “0” as many times as there are n_alt_alleles. How could I achieve that?

As a general rule of thumb, Python loops and comprehensions almost never mix the way you want with Hail code. The way I would append n_alt_alleles zeroes is

mt.AD.extend(hl.range(mt.GT.n_alt_alleles().map(lambda x: 0)))
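For what it’s worth, the TypeError in the question most likely comes from Python’s built-in range(), which needs a concrete integer up front, while a Hail expression is only evaluated later. The snippet below reproduces that with a made-up stand-in class (it is not Hail’s real Int32Expression, just an object with no integer value), so it runs without Hail installed:

```python
class Int32Expression:
    """Hypothetical stand-in for a lazy Hail expression: it carries no concrete value."""


def try_range(n):
    """Return list(range(n)), or the error message if Python rejects n."""
    try:
        return list(range(n))
    except TypeError as err:
        return str(err)


# A plain int works, because Python knows its value immediately.
print(try_range(3))  # [0, 1, 2]

# The stand-in has no integer value, so the built-in range() refuses it.
message = try_range(Int32Expression())
print(message)  # 'Int32Expression' object cannot be interpreted as an integer
```

That is why the counting has to happen inside Hail, with hl.range, instead of in a Python loop or comprehension.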

@chrisvittal Are you familiar with this issue with DRAGEN generated datasets?

Thanks for the super speedy reply!
I’m not very proficient with Python, so I just tried out a bunch of options and none of them worked.

I now get the following error:

AttributeError: 'Int32Expression' object has no attribute 'map'

Edit:
I think I actually need mt.alleles instead of mt.GT.n_alt_alleles.

Whoops, sorry, parentheses typo! That should be

mt.AD.extend(hl.range(mt.GT.n_alt_alleles()).map(lambda x: 0))

Edit:
You’re right, it should really be

mt.AD.extend(hl.range(mt.alleles.length() - 1).map(lambda x: 0))
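Note that this line is meant for the hom-ref branch of the hl.if_else from the question; applied to every entry it would also lengthen the ADs that are already complete. Below is a rough sketch of one way to put it together. It is a variation that was not tested in this thread: instead of branching on GT, it pads each AD by however many values it is short of the allele count. The names pad_ad and pad_and_split are made up for illustration, and the sketch assumes AD is never longer than alleles and that the entry schema is one hl.split_multi_hts accepts. The plain-Python pad_ad only models what happens to a single entry; the Hail import sits inside the function so the model runs without Hail.

```python
def pad_ad(ad, n_alleles):
    """Plain-Python model of the padding applied to one entry's AD list."""
    return ad + [0] * max(n_alleles - len(ad), 0)


def pad_and_split(mt):
    """Hail version (hypothetical helper): pad AD to one value per allele, then split.

    Assumes `mt` is a MatrixTable with a row field `alleles` and an entry field `AD`.
    """
    import hail as hl  # imported lazily so pad_ad above runs without Hail

    # Number of zeroes each entry is missing; 0 for entries that are already complete.
    n_missing = hl.len(mt.alleles) - hl.len(mt.AD)
    padded = mt.AD.extend(hl.range(n_missing).map(lambda _: 0))
    mt = mt.annotate_entries(AD=padded)
    return hl.split_multi_hts(mt)


# The three samples from the example at the top of the thread (3 alleles at the site).
print(pad_ad([30], 3))         # [30, 0, 0]
print(pad_ad([0, 30, 0], 3))   # [0, 30, 0]
print(pad_ad([0, 15, 15], 3))  # [0, 15, 15]
```

Padding by length avoids depending on GT at all, which may matter if some entries have a missing genotype, but it is worth checking a few sites by hand before trusting it on the full dataset.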

Perfect, this works! Thanks a lot. I can never figure out when to use hl.len(mt.alleles) vs mt.alleles.length().

Either one, they’re the same!

I haven’t seen this before, no.

@DBScan how was this dataset generated?

Hi @chrisvittal, I’ve used DRAGEN iterative gVCF Genotyper 4.2 with the following options:

--gg-discard-ac-zero true 
--gg-remove-nonref true