I want to annotate a row field that contains only the allele balance of het calls, and then annotate further row fields such as “minHetAB”, “maxHetAB” and “medianHetAB”.
Any updates, @tpoterba?
Hi! Not sure I fully understand, but perhaps the following could be helpful?
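The following annotates each entry with an allele balance, HetAB, computed from its AD field and allowing for multiallelic sites. Genotypes that are not hets get a missing (null) AB; hets get the AD of the less covered of their two called alleles relative to the summed AD of both. (One may still need to worry about NaNs from 0/0, i.e. hets whose called alleles both have zero depth.)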
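
    import hail as hl

    # AD of the alleles actually present in each call. GT_AD is a local
    # expression, not a field, so it is used directly rather than as mt.GT_AD.
    GT_AD = (
        hl.enumerate(mt.GT.one_hot_alleles(hl.len(mt.alleles)))
        .filter(lambda i_n: i_n[1] > 0)    # keep (allele index, copies) with copies > 0
        .map(lambda i_n: mt.AD[i_n[0]])    # look up the AD of each called allele
    )

    # Hets: AD of the less covered called allele / summed AD; all others: missing.
    mt = mt.annotate_entries(
        HetAB=hl.case()
        .when(mt.GT.is_het(), hl.min(GT_AD) / hl.sum(GT_AD))
        .or_missing()
    )
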
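To sanity-check the logic without running Hail, here is a plain-Python mirror of the HetAB expression for a single diploid genotype. It is only an illustration, not Hail code: the genotype is assumed to be a list of allele indices (e.g. `[0, 1]` for 0/1), `het_ab` is a hypothetical helper name, and it returns `None` rather than NaN in the 0/0 case.

```python
from typing import Optional, Sequence


def het_ab(gt: Sequence[int], ad: Sequence[int]) -> Optional[float]:
    """Plain-Python mirror of HetAB for one diploid call (illustration only).

    gt: allele indices of the call, e.g. [0, 1] for 0/1 (assumed representation)
    ad: allelic depths, one per allele at the site, reference first
    """
    # One-hot-style counts: copies of each allele carried by the call.
    counts = [gt.count(i) for i in range(len(ad))]
    present = [i for i, c in enumerate(counts) if c > 0]
    if len(present) != 2:
        # Only one distinct allele (hom-ref or hom-alt): not a het.
        return None
    depths = [ad[i] for i in present]
    total = sum(depths)
    if total == 0:
        # The 0/0 case; this sketch returns None instead of NaN.
        return None
    # Depth of the less covered called allele relative to both called alleles.
    return min(depths) / total
```

For a 1/2 call at a triallelic site with AD `[3, 6, 4]`, only the depths 6 and 4 of the called alleles enter, giving 4 / 10 = 0.4; the reference depth 3 is ignored.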
Given that entry field, one can aggregate it over columns (samples). As far as I understand, aggregators such as `hl.agg.min` and `hl.agg.max` skip missing values, so the null ABs of non-het genotypes are simply ignored:

    mt = mt.annotate_rows(
        minHetAB=hl.agg.min(mt.HetAB),
        maxHetAB=hl.agg.max(mt.HetAB),
    )

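The question also asked for medianHetAB; in Hail, `hl.agg.approx_median` gives an approximate median. The sketch below is a plain-Python illustration (not Hail code) of what the three row summaries compute for one variant: missing ABs are dropped first, and a row with no het calls gets missing summaries. `summarize_het_ab` is a hypothetical helper name.

```python
import statistics
from typing import Dict, List, Optional


def summarize_het_ab(abs_per_sample: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Per-row summary of HetAB values across samples (illustration only)."""
    # Drop missing ABs (non-het genotypes), mirroring how the aggregators skip them.
    values = [x for x in abs_per_sample if x is not None]
    if not values:
        # No het calls in this row: every summary is missing.
        return {"minHetAB": None, "maxHetAB": None, "medianHetAB": None}
    return {
        "minHetAB": min(values),
        "maxHetAB": max(values),
        "medianHetAB": statistics.median(values),  # exact median of defined values
    }
```

Note that `statistics.median` is exact, whereas an approximate median aggregator trades some accuracy for not having to collect every value.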
With this definition, one would expect maxHetAB to be near its largest possible value of 0.5 for all rows where it is not missing.
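The 0.5 ceiling follows directly from the definition: for a het whose two called alleles have depths $a$ and $b$ (not both zero), $\min(a, b) \le (a + b)/2$, so

$$
\mathrm{AB} = \frac{\min(a, b)}{a + b} \le \frac{(a + b)/2}{a + b} = \frac{1}{2},
$$

with equality exactly when $a = b$, i.e. when both called alleles are equally covered.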