I would like to annotate the rows of my MatrixTable based on the entries' genotype (GT). For instance, I would like to have the AD_mean for hom_var calls and the AD_mean for het calls.
AD is an entry with an array of size 2, for instance [20,20].
Currently I have the following:
What is the definition of “AD_mean for hom_var”? How should AD arrays of [20,20] and [15,10] be aggregated across two samples to a single mean? Do you want the mean of [20, 20, 15, 10] or the mean of [20, 10] or something else?
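To make the ambiguity concrete, here is a small plain-Python sketch (no Hail involved) of three possible readings for two samples with AD [20,20] and [15,10]. The function names are made up for illustration, and it assumes AD is ordered [ref_depth, alt_depth].

```python
# AD arrays for two samples at one site, assumed to be [ref_depth, alt_depth]
ads = [[20, 20], [15, 10]]


def mean(values):
    """Arithmetic mean as a float."""
    values = list(values)
    return sum(values) / len(values)


def pooled_mean(ad_arrays):
    """Mean over every depth value, ignoring which allele it belongs to."""
    return mean(depth for ad in ad_arrays for depth in ad)


def alt_only_mean(ad_arrays):
    """Mean of the second (alt) element of each AD array only."""
    return mean(ad[1] for ad in ad_arrays)


def per_allele_mean(ad_arrays):
    """One mean per allele index: [mean ref depth, mean alt depth]."""
    return [mean(column) for column in zip(*ad_arrays)]


print(pooled_mean(ads))      # mean of [20, 20, 15, 10] -> 16.25
print(alt_only_mean(ads))    # mean of [20, 10] -> 15.0
print(per_allele_mean(ads))  # [17.5, 15.0]
```

Which of these is wanted changes the shape of the row annotation: the first two give a single number per site, the third gives an array with one entry per allele.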
Isn’t mt.DP == hl.sum(mt.AD) by definition of the VCF spec?
What are you trying to do scientifically? Do you want to know the mean number of reads supporting the reference allele and the mean number of reads supporting the alternate allele, at each site? That would be this: