Annotate rows based on GT

DBScan · December 1, 2022, 10:39am

I would like to annotate rows in my MatrixTable based on the genotype (GT) of each sample. For instance, I would like to have the AD_mean for hom_var and the AD_mean for het.
AD is an entry field holding an array of length 2, for instance [20, 20].
Currently I have the following:

mt = mt.annotate_rows(AD_mean_homvar =  hl.agg.filter(mt.GT.is_hom_var(), hl.agg.mean(mt.AD)))

This doesn’t work because AD is an array, not a number, and hl.agg.mean expects a numeric expression.
Changing hl.agg.mean(mt.AD) to hl.mean(mt.AD) also doesn’t work.

tpoterba · December 1, 2022, 3:20pm

What is the definition of “AD_mean for hom_var”? How should AD arrays of [20, 20] and [15, 10] be aggregated across two samples to a single mean? Do you want the mean of [20, 20, 15, 10], the mean of [20, 10], or something else?
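To make the two readings concrete, here is a plain-Python sketch (not Hail code) that computes both candidate definitions for the two example AD arrays from the question. The helper names are hypothetical and chosen only for illustration.

```python
# AD arrays for the two example samples from the question: [ref, alt]
ads = [[20, 20], [15, 10]]


def _mean(values):
    """Arithmetic mean of an iterable of numbers (always returns a float)."""
    values = list(values)
    return sum(values) / len(values)


def pooled_mean(ads):
    """Mean of every depth value pooled across samples: mean of [20, 20, 15, 10]."""
    return _mean(d for ad in ads for d in ad)


def alt_mean(ads):
    """Mean of the alternate-allele depth (index 1) across samples: mean of [20, 10]."""
    return _mean(ad[1] for ad in ads)


print(pooled_mean(ads))  # 16.25
print(alt_mean(ads))     # 15.0
```

The same data gives 16.25 under one reading and 15.0 under the other, which is why the definition has to be settled before picking an aggregator.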

DBScan · December 2, 2022, 9:26am

I guess I figured it out:

I want the total AD for each sample (a single value per sample), which I compute like this:

mt = mt.annotate_entries(AD_tot=hl.sum(mt.AD))

For instance, Sample1 with AD [10, 10] gives 20, and Sample2 with AD [0, 20] also gives 20.

Currently I just do the following:

mt = mt.annotate_rows(AD_mean_homalt=hl.agg.filter(mt.GT.is_hom_var(), hl.agg.mean(mt.AD_tot)))
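The approach above (sum AD within each sample, then average that total over the samples in one genotype class) can be modelled in plain Python for a single site, and extended to the het class from the original question. This is a hypothetical sketch, not Hail's implementation: genotypes are assumed to be diploid allele-index pairs such as (0, 1), None stands for a missing GT, and the two classifier functions only mirror what is_hom_var and is_het test for diploid calls.

```python
# One site: (GT, AD) per sample; GT is a pair of allele indices (0 = ref).
# All values are made up for illustration.
site = [
    ((1, 1), [0, 20]),   # hom_var
    ((1, 1), [2, 30]),   # hom_var
    ((0, 1), [10, 10]),  # het
    ((0, 0), [25, 0]),   # hom_ref: excluded from both means
    (None, [0, 0]),      # missing GT: excluded from both means
]


def is_hom_var(gt):
    """Diploid call with two identical non-reference alleles."""
    return gt is not None and gt[0] == gt[1] and gt[0] != 0


def is_het(gt):
    """Diploid call with two different alleles."""
    return gt is not None and gt[0] != gt[1]


def mean_total_ad(site, keep):
    """Sum AD within each kept sample (AD_tot), then average across those samples.

    Returns None when no sample passes the filter (a choice made for this sketch).
    """
    totals = [sum(ad) for gt, ad in site if keep(gt)]
    return sum(totals) / len(totals) if totals else None


print(mean_total_ad(site, is_hom_var))  # 26.0
print(mean_total_ad(site, is_het))      # 20.0
```

In Hail, the het counterpart of the annotation above is the same expression with mt.GT.is_het() in place of mt.GT.is_hom_var().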

danking · December 2, 2022, 3:30pm

Isn’t mt.DP == hl.sum(mt.AD) by definition of the VCF spec?

What are you trying to do scientifically? Do you want to know the mean number of reads supporting the reference allele and the mean number of reads supporting the alternate allele, at each site? That would be this:

mt = mt.annotate_rows(
    mean_ADs = hl.agg.array_agg(hl.agg.mean, mt.AD)
)

This produces a new annotation, mean_ADs, which is an array. mean_ADs[0] is the mean of mt.AD[0] across every sample at this site.
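What hl.agg.array_agg(hl.agg.mean, mt.AD) computes at one site can be pictured as an element-wise (per-allele) mean over the samples' AD arrays. The plain-Python sketch below models that for illustration only; it assumes every AD array at the site has the same length (one entry per allele) and rejects input that does not.

```python
def per_allele_mean(ads):
    """Element-wise mean of AD arrays across samples at one site.

    Assumes every AD array has the same length (one entry per allele).
    """
    n_alleles = len(ads[0])
    if any(len(ad) != n_alleles for ad in ads):
        raise ValueError("all AD arrays at a site must have the same length")
    # Entry i is the mean of AD[i] over all samples.
    return [sum(ad[i] for ad in ads) / len(ads) for i in range(n_alleles)]


# Three hypothetical samples at a biallelic site
ads = [[20, 20], [15, 10], [25, 0]]
mean_ads = per_allele_mean(ads)
print(mean_ads)  # [20.0, 10.0]
```

Here mean_ads[0] plays the role of mean_ADs[0]: the mean reference-allele depth across samples. Because the result has one entry per allele, the same idea also covers multiallelic sites.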

If all your variants are biallelic, you could also write this:

mt = mt.annotate_rows(
    mean_ref_AD = hl.agg.mean(mt.AD[0]),
    mean_alt_AD = hl.agg.mean(mt.AD[1])
)

DBScan · December 7, 2022, 10:21am

Usually mt.DP and hl.sum(mt.AD) are not the same, since DP counts both supportive and non-supportive reads, whereas AD counts only the reads supporting each allele.
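A toy plain-Python model shows why the two can differ. Assume each read at a site is tagged with the allele it supports, or None when it supports neither allele; then AD counts only the tagged reads while DP counts all of them. The read tags are invented, and real callers apply their own read filters, so this is a simplified sketch rather than how any particular caller computes these fields.

```python
from collections import Counter

# Hypothetical reads at one biallelic site for one sample:
# 0 = supports ref, 1 = supports alt, None = non-supportive (uninformative)
reads = [0] * 12 + [1] * 9 + [None] * 4

counts = Counter(r for r in reads if r is not None)
ad = [counts[0], counts[1]]  # per-allele depth: supportive reads only
dp = len(reads)              # total depth: every read, supportive or not

print(ad, sum(ad), dp)  # [12, 9] 21 25
```

In this model, the gap dp - sum(ad) is the number of non-supportive reads. In Hail, the corresponding per-entry quantity would be mt.DP - hl.sum(mt.AD).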
