ABHet ratio for variant level QC

jjfarrell · September 21, 2021, 1:53pm

A very useful metric for variant QC is the ABHet ratio. Variants with ABHet < 0.2 and > 0.8 are used to filter out false positives. More conservative cutoffs (0.25 and 0.75) are also used.

ABHet is a variant-level annotation that aims to estimate if biallelic variants match expected allelic ratios. An ideal heterozygous variant will have a value of close to 0.5 and an ideal homozygous variant will have a value of close to 1.0. ABHet is calculated for a variant based on all samples in the VCF which are not homozygous at this site.

ABHet=# REF reads from heterozygous samples/# REF + ALT reads from heterozygous samples

What would be the best approach for calculating this with Hail? Could this be added to the variantQC method?

kumarveerapen · September 21, 2021, 4:13pm

Thank you for your question, @jjfarrell ! II do also use allele balances as a QC metric for my datasets.

I do have a snippet that I use when analysing sequencing data which can be found here.

#create an allele balance annotation
mt= mt.annotate_entries(AB = (mt.AD[1] / hl.sum(mt.AD) ))

#set filter condition for AB. So if you want to be less or more stringent you can pick your balances accordingly. I've included both hom-ref and hom-alt in my filtering step here
filter_condition_ab = ((mt.GT.is_hom_ref() & (mt.AB <= 0.1)) |
                        (mt.GT.is_het() & (mt.AB >= 0.25) & (mt.AB <= 0.75)) |
                        (mt.GT.is_hom_var() & (mt.AB >= 0.9)))
fraction_filtered = mt.aggregate_entries(hl.agg.fraction(~filter_condition_ab))
print(f'Filtering {fraction_filtered * 100:.2f}% entries out of downstream analysis.')

mt = mt.filter_entries( (mt.GQ>=20) &
                 (mt.GQ >= 10) &
                 ((mt.GT.is_hom_ref() & (mt.AB <= 0.1)) |
                        (mt.GT.is_het() & (mt.AB >= 0.25) & (mt.AB <= 0.75)) |
                        (mt.GT.is_hom_var() & (mt.AB >= 0.9))))

I hope that this is helpful

kumarveerapen · September 21, 2021, 4:21pm

And to note, our developers have been flagged to see if we can include this into a future iteration of Hail.

tpoterba · September 21, 2021, 6:43pm

@kumarveerapen I think the solution we want here is a variant-level metric which is the sum of ref reads in het samples divided by the sum of alt reads in het samples.

We probably want this to compute a ratio per alternate allele, actually.

jjfarrell · September 22, 2021, 3:35pm

@tpoterba @kumarveerapen

yes, we typically use this at the variant level rather than the GT level as Tim suggests. Two variant level metrics could be defined: An AS_ABHet (Allele Specific ABHet) and an ABHet (defined as ratio of all #Alt_reads to #homozygous reference reads). We run variantQC after splitting the multiallelic so either would work.

Topic		Replies	Views
How can i find the allele balance of only Heterozygous or Homozygous Hail Query & hailctl	2	415	January 4, 2021
Hl.variant_qc and the FT field Hail Query & hailctl	4	456	August 16, 2023
[Breaking change] Variant QC changes to support multiallelics Updates	4	815	July 6, 2018
Variant qc AF field index meaning Hail Query & hailctl	5	1001	July 27, 2023
Inbreeding Calculation Run Time Hail Query & hailctl	0	197	July 28, 2023

ABHet ratio for variant level QC

Related topics