Call Rate NAs after GT revising

zhouhufeng · October 27, 2021, 3:39pm

I found a problem of my hail QC pipeline,
if I revised GT using the following scripts,

vds = vds.filter_entries((vds.DP > 400)|(vds.DP < 10), keep = False)
vds = hl.variant_qc(vds)

and then calculate the call rate, as below,

callratelist = [hl.agg.filter(vds.Pop == pop, hl.agg.fraction(hl.is_defined(vds.GT))) for pop in custompops]
vds = vds.annotate_rows(
   callrate=hl.Struct(**dict(zip(custompops, callratelist))),
   lowestcallrate = hl.min(hl.array(callratelist))
)

Then all the original missed call genotype will be labeled as NA and will not be count in the call rate calculation. I am wondering if there is any way that we can improve the methods by forcing the call rate calculation by counting both the original missing in original genotype and the revised newly reset missing in the genotype?

Thanks very much.

tpoterba · October 27, 2021, 4:00pm

I think I understand the question.

There are three genotype states:

Called
Missing
Filtered (this applies to an entire entry, not just the genotype).

Call rate is defined as (called) / (called + missing + filtered), which is identical to (called) / (n_samples). It’s not possible to implement this definition using hl.agg.filter as you have, because filtered entries are not aggregated over.

Another way to implement your filter that might give you the results you want is the following:

vds = vds.annotate_entries(GT = hl.or_missing(~((vds.DP > 400)|(vds.DP < 10)), vds.GT)

This line conditionally sets the value of GT to missing when the condition is true.

zhouhufeng · October 28, 2021, 2:32pm

Dear Tim:
Thank you so much. That is really helpful.

Topic		Replies	Views
Call Rate after GT revising Hail Query & hailctl	0	303	October 7, 2021
Are 0/0 genotypes counted as defined in hl.is_defined(mt.GT)? And are NA genotypes counted as missing in hl.is_missing(mt.GT)? Hail Query & hailctl	1	134	April 26, 2024
Hl.variant_qc and the FT field Hail Query & hailctl	4	456	August 16, 2023
How does hail generates the sample call rate? Hail Query & hailctl	2	482	September 23, 2020
Checking if a CallExpression is NA Hail Query & hailctl	10	495	November 16, 2020

Call Rate NAs after GT revising

Related topics