Call Rate NAs after GT revising

I found a problem of my hail QC pipeline,
if I revised GT using the following scripts,

vds = vds.filter_entries((vds.DP > 400)|(vds.DP < 10), keep = False)
vds = hl.variant_qc(vds)

and then calculate the call rate, as below,

callratelist = [hl.agg.filter(vds.Pop == pop, hl.agg.fraction(hl.is_defined(vds.GT))) for pop in custompops]
vds = vds.annotate_rows(
   callrate=hl.Struct(**dict(zip(custompops, callratelist))),
   lowestcallrate = hl.min(hl.array(callratelist))
)

Then all the original missed call genotype will be labeled as NA and will not be count in the call rate calculation. I am wondering if there is any way that we can improve the methods by forcing the call rate calculation by counting both the original missing in original genotype and the revised newly reset missing in the genotype?

Thanks very much.

I think I understand the question.

There are three genotype states:

  1. Called
  2. Missing
  3. Filtered (this applies to an entire entry, not just the genotype).

Call rate is defined as (called) / (called + missing + filtered), which is identical to (called) / (n_samples). It’s not possible to implement this definition using hl.agg.filter as you have, because filtered entries are not aggregated over.

Another way to implement your filter that might give you the results you want is the following:

vds = vds.annotate_entries(GT = hl.or_missing(~((vds.DP > 400)|(vds.DP < 10)), vds.GT)

This line conditionally sets the value of GT to missing when the condition is true.

Dear Tim:
Thank you so much. That is really helpful.