I found a problem in my Hail QC pipeline.
Suppose I revise GT by filtering entries on depth with the following script,
vds = vds.filter_entries((vds.DP > 400) | (vds.DP < 10), keep=False)
vds = hl.variant_qc(vds)
and then calculate the per-population call rate, as below,
callratelist = [hl.agg.filter(vds.Pop == pop, hl.agg.fraction(hl.is_defined(vds.GT))) for pop in custompops]
vds = vds.annotate_rows(
    callrate=hl.Struct(**dict(zip(custompops, callratelist))),
    lowestcallrate=hl.min(hl.array(callratelist))
)
Then all of the originally missing genotype calls are labeled NA and are not counted in the call rate calculation. Is there any way to improve this so that the call rate counts both the genotypes that were missing in the original data and the genotypes newly set to missing by the depth filter?
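To make the difference concrete, here is a minimal sketch in plain Python (not Hail code) of the two denominators I am comparing. The entries and depths are made-up toy values, `passes_depth` is a hypothetical helper, and I am assuming that an entry with missing DP is dropped by the filter along with the out-of-range ones.

```python
# Toy illustration in plain Python (not Hail); all values are made up.
# Each entry is (GT, DP); None marks a genotype that was already missing.
entries = [("0/1", 35), ("0/0", 5), (None, None), ("1/1", 500), ("0/0", 60)]


def passes_depth(dp, low=10, high=400):
    """Hypothetical helper: True when depth is present and within [low, high]."""
    return dp is not None and low <= dp <= high


# Genotypes that are present and survive the depth filter.
kept_calls = [gt for gt, dp in entries if gt is not None and passes_depth(dp)]

# Entries still visible after the filter (assumption: missing DP is dropped too).
remaining = [(gt, dp) for gt, dp in entries if passes_depth(dp)]

# What I seem to get: removed entries vanish from the denominator.
rate_remaining = len(kept_calls) / len(remaining)

# What I want: original missing and newly set missing both count as missing.
rate_all = len(kept_calls) / len(entries)

print(rate_remaining, rate_all)
```

With these toy values the first rate looks perfect even though only two of the five samples have a usable call, which is the behavior I would like to avoid.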
Thanks very much.