I was trying to use the sample_qc function to compute some QC metrics. The ‘PL’ fields in my VCF are all missing, but the ‘GQ’ fields are not. The error message is:
HailException: PL cannot have missing elements.
My understanding is that the gq_stats from the sample_qc function are computed from the ‘GQ’ field. So why does Hail complain about missing elements in the PL field? How can I compute gq_stats when my VCF contains only ‘GQ’ and not ‘PL’?
Hey @jialiwang1211 !
I’m sorry you’re having trouble with Hail. The issue is almost certainly not with sample_qc. Are you using split_multi_hts? Hail assumes that either:
- the PL field is missing, or
- the PL field is an array of non-missing values.
It sounds like you have a PL field that is an array of missing values. You can fix this by running this after you read or import your data:
mt = mt.annotate_entries(PL = hl.null(mt.PL.dtype))
Also, be aware that split_multi_hts cannot recalculate an appropriate GQ if the PL field is missing.
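To make the distinction between the two cases concrete, here is a minimal plain-Python sketch. It is not Hail code: normalize_pl and gq_from_pl are hypothetical helpers written for illustration, None stands in for a missing value, and the GQ rule used (the difference between the two smallest PL values) is my understanding of how GQ is derived from PL, so treat it as an assumption.

```python
from typing import List, Optional

# Hypothetical model: None plays the role of a missing value.
PL = Optional[List[Optional[int]]]


def normalize_pl(pl: PL) -> Optional[List[int]]:
    """Turn a PL array that contains any missing element into a missing PL."""
    if pl is None or any(x is None for x in pl):
        return None
    return pl


def gq_from_pl(pl: PL) -> Optional[int]:
    """Assumed rule: GQ is the gap between the two smallest PL values."""
    if pl is None:
        # A fully missing PL is acceptable; GQ simply cannot be recomputed.
        return None
    if any(x is None for x in pl):
        # An array of missing values is the case that triggers the error.
        raise ValueError("PL cannot have missing elements.")
    smallest, second_smallest = sorted(pl)[:2]
    return second_smallest - smallest


print(gq_from_pl([0, 20, 100]))                       # well-formed PL
print(gq_from_pl(normalize_pl([None, None, None])))   # missing PL, missing GQ
```

Under these assumptions, an array of missing values raises the error, while a PL that is itself missing passes through and yields a missing GQ.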
Thank you for your prompt reply!
Yes, I was using split_multi_hts, and yes, the PL field in my VCF is an array of missing values.
The problem is fixed by using the code:
mt = hl.variant_qc(hl.split_multi_hts(mt.drop('PL')), name='qc')
Note that I need to drop the PL field before the split_multi_hts step; otherwise the GQ field becomes NA.