Cannot run 'sample_qc' function when PL field is missing

Hi developer,

I was trying to use the sample_qc function to compute some QC metrics. The ‘PL’ fields in my VCF are all missing, but the ‘GQ’ fields are not. The error message is: HailException: PL cannot have missing elements.

My understanding is that the gq_stats from the sample_qc function are computed from the ‘GQ’ field. So why does Hail complain about missing elements in the PL field? How can I compute gq_stats when my VCF contains ‘GQ’ but not ‘PL’?

Thanks!

Hey @jialiwang1211 !

I’m sorry you’re having trouble with Hail. The issue is almost certainly not with sample_qc. Are you using split_multi_hts? Hail assumes that either:

  • the PL field is missing, or
  • the PL field is an array of non-missing values.
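To make the distinction concrete, here is a minimal plain-Python sketch (no Hail required) of the three states a PL value can be in. It models a missing value as None; the function names pl_state and normalize_pl are hypothetical and are not part of Hail.

```python
from typing import List, Optional

PL = Optional[List[Optional[int]]]


def pl_state(pl: PL) -> str:
    """Classify a PL value, using None to stand in for a missing value."""
    if pl is None:
        return "missing"               # the whole field is missing: accepted
    if any(x is None for x in pl):
        return "has_missing_elements"  # an array holding missing values: rejected
    return "complete"                  # an array of defined values: accepted


def normalize_pl(pl: PL) -> Optional[List[int]]:
    """Collapse any array that contains missing elements to a missing field."""
    return pl if pl_state(pl) == "complete" else None


print(pl_state(None))                    # missing
print(pl_state([0, 69, 1035]))           # complete
print(pl_state([None, None, None]))      # has_missing_elements
print(normalize_pl([None, None, None]))  # None
```

Only the third state, an array whose elements are missing, falls outside what Hail assumes.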

It sounds like you have a PL field that is an array of missing values. You can fix this by running this after you read or import your data:

mt = mt.annotate_entries(PL = hl.null(mt.PL.dtype))

Also, be aware that split_multi_hts cannot recalculate an appropriate GQ if the PL field is missing.
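As a rough illustration of why: GQ is conventionally derived from PL as the gap between the two smallest PL values, capped at 99. The plain-Python function below is a simplified model under that assumption, not Hail's actual implementation, and the name gq_from_pl_sketch is hypothetical.

```python
from typing import List, Optional


def gq_from_pl_sketch(pl: Optional[List[Optional[int]]]) -> Optional[int]:
    """Simplified model: GQ is the gap between the two smallest PLs, capped at 99."""
    if pl is None:
        # No PL at all, so there is nothing to derive GQ from.
        return None
    if any(x is None for x in pl):
        # Mirrors the error reported in this thread.
        raise ValueError("PL cannot have missing elements")
    if len(pl) < 2:
        raise ValueError("need at least two PL values")
    smallest, second = sorted(pl)[:2]
    return min(second - smallest, 99)


print(gq_from_pl_sketch([0, 69, 1035]))  # 69
print(gq_from_pl_sketch([0, 200, 300]))  # 99
print(gq_from_pl_sketch(None))           # None
```

With a missing PL there is no input for the calculation, which is consistent with GQ coming back as missing after the split.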

Hi @danking,

Thank you for your prompt reply!
Yes, I was using split_multi_hts, and yes, the PL field in my VCF is an array of missing values.

The problem is fixed by using the code:

mt = hl.variant_qc(hl.split_multi_hts(mt.drop('PL')), name='qc')

Note that I need to drop the PL field before the split_multi_hts step; otherwise the GQ field becomes NA.

Thanks!