get_vcf_metadata omitting FILTER line descriptions

I noticed that get_vcf_metadata doesn’t appear to be pulling the descriptions of FILTER lines. For example, I ran:

metadata = hl.get_vcf_metadata(
    'gs://gnomad-public/release/2.1.1/liftover_grch38/vcf/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz')

The VCF has the following descriptions in the header:

##FILTER=<ID=AC0,Description="Allele count is zero after filtering out low-confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls)">
##FILTER=<ID=InbreedingCoeff,Description="InbreedingCoeff < -0.3">
##FILTER=<ID=PASS,Description="Passed all variant filters">
##FILTER=<ID=RF,Description="Failed random forest filtering thresholds of 0.055272738028512555, 0.20641025579497013 (probabilities of being a true positive variant) for SNPs, indels">

but the descriptions from get_vcf_metadata are empty:

{'AC0': {'Description': ''},
 'InbreedingCoeff': {'Description': ''},
 'PASS': {'Description': ''},
 'RF': {'Description': ''}}

huh, weird. We can take a look.

aha…

https://github.com/hail-is/hail/pull/7244 will fix

thank you!!

I am having a similar issue with Hail 0.2.70 for the FORMAT and INFO fields. It doesn’t look like it got resolved.

hail-0.2.70-4fcd186e31da

After uploading the VCFs, merging them into a MatrixTable, running QC, and exporting back to VCF, the Description fields of the FORMAT and INFO header lines were empty in the output header.

Originally this…

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">

head header.norm.qc.txt

##hailversion=0.2.70-4fcd186e31da
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="">
##FORMAT=<ID=GT,Number=1,Type=String,Description="">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="">
##FORMAT=<ID=PL,Number=.,Type=Integer,Description="">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="">

Hail does not track these descriptions internally. If you want to include header metadata, you can supply that with the metadata argument of export_vcf.

The expected value here is a dictionary. You can reuse the descriptions and numbers from an existing VCF by calling hl.get_vcf_metadata on it and passing the result to export_vcf as the metadata argument.
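
As a rough sketch of that suggestion: the snippet below reads the header metadata from a source VCF and hands it to hl.export_vcf. The function names export_with_source_header and override_descriptions, and their parameters, are made up for illustration. The sketch assumes the metadata dictionary is nested as section ('filter', 'info', 'format'), then field ID, then attribute, which is the shape shown earlier in this thread. Hail is imported inside the function only so the pure-Python helper can be tried without a Hail installation.

```python
import copy


def override_descriptions(metadata, section, descriptions):
    """Return a copy of `metadata` with Description values set for one section.

    `section` is assumed to be 'filter', 'info' or 'format';
    `descriptions` maps a field ID to the text to use.
    """
    updated = copy.deepcopy(metadata)
    fields = updated.setdefault(section, {})
    for field_id, text in descriptions.items():
        fields.setdefault(field_id, {})["Description"] = text
    return updated


def export_with_source_header(mt, source_vcf, output_path, extra_format=None):
    """Export `mt` to VCF, reusing header metadata from `source_vcf`.

    Hypothetical helper: `mt` is a Hail MatrixTable, `source_vcf` is the
    path of a VCF whose header has the descriptions you want to keep.
    """
    import hail as hl  # imported here so the helper above runs without Hail

    metadata = hl.get_vcf_metadata(source_vcf)
    if extra_format:
        # Fill in any FORMAT descriptions the source header lacks.
        metadata = override_descriptions(metadata, "format", extra_format)
    hl.export_vcf(mt, output_path, metadata=metadata)
```

If the source header itself comes back with empty descriptions, as in the FILTER case at the top of this thread, override_descriptions can be used to patch the dictionary by hand before exporting.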