Issue with GATK after using Hail

I subsetted a VCF using the standard read/filter_samples/write sequence and then exported it with export_vcf. I sent the file to someone who tried to remove a few more samples with GATK, and they got the following error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.IllegalStateException: Key  found in VariantContext field INFO at 1:664468 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
	at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:177)
	at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:115)
	at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:222)
	at org.broadinstitute.gatk.engine.io.storage.VariantContextWriterStorage.add(VariantContextWriterStorage.java:200)
	at org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub.add(VariantContextWriterStub.java:272)
	at org.broadinstitute.gatk.tools.walkers.variantutils.SelectVariants.map(SelectVariants.java:851)
	at org.broadinstitute.gatk.tools.walkers.variantutils.SelectVariants.map(SelectVariants.java:309)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
	at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
	at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
	at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
	at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
	at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
	at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

with the key line being:
java.lang.IllegalStateException: Key  found in VariantContext field INFO at 1:664468 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.

I assume this has something to do with how export_vcf wrote the file. Let me know if I can give more info to resolve this, or if I’m wrong and it isn’t a Hail issue.

Judging from the extra space in “Key  found”, I’m guessing we somehow output an empty string as an INFO field key. I’d like to see the INFO field of that record before and after the Hail export. Can you grep the chromosome 1, position 664468 line in both the original and the exported VCF? (If you share it here, remove the genotypes and obscure any other sensitive info.)
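To check the empty-key guess across the whole exported file rather than at one site, here is a minimal Python sketch (standard library only) that mimics the kind of header check the GATK error describes: it collects the IDs declared in ##INFO header lines and flags every INFO key in a record that is not among them, including an empty key. It assumes the empty key would show up as a doubled or trailing ';' or an entry starting with '=' in the INFO column; that is a hypothesis consistent with the double space in the error, not a confirmed diagnosis. The function names are hypothetical, and this is not the code htsjdk or GATK actually run.

```python
import gzip
import re

# Matches the ID of a header line such as ##INFO=<ID=DP,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (bgzip output is valid gzip)."""
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    return open(path, "r")


def header_info_ids(header_lines):
    """Collect the IDs declared by ##INFO header lines."""
    ids = set()
    for line in header_lines:
        match = INFO_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def info_keys(info_field):
    """Split an INFO column into its keys; '.' means the record has no INFO."""
    if info_field == ".":
        return []
    # Entries are KEY=VALUE or bare flags separated by ';'. A doubled or
    # trailing ';', or an entry starting with '=', yields an empty key ''.
    return [entry.split("=", 1)[0] for entry in info_field.split(";")]


def find_undeclared_info_keys(lines):
    """Yield (chrom, pos, key) for each INFO key missing from the ##INFO header."""
    header = []
    declared = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)  # meta-information lines and the #CHROM line
            continue
        if declared is None:
            declared = header_info_ids(header)
        fields = line.split("\t")
        chrom, pos = fields[0], fields[1]
        info = fields[7] if len(fields) > 7 else "."
        for key in info_keys(info):
            if key not in declared:
                yield chrom, pos, key
```

For example, looping over find_undeclared_info_keys(open_vcf("exported.vcf.bgz")) (hypothetical filename) and printing repr(key) would show an empty key as '' next to its chromosome and position, which makes it easy to compare against the original VCF.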

It’s a .vcf.bgz. Is there an easy way to un-bgz it?

zgrep will work.

Or gunzip -c vcf.bgz | grep ...

Yet another alternative: zcat file | grep ...
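All three work because a .vcf.bgz is BGZF-compressed, which is a series of standard gzip blocks, so any gzip reader can decompress it. If Python is more convenient than a shell, here is a minimal sketch of the same zcat | grep, assuming a bgzipped VCF with the usual tab-separated columns. The function name is hypothetical. It also keeps only the first eight columns (CHROM through INFO), which drops FORMAT and the per-sample genotypes before anything is shared.

```python
import gzip


def sites_only_lines(path, chrom, pos):
    """Return CHROM..INFO (first 8 columns) of every record at chrom:pos.

    Python's gzip module reads concatenated gzip members sequentially,
    so it can stream a bgzipped file the same way zcat does.
    """
    matches = []
    with gzip.open(path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\r\n").split("\t")
            if fields[0] == chrom and fields[1] == str(pos):
                # Drop FORMAT and sample columns so no genotypes are shared.
                matches.append("\t".join(fields[:8]))
    return matches
```

Calling it on both the original and the exported file with chrom "1" and pos 664468 gives the two INFO fields side by side.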