Thanks so much for supporting such a great tool. I am running into an issue importing multiple VCF shards produced by Sentieon’s implementation of GATK. Some of the shards import without issue. For others, Hail throws an error on individual lines (complex indels):
mt = hl.import_vcf(in_path + 'GVCFtyper-shard_2.vcf',
                   reference_genome='GRCh38', array_elements_required=False, skip_invalid_loci=True)
hail.utils.java.FatalError: HailException: GVCFtyper-shard_2.vcf:column 381286: empty integer field
… 0,0,0,0,0,0,0,0,0,0,0,0,0:18:99:.:. :.:.:.:.:. 0/1:8,6,0,0,0,0,0,0,0,0,0 …
offending line: chr1 25819429 . TGAGAGAGA AGAGAGAGA,TGAGAGAGAGAGAGAGAGA,TGAG…
see the Hail log for the full offending line
When I manually remove this line from the VCF, the file imports successfully, but this workaround is slow and does not scale. Is there a better way to skip such lines and force the import? Any insights would be greatly appreciated!
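For reference, the manual removal described above could be scripted roughly as follows. This is only a sketch in plain Python, not a Hail feature: the helper names `has_empty_subfield` and `filter_vcf_lines` are hypothetical, and it assumes that an "empty integer field" means a zero-length value between `:` or `,` delimiters in a sample column, which may not match what the parser is actually rejecting in this file.

```python
from typing import Iterable, Iterator


def has_empty_subfield(sample_field: str) -> bool:
    """Return True if any ':'- or ','-delimited value in a sample column is empty.

    Assumption: this is what triggers the "empty integer field" error.
    A missing value written as '.' is not treated as empty.
    """
    return any(
        part == ""
        for value in sample_field.split(":")
        for part in value.split(",")
    )


def filter_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records whose sample columns have no empty values."""
    for line in lines:
        if line.startswith("#"):
            # Header and metadata lines pass through unchanged.
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # In a VCF record, sample columns start after the FORMAT column (index 8).
        if any(has_empty_subfield(sample) for sample in columns[9:]):
            continue  # drop the record instead of editing the file by hand
        yield line


def clean_vcf(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, skipping records flagged by the filter."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src))
```

The cleaned file could then be passed to `hl.import_vcf` as in the call above. It still needs an extra pass over every shard, so it automates the workaround without removing the cost.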