Invalid genotype signature error on LoadVCF

When loading an aggregate VCF, built with Illumina's aggregation software from Strelka-called Illumina gVCFs, via:

var_mt = hl.import_vcf(in_files, force_bgz=True)

we get the following error:

HailException: invalid genotype signature: expected signatures to be identical for all inputs.
s3://regap-183760095058-eu-west-2-data/private-preview/subset_10_22/20k_GRCH38_germlinechr20_13754086_18515298.vcf.bgz: struct{GT: call, DP: int32, DPF: int32, AD: array, GQ: int32, PF: array}
s3://regap-183760095058-eu-west-2-data/private-preview/subset_10_22/20k_GRCH38_germlinechr20_61920330_64334162.vcf.bgz: struct{GT: call, DP: int32, DPF: int32, AD: array, GQ: int32, PF: array, PL: array}

Is there a restriction on one or more attributes, or on a combination of them, that we need to follow?



These are both valid VCFs to Hail if you import them individually, but Hail rejects importing multiple VCFs in the same import_vcf invocation if their schemas differ. That’s what’s going on here: one of the VCFs has a PL field and the other doesn’t.
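To find which inputs disagree before calling import_vcf, one option is to compare the ##FORMAT header lines of every file, since the entry structs in the error above mirror the FORMAT fields each file declares. Below is a minimal standard-library sketch, assuming the files are available as local paths (the s3:// paths above would need to be downloaded or mounted first). The helper names format_signature and group_by_format are hypothetical, and the sketch compares only the ID, Number and Type of each field, in header order.

```python
import gzip
import re
from collections import defaultdict

# Captures everything between "##FORMAT=<" and the final ">" on the line.
FORMAT_LINE = re.compile(r"^##FORMAT=<(.*)>\s*$")


def _open_text(path):
    """Open a VCF as text, transparently handling gzip/BGZF compression.

    BGZF is a series of gzip members, which Python's gzip module reads
    as one stream. The gzip magic bytes are checked instead of the file
    extension, because a .gz file may actually be block-gzipped.
    """
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def _attr(body, key):
    # Find KEY=value at the start of the body or right after a comma.
    m = re.search(r"(?:^|,)" + key + r"=([^,]+)", body)
    return m.group(1) if m else None


def format_signature(path):
    """Return (ID, Number, Type) for each ##FORMAT line, in header order."""
    sig = []
    with _open_text(path) as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached the #CHROM line; the meta header is over
            m = FORMAT_LINE.match(line)
            if m:
                body = m.group(1)
                sig.append((_attr(body, "ID"), _attr(body, "Number"),
                            _attr(body, "Type")))
    return tuple(sig)


def group_by_format(paths):
    """Map each distinct FORMAT signature to the list of files declaring it."""
    groups = defaultdict(list)
    for p in paths:
        groups[format_signature(p)].append(p)
    return dict(groups)
```

If group_by_format(in_files) returns more than one group, the files in the smaller groups are the ones that would trip the error. From there you could regenerate those files with matching headers, or import each group separately, make the entry fields match in Hail (for example by dropping PL with MatrixTable.drop), and combine the results with MatrixTable.union_rows, which expects identical entry schemas and columns.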


I should also note that Hail doesn’t handle gVCFs well right now. One of our team members is working on a gVCF import/merging algorithm and a sparse genotype matrix representation, though, which we expect to be usable and documented in a couple of months.

Excellent. Thanks for the clarification.

Also note that import_vcf supports a list of files with the same samples and non-overlapping genomic intervals. It doesn’t support importing multiple single-sample [g]VCFs.
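A similar header check can confirm the other two preconditions before a multi-file import: every file carries the same samples, and no two files cover overlapping positions on the same contig. This is a standard-library sketch under stated assumptions. sample_ids, mismatched_samples and find_overlaps are hypothetical helpers; sample order is treated as significant, which is the conservative reading; and each file's genomic interval must be supplied by the caller, for example from the contig/start/end encoded in shard names like the ones above, assuming that naming convention holds.

```python
import gzip


def _open_text(path):
    """Open a VCF as text, handling gzip/BGZF by checking the magic bytes."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def sample_ids(path):
    """Return the sample columns from the #CHROM header line of a VCF."""
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                # Columns 0-8 are fixed (CHROM .. FORMAT); samples follow.
                return tuple(line.rstrip("\n").split("\t")[9:])
    raise ValueError(f"no #CHROM header line in {path}")


def mismatched_samples(paths):
    """Return the paths whose sample list differs from that of paths[0]."""
    ref = sample_ids(paths[0])
    return [p for p in paths[1:] if sample_ids(p) != ref]


def find_overlaps(intervals):
    """Report overlapping intervals on the same contig.

    intervals: iterable of (name, contig, start, end), 1-based inclusive.
    Returns a list of (earlier_name, later_name) pairs; an empty list means
    no overlaps. Every interval that overlaps an earlier one is reported at
    least once, paired with the earlier interval that reaches farthest.
    """
    by_contig = {}
    for name, contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end, name))
    overlaps = []
    for spans in by_contig.values():
        spans.sort()
        far_end, far_name = None, None
        for start, end, name in spans:
            if far_end is not None and start <= far_end:
                overlaps.append((far_name, name))
            if far_end is None or end > far_end:
                far_end, far_name = end, name
    return overlaps
```

Tracking the farthest end seen so far, rather than comparing only neighbouring intervals, means an interval nested inside a long earlier one is still caught after sorting by start position.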

Well, we’re running against a 20K aggregate gVCF, but we look forward to the update 😉