I have over 20,000 VCF shards (all contain the same samples; the data were split by variant).
I am trying to use Hail to import these VCFs and write them to one big MatrixTable (fileformat=VCFv4.2):
hl.import_vcf('gs://path/WGS.*.vcf.gz', force=True).write('gs://path/step1/out.mt', overwrite=True)
File "", line 2, in import_vcf
Hail version: 0.2.96-39909e0a396f
Error summary: HailException: fields in 'call_fields' must have 'Number' equal to 1.
The FORMAT definitions from the header of one of my VCFs:
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="AD">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="GT">
##FORMAT=<ID=MIN_DP,Number=.,Type=Integer,Description="MIN_DP">
##FORMAT=<ID=PGT,Number=.,Type=String,Description="PGT">
##FORMAT=<ID=PID,Number=.,Type=String,Description="PID">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="PL">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=.,Type=Integer,Description="SB">
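To narrow down which FORMAT field might be tripping this check, the header lines can be parsed directly. The sketch below is plain Python and does not need Hail. The helper names `format_numbers` and `non_scalar_format_fields` are made up for illustration, and checking `"PGT"` rests on my assumption that `import_vcf` treats PGT as a call field by default through its `call_fields` parameter.

```python
import re

# Matches the ID and Number attributes of a VCF "##FORMAT=<...>" header line.
FORMAT_RE = re.compile(r"^##FORMAT=<ID=([^,]+),Number=([^,]+),")


def format_numbers(header_lines):
    """Map each FORMAT field ID to its declared Number (kept as a string)."""
    numbers = {}
    for line in header_lines:
        m = FORMAT_RE.match(line.strip())
        if m:
            numbers[m.group(1)] = m.group(2)
    return numbers


def non_scalar_format_fields(header_lines, fields):
    """Return the subset of `fields` whose declared Number is not 1."""
    numbers = format_numbers(header_lines)
    return [f for f in fields if f in numbers and numbers[f] != "1"]


# A few of the header lines from above.
header = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="GT">',
    '##FORMAT=<ID=PGT,Number=.,Type=String,Description="PGT">',
    '##FORMAT=<ID=PID,Number=.,Type=String,Description="PID">',
]

# Assumption: GT and PGT are the fields Hail tries to load as calls.
offending = non_scalar_format_fields(header, ["GT", "PGT"])
print(offending)  # ['PGT']
```

With my header this flags PGT, which is declared with Number=. rather than Number=1. If that is the cause, passing a different `call_fields` list (for example an empty one) to `hl.import_vcf` looks like one thing to try, though I have not confirmed that it works.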
I am confused about why Hail complains that fields in 'call_fields' must have 'Number' equal to 1. How can I import these files, or parse the relevant fields differently, to get around this?
Many thanks in advance, Shuang