Hi,
I'm hoping to get some insight into this issue. Thanks in advance.
I was trying to import two gVCF files, “sample1.wf_snp.gvcf.gz” and “sample2.wf_snp.gvcf.gz”, into Hail and combine them into a VDS with the code below:
```python
import hail as hl

# Placeholders for the real S3 URIs; vds_uri and vds_prefix are defined elsewhere
sample1_gvcf_uri = '<s3_location_of_gvcf>'
sample2_gvcf_uri = '<s3_location_of_gvcf>'
gvcfs = [
    sample1_gvcf_uri,
    sample2_gvcf_uri,
]
combiner = hl.vds.new_combiner(
    output_path=vds_uri,
    temp_path=f'{vds_prefix}/checkpoints/',
    gvcf_paths=gvcfs,
    use_genome_default_intervals=True,
    reference_genome='GRCh38',
    branch_factor=50,        # number of inputs combined in each merge step
    target_records=500_000,  # target number of rows per partition
)
combiner.run()
```
Running this simple example fails with the error below, on the following line:
```
...error while parsing line
chr2 64442119 . C CAAAAAAAAAAAA,CAAAAAAAAA,<NON_REF> 8.49 PASS F GT:GQ:DP:AD:AF:PL 1/2:8:21:0,6,5,0:0.2857,0.2381:59,28,4,28,0,4,990,990,990,990
NumberFormatException: For input string: "0.2857,0.2381"
```
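The error probably refers to the AF field. Note that in this record AF is a FORMAT (per-sample) field, not an INFO field, and it holds two comma-separated values, 0.2857,0.2381, which Hail appears to be trying to parse as a single number.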
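One quick way to confirm which field is at fault is to count the comma-separated values in each FORMAT field of the failing record. The sketch below is plain Python (no Hail), uses a hypothetical helper name, format_value_counts, and assumes the record is tab-separated as in any VCF (the tabs appear to have been lost when the line was pasted above).

```python
# Minimal sketch (hypothetical helper): count how many comma-separated values
# each FORMAT field carries in one single-sample, tab-separated VCF record,
# alongside the number of ALT alleles.

def format_value_counts(record: str) -> dict:
    cols = record.rstrip("\n").split("\t")
    alts = cols[4].split(",")       # ALT column, including <NON_REF>
    keys = cols[8].split(":")       # FORMAT column, e.g. GT:GQ:DP:AD:AF:PL
    values = cols[9].split(":")     # first (only) sample column
    counts = {"n_alt": len(alts)}
    for key, value in zip(keys, values):
        counts[key] = len(value.split(","))
    return counts

# The failing record from the error message, re-joined with tabs
record = "\t".join([
    "chr2", "64442119", ".", "C", "CAAAAAAAAAAAA,CAAAAAAAAA,<NON_REF>",
    "8.49", "PASS", "F", "GT:GQ:DP:AD:AF:PL",
    "1/2:8:21:0,6,5,0:0.2857,0.2381:59,28,4,28,0,4,990,990,990,990",
])
print(format_value_counts(record))
# {'n_alt': 3, 'GT': 1, 'GQ': 1, 'DP': 1, 'AD': 4, 'AF': 2, 'PL': 10}
```

For this record AF has 2 values against 3 ALT alleles (counting <NON_REF>), while AD has 4 values (one per allele, as for Number=R) and PL has 10 (one per genotype of 4 alleles, as for Number=G). So AF fits neither Number=1 nor Number=A, which would expect 3 values here.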
I also tried importing the same gVCF directly with hl.import_vcf:
```python
mt = hl.import_vcf(sample1_gvcf_uri, reference_genome="GRCh38", force_bgz=True)
mt.describe()
```
This works, with a warning that the file name should end in “.vcf[.bgz, .gz]”. Despite the warning, mt.describe() prints the expected schema, so the import appears to work. However, because Hail evaluates lazily, describe() only reflects the schema derived from the header; it does not by itself show that every record, including the one above, parses.
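Since the schema that describe() prints comes from the VCF header, it may also help to check how the header declares AF. The sketch below is plain Python with a hypothetical helper name, format_header_def, and assumes a local copy of the bgzipped gVCF. BGZF is a series of gzip members, so Python's gzip module can read it; the loop stops at the #CHROM line, so only the header is scanned.

```python
import gzip
import re

# Minimal sketch (hypothetical helper): return the declared Number and Type of
# a FORMAT field from the header of a (b)gzipped VCF/gVCF, or None if absent.
def format_header_def(path: str, field_id: str):
    prefix = f"##FORMAT=<ID={field_id},"
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached "#CHROM": the meta-information header is over
            if line.startswith(prefix):
                number = re.search(r"Number=([^,>]+)", line)
                vtype = re.search(r"Type=([^,>]+)", line)
                return {
                    "Number": number.group(1) if number else None,
                    "Type": vtype.group(1) if vtype else None,
                    "line": line.rstrip("\n"),
                }
    return None
```

For example, format_header_def("sample1.wf_snp.gvcf.gz", "AF") on a downloaded copy. If it reports Number=1 while records like the one above carry two AF values, that would be consistent with the NumberFormatException.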
Is there any way to get around this parsing error when using new_combiner?