Hello,
I am trying to run the VariantDatasetCombiner:
import hail as hl

path_to_input_list = 'input_files.txt'  # a file with one GVCF path per line

gvcfs = []
with hl.hadoop_open(path_to_input_list, 'r') as f:
    for line in f:
        print(line)
        gvcfs.append(line.strip())

combiner = hl.vds.new_combiner(
    output_path='dataset.vds',
    temp_path='my-temp-bucket',
    gvcf_paths=gvcfs,
    use_genome_default_intervals=True,
    reference_genome='GRCh38',
)

combiner.run()
But the following error showed up:
Current key: { locus: { contig: chrM, position: 302 }, alleles: [3; A, AC, ACC] }
Previous key: { locus: { contig: chrM, position: 302 }, alleles: [3; A, AC, C] }
This error can occur after a split_multi if the dataset
contains both multiallelic variants and duplicated loci.
How can I tell the combiner not to split_multi, or how should I preprocess the GVCF files?
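In case it helps with diagnosing this, here is a minimal sketch of how the offending locus could be inspected in a single input. The GVCF path is a placeholder, and I am assuming the inputs are bgzipped with a .gz extension:

import hail as hl

# Placeholder path: substitute one of the GVCFs from input_files.txt.
mt = hl.import_vcf(
    'one_of_my_gvcfs.g.vcf.gz',
    reference_genome='GRCh38',
    force_bgz=True,  # assumes the GVCF is bgzipped despite the .gz extension
)

# Show all records at the locus named in the error, to check whether the
# same position appears more than once with different allele lists.
locus = hl.parse_locus('chrM:302', reference_genome='GRCh38')
mt.filter_rows(mt.locus == locus).rows().show()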
In addition, I would like to mention that hl.experimental.run_combiner was working with the same files:
import hail as hl

path_to_input_list = 'input_files.txt'  # a file with one GVCF path per line

inputs = []
with hl.hadoop_open(path_to_input_list, 'r') as f:
    for line in f:
        print(line)
        inputs.append(line.strip())

output_file = 'output.mt'       # output destination
temp_bucket = 'my-temp-bucket'  # bucket for storing intermediate files

hl.experimental.run_combiner(
    inputs,
    out_file=output_file,
    tmp_path=temp_bucket,
    reference_genome='GRCh38',
    use_genome_default_intervals=True,
    overwrite=True,
)
But I need the methods that VariantDataset provides for my analysis.
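One workaround I have been wondering about is converting the sparse MatrixTable written by run_combiner into a VDS. A sketch, assuming hl.vds.VariantDataset.from_merged_representation is applicable to this output (the paths reuse the ones above):

import hail as hl

# Read the sparse MatrixTable written by run_combiner above.
mt = hl.read_matrix_table('output.mt')

# Assumption: from_merged_representation accepts this merged sparse
# representation and yields an equivalent VariantDataset.
vds = hl.vds.VariantDataset.from_merged_representation(mt)
vds.write('dataset.vds')

I am not sure whether this is equivalent to running the new combiner directly, though, so I would still like to resolve the error above.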
I would appreciate your help.