Hi,
Thank you so much for making Hail available to the community!
I am running Hail locally on my institution’s cluster.
I have ~500 gVCFs generated in the same way.
I created a VDS with hl.vds.new_combiner, which worked very well, and I could extract variants as desired.
I am now trying to add the remaining ~470 gVCFs with:
combiner = hl.vds.new_combiner(
    output_path="/mnt/data/db_merged.vds",
    temp_path="/mnt/data/projects/.tmp",
    gvcf_paths=gvcfs_f,
    vds_paths=["/mnt/data/db.vds"],
    use_genome_default_intervals=True,
    reference_genome="GRCh38",
)
However, I eventually get the following error:
Traceback (most recent call last):
File "/mnt/data/projects/add_to_vds_db.py", line 31, in <module>
combiner.run()
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 356, in run
self.step()
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 430, in step
self._step_vdses()
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 484, in _step_vdses
combined = combine_variant_datasets(vdss)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/combine.py", line 380, in combine_variant_datasets
reference = combine_references([vds.reference_data for vds in vdss])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/combine.py", line 374, in combine_references
ts = hl.Table.multi_way_zip_join([localize(mt) for mt in mts], 'data', 'g')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<decorator-gen-1278>", line 2, in multi_way_zip_join
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/table.py", line 4634, in multi_way_zip_join
raise TypeError(
TypeError: All input tables to multi_way_zip_join must have the same row type
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>, LAD: array<int32>}>}
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
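The mismatch between the first row type and the others can be made explicit with a few lines of plain Python (no Hail required). This is only a sketch: the helper name `entry_fields` is hypothetical, and the two type strings are copied from the error message above.

```python
import re

# Row types copied from the error message: the first one listed, and one of
# the five identical ones that follow it.
ROW_TYPES = [
    "struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, "
    "GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>, LAD: array<int32>}>}",
    "struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, "
    "GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}",
]


def entry_fields(row_type):
    """Return the entry field names found in a printed Hail row type string."""
    # Isolate the body of the struct nested inside `__entries: array<...>`.
    inner = re.search(r"__entries: array<struct\{(.*)\}>\}$", row_type).group(1)
    # Field names are the identifiers directly followed by ": ".
    return re.findall(r"(\w+): ", inner)


extra = set(entry_fields(ROW_TYPES[0])) - set(entry_fields(ROW_TYPES[1]))
print(sorted(extra))  # ['LAD']
```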
I guess the issue is that the reference data of the existing VDS has an extra LAD entry field that the other inputs lack, but I have not figured out how to remove it before merging in the other gVCFs. Could you please help me out?
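One possible approach, sketched here under assumptions and not verified against this dataset, would be to read the existing VDS, drop the field from its reference data only, and write the result to a new location before passing that location to the combiner. The function name `drop_reference_field` and the destination path are hypothetical, and whether discarding LAD from the reference data is acceptable for downstream analyses is an open question.

```python
def drop_reference_field(src_path, dst_path, field="LAD"):
    """Write a copy of the VDS at src_path with `field` dropped from the
    reference-data entries. The variant data is passed through unchanged."""
    import hail as hl  # imported here so the sketch can be loaded without Hail

    vds = hl.vds.read_vds(src_path)
    # reference_data is a MatrixTable; drop() removes the named entry field.
    reference = vds.reference_data.drop(field)
    trimmed = hl.vds.VariantDataset(reference, vds.variant_data)
    # Written to a separate path (an assumption of this sketch), so the
    # source VDS is never read and overwritten at the same time.
    trimmed.write(dst_path)
    return dst_path
```

It would be called as `drop_reference_field("/mnt/data/db.vds", "/mnt/data/db_no_lad.vds")`, where the second path is made up for illustration, and the new path would then replace `/mnt/data/db.vds` in `vds_paths`.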
Also, I cannot directly overwrite the output, right?