hl.vds.new_combiner gives unexpected TypeError

Hi,

Thank you so much for making hail available to the community!

I am running hail locally on my institution’s cluster.
I have ~500 gVCFs generated in the same way.
I created a VDS with hl.vds.new_combiner, which worked very well and I could extract variants as desired.

I am now trying to add the remaining ~470 gVCFs with:

combiner = hl.vds.new_combiner(
    output_path="/mnt/data/db_merged.vds",
    temp_path="/mnt/data/projects/.tmp",
    gvcf_paths=gvcfs_f,
    vds_paths=["/mnt/data/db.vds"],
    use_genome_default_intervals=True,
    reference_genome="GRCh38"
)

However, I eventually get the following error:

Traceback (most recent call last):
  File "/mnt/data/projects/add_to_vds_db.py", line 31, in <module>
    combiner.run()
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 356, in run
    self.step()
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 430, in step
    self._step_vdses()
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/variant_dataset_combiner.py", line 484, in _step_vdses
    combined = combine_variant_datasets(vdss)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/combine.py", line 380, in combine_variant_datasets
    reference = combine_references([vds.reference_data for vds in vdss])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/vds/combiner/combine.py", line 374, in combine_references
    ts = hl.Table.multi_way_zip_join([localize(mt) for mt in mts], 'data', 'g')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<decorator-gen-1278>", line 2, in multi_way_zip_join
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/appl/conda/miniconda3/envs/hail/lib/python3.12/site-packages/hail/table.py", line 4634, in multi_way_zip_join
    raise TypeError(
TypeError: All input tables to multi_way_zip_join must have the same row type
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>, LAD: array<int32>}>}
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}
  struct{locus: locus<GRCh38>, __entries: array<struct{END: int32, DP: int32, GQ: int32, ICNT: array<int32>, MIN_DP: int32, SPL: array<int32>}>}

I guess the issue is that the VDS also has a LAD field, but I have not figured out how to remove it before merging in the other gVCFs. Could you please help me out?
Also, I cannot directly overwrite the output, right?

Hi @f-ferraro!

If the VDS you already created has a LAD field but the new gVCFs don’t, I think you can just replace it with a copy that has LAD removed. I would write the modified VDS to a new path and keep the old one until you know it has worked.

import hail as hl

# Read the existing VDS and write a copy whose reference data has no LAD field.
vds = hl.vds.read_vds("/mnt/data/db.vds")
hl.vds.VariantDataset(
    vds.reference_data.drop('LAD'),
    vds.variant_data,
).write("/mnt/data/db_modified.vds")

You can drop LAD from the variant data too if necessary. Then just run the combiner again, passing the modified VDS in vds_paths instead of the original.
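
If it is not clear in advance which table carries LAD, the drop can be made conditional. This is only a rough sketch: drop_fields_if_present and write_vds_without_lad are hypothetical helper names, not part of Hail, and the sketch assumes that `name in mt.entry` tests for an entry field and that `mt.drop(*names)` removes fields, as on a Hail MatrixTable. The variant data is left alone unless explicitly requested.

```python
def drop_fields_if_present(mt, names):
    """Drop the listed entry fields from mt, ignoring any that are absent.

    Hypothetical helper, not part of Hail. It only assumes that
    `name in mt.entry` works and that `mt.drop(*names)` returns a new table.
    """
    present = [name for name in names if name in mt.entry]
    return mt.drop(*present) if present else mt


def write_vds_without_lad(in_path, out_path, drop_from_variants=False):
    """Write a copy of the VDS at in_path with LAD removed from the reference data.

    Hypothetical wrapper around the snippet above. LAD is removed from the
    variant data only when drop_from_variants is True.
    """
    import hail as hl  # imported here so the helper above has no dependencies

    vds = hl.vds.read_vds(in_path)
    reference = drop_fields_if_present(vds.reference_data, ["LAD"])
    variants = vds.variant_data
    if drop_from_variants:
        variants = drop_fields_if_present(variants, ["LAD"])
    hl.vds.VariantDataset(reference, variants).write(out_path)
```

With the paths from this thread, that would be called as write_vds_without_lad("/mnt/data/db.vds", "/mnt/data/db_modified.vds").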

Flagging @chrisvittal to double check this.

This is correct. You need to align the types of the inputs, and write the modified VDS to a temporary path.
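
One way to check the alignment before rerunning the combiner is to compare entry field names. The following is a minimal sketch, not a Hail API: extra_entry_fields is a hypothetical helper, and the field lists are copied from the error message above. It compares names only, so a difference in field types or field order would still need to be checked separately.

```python
def extra_entry_fields(vds_fields, expected_fields):
    """Return the fields in vds_fields that are missing from expected_fields.

    Hypothetical helper, not part of Hail. Compares names only.
    """
    expected = set(expected_fields)
    return [name for name in vds_fields if name not in expected]


# Entry field names copied from the row types in the TypeError above.
from_gvcfs = ["END", "DP", "GQ", "ICNT", "MIN_DP", "SPL"]
from_old_vds = from_gvcfs + ["LAD"]

print(extra_entry_fields(from_old_vds, from_gvcfs))  # prints ['LAD']

# For a real VDS the names could be read with, for example:
#   list(hl.vds.read_vds("/mnt/data/db_modified.vds").reference_data.entry)
```

An empty list for the modified VDS suggests that the reference data no longer carries fields the new gVCFs lack.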