How to merge two or more sparse MTs into one joint-called sparse MT?

Hi Hail team,

Is there a function in Hail that merges two or more sparse MTs into one joint-called sparse MT?

Initially, we joint-called our samples batch by batch. Now we want to joint-call all batches together.
Do I need to run run_combiner() again on all GVCFs, or can we merge the sparse MTs that run_combiner() already produced into one joint-called sparse MT?
I really appreciate your help!

Best,
Po-Ying

I am also interested in how this would be done. In addition, what is the best approach to appending a new set of gVCFs to a sparse matrix table? For example, I have a sparse matrix table of 3,200 samples and would like to add 800 additional gVCFs that we just received. Based on the gnomAD blog, this kind of functionality does seem to be available.

Hi Hail team,

I tried converting my two batches of joint-called sparse MTs to VDS with the from_merged_representation method, then using hail.vds.combiner.new_combiner to combine the two VDSes, and finally exporting the merged VDS back to a sparse MT with hail.vds.to_merged_sparse_mt.
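For reference, the steps described above look roughly like this in code. This is a minimal sketch, not an official recipe: it assumes each batch's sparse MT was written by run_combiner() and can be read with hl.read_matrix_table, and the function name, paths, and whole-genome interval setting are hypothetical placeholders. The hail.vds.combiner.new_combiner mentioned above is the same function exposed as hl.vds.new_combiner.

```python
def merge_sparse_mts_via_vds(sparse_mt_paths, vds_dir, output_vds_path, temp_path):
    """Convert each sparse MT to a VDS, combine the VDSes, and return a merged sparse MT.

    Hypothetical helper; all paths are placeholders for your own storage.
    """
    # Imported here so this module can be loaded without starting a JVM/Spark.
    import hail as hl

    vds_paths = []
    for i, mt_path in enumerate(sparse_mt_paths):
        sparse_mt = hl.read_matrix_table(mt_path)
        # Split the merged (sparse MT) representation into reference and variant data.
        vds = hl.vds.VariantDataset.from_merged_representation(sparse_mt)
        vds_path = f"{vds_dir.rstrip('/')}/batch_{i}.vds"
        vds.write(vds_path, overwrite=True)
        vds_paths.append(vds_path)

    # Combine the per-batch VDSes into a single VDS.
    combiner = hl.vds.new_combiner(
        output_path=output_vds_path,
        temp_path=temp_path,
        vds_paths=vds_paths,
        use_genome_default_intervals=True,  # assumption: whole-genome data
    )
    combiner.run()

    merged = hl.vds.read_vds(output_vds_path)
    # Only needed if downstream code still expects the sparse MT layout.
    return hl.vds.to_merged_sparse_mt(merged)
```

Usage, with hypothetical bucket paths: `merged_mt = merge_sparse_mts_via_vds(['gs://my-bucket/batch1.mt', 'gs://my-bucket/batch2.mt'], 'gs://my-bucket/vds', 'gs://my-bucket/merged.vds', 'gs://my-tmp-bucket/')`. The final conversion is optional; if nothing downstream needs the sparse MT layout, you can keep working with the merged VDS.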

Is this the right way to merge or joint-call more than one sparse MT? Or what approach would you suggest to match what gnomAD described in their blog post? Thanks for answering my questions.

And thank you @jjfarrell for your interest in this topic. I also think there should be a way to do this, like the way gnomAD joint-called such a huge cohort for v3.1.

From: gnomAD v3.1 New Content, Methods, Annotations, and Data Availability | gnomAD browser

For gnomAD v3.1, we made good on this promise, adding 4,598 new genomes in gVCF form to the already extant, joint-called gnomAD v3 callset stored in the sparse Hail Matrix Table format.

Hey @poyingfu and @jjfarrell!

Sorry for the slow reply here. We’re moving away from the sparse MT representation toward the VDS representation. The key change is that a VDS stores reference data separately from variant data. This substantially improves speed, at the cost of a somewhat more awkward interface.
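To see what storing reference data separately means in practice, you can open a VDS and inspect its two component matrix tables. A minimal sketch, assuming a VDS already exists at some path you supply; describe_vds is an illustrative name, not a Hail function.

```python
def describe_vds(vds_path):
    """Show how a VDS splits reference blocks from variant calls (hypothetical helper)."""
    # Imported here so this module can be loaded without starting a JVM/Spark.
    import hail as hl

    vds = hl.vds.read_vds(vds_path)
    ref_mt = vds.reference_data  # reference blocks (runs of hom-ref calls), keyed by locus
    var_mt = vds.variant_data    # variant calls only, keyed by locus and alleles
    ref_mt.describe()
    var_mt.describe()
    # MatrixTable.count() returns (n_rows, n_cols); both components share the same samples.
    return {"reference_data": ref_mt.count(), "variant_data": var_mt.count()}
```

When an analysis needs the familiar dense genotype view, hl.vds.to_dense_mt(vds) rebuilds it from the two components.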

If you want to merge your two sparse MT batches, I recommend merging them and saving the result in the new VDS format. Going forward, I recommend using the VDS combiner to go directly from GVCF files to a VDS (don’t use sparse MTs at all anymore).
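Going straight from GVCFs to a VDS with the VDS combiner might look like the sketch below. It assumes whole-genome GVCFs aligned to GRCh38 (change use_genome_default_intervals and reference_genome if yours differ); gvcfs_to_vds and all paths are hypothetical.

```python
def gvcfs_to_vds(gvcf_paths, output_vds_path, temp_path):
    """Combine GVCFs directly into a VDS (hypothetical helper; paths are placeholders)."""
    # Imported here so this module can be loaded without starting a JVM/Spark.
    import hail as hl

    combiner = hl.vds.new_combiner(
        output_path=output_vds_path,
        temp_path=temp_path,
        gvcf_paths=list(gvcf_paths),
        use_genome_default_intervals=True,  # assumption: whole-genome GVCFs
        reference_genome="GRCh38",          # assumption: GRCh38-aligned data
    )
    combiner.run()
    return hl.vds.read_vds(output_vds_path)
```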

We have since added some docs and examples to the VDS combiner page that show how to combine one or more VDSes and/or one or more GVCFs into a new VDS.
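For adding new GVCFs to an existing callset (for example, 800 new GVCFs on top of an existing 3,200-sample VDS), a sketch along these lines passes the existing VDS and the new GVCFs to a single combiner call. It assumes the existing data has already been converted to a VDS, that the GVCFs are GRCh38 whole genomes, and that the input listing may include .tbi index files; add_gvcfs_to_vds, select_gvcf_paths, and all paths are hypothetical. save_path stores the combiner plan so a failed run can be resumed (see hl.vds.load_combiner).

```python
# Hypothetical filter for GVCF files in a directory listing.
GVCF_SUFFIXES = (".g.vcf.gz", ".g.vcf.bgz")


def select_gvcf_paths(paths):
    """Keep only block-compressed GVCF paths, dropping index files such as .tbi."""
    return [p for p in paths if p.endswith(GVCF_SUFFIXES)]


def add_gvcfs_to_vds(existing_vds_path, new_gvcf_paths, output_vds_path, temp_path, plan_path=None):
    """Combine an existing VDS with new GVCFs into a new VDS (hypothetical helper)."""
    # Imported here so this module can be loaded without starting a JVM/Spark.
    import hail as hl

    combiner = hl.vds.new_combiner(
        output_path=output_vds_path,
        temp_path=temp_path,
        save_path=plan_path,  # where the combiner plan is stored for resuming
        vds_paths=[existing_vds_path],
        gvcf_paths=select_gvcf_paths(new_gvcf_paths),
        use_genome_default_intervals=True,  # assumption: whole-genome data
        reference_genome="GRCh38",          # assumption: GRCh38-aligned data
    )
    combiner.run()
    return hl.vds.read_vds(output_vds_path)
```

Writing to a new output path rather than over the existing VDS keeps the original callset intact until the combined one has been checked.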

Does that help?
