How to merge two or more sparse MTs into one joint-called sparse MT?

Hi Hail team,

Is there a function in Hail that merges two or more sparse MTs into one joint-called sparse MT?

Until now, we have joint-called batch by batch, but we now want to joint-call all batches together.
Do we need to run run_combiner() again on all gVCFs, or can we merge the sparse MTs that run_combiner() produced for each batch into one joint-called sparse MT?
Thank you very much for your help!


I am also interested in how this would be done. In addition, what is the best approach to append a new set of gVCFs to a sparse matrix table? For example, I have a sparse matrix table of 3,200 samples and I would like to add 800 additional gVCFs that we just received. Based on the gnomAD blog, this type of functionality does seem to be available.

Hi Hail team,

I tried converting my two batches of joint-called sparse MTs to VDS with the from_merged_representation method, and then combining the two VDSs with hail.vds.combiner.new_combiner. I then exported the merged VDS to a merged sparse MT using hail.vds.to_merged_sparse_mt.
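
For reference, the steps above can be sketched roughly as follows. This is only a minimal sketch, not a confirmed recipe: all paths, the helper names vds_path_for and merge_sparse_mts, and the GRCh38 default are hypothetical placeholders. I am assuming that each sparse MT has the END field that from_merged_representation expects, and that new_combiner only needs an interval option when gVCFs are included. Passing gvcf_paths alongside vds_paths is how I understand new gVCFs could be appended in the same run.

```python
from typing import List, Optional


def vds_path_for(mt_path: str) -> str:
    """Derive a VDS path from a sparse MT path (hypothetical naming convention)."""
    base = mt_path.rstrip("/")
    if base.endswith(".mt"):
        base = base[: -len(".mt")]
    return base + ".vds"


def merge_sparse_mts(
    mt_paths: List[str],
    output_path: str,
    temp_path: str,
    gvcf_paths: Optional[List[str]] = None,
    reference_genome: str = "GRCh38",  # assumption: adjust to your data
):
    """Convert each sparse MT to a VDS, combine them, and return a merged sparse MT."""
    import hail as hl  # imported here so the path helper above works without Hail

    # Step 1: convert each batch-level sparse MT to a VDS and write it out.
    vds_paths = []
    for mt_path in mt_paths:
        mt = hl.read_matrix_table(mt_path)
        vds = hl.vds.VariantDataset.from_merged_representation(mt)
        out = vds_path_for(mt_path)
        vds.write(out)
        vds_paths.append(out)

    # Step 2: combine the VDSs (and, optionally, new gVCFs) into one VDS.
    combiner = hl.vds.new_combiner(
        output_path=output_path,
        temp_path=temp_path,
        vds_paths=vds_paths,
        gvcf_paths=gvcf_paths or None,
        # Assumption: a partitioning option is only required when gVCFs are imported.
        use_genome_default_intervals=bool(gvcf_paths),
        reference_genome=reference_genome,
    )
    combiner.run()

    # Step 3: read the combined VDS and convert it back to a sparse MT.
    merged_vds = hl.vds.read_vds(output_path)
    return hl.vds.to_merged_sparse_mt(merged_vds)
```

With hypothetical paths, this would be called as merge_sparse_mts(["gs://my-bucket/batch1.mt", "gs://my-bucket/batch2.mt"], "gs://my-bucket/merged.vds", "gs://my-bucket/tmp"). Depending on the Hail version, to_merged_sparse_mt may also need a ref_allele_function or a reference genome with sequence loaded.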

Is this the right way to merge or joint-call more than one sparse MT? Or what approach would you suggest to match what gnomAD did, as described in their blog post? Thanks for answering my questions.

And thank you @jjfarrell for your interest in this topic. I also think there should be a way to do it, as gnomAD did when joint-calling such a huge cohort in v3.1.

From: gnomAD v3.1 New Content, Methods, Annotations, and Data Availability | gnomAD news

For gnomAD v3.1, we made good on this promise, adding 4,598 new genomes in gVCF form to the already extant, joint-called gnomAD v3 callset stored in the sparse Hail Matrix Table format.