Join VDSs with the same samples?

Hi everyone! My first Hail post :). I'm still a relative newbie, so apologies if this is a naive question. I have two VDSs that come from the same samples but contain different variants, and I was trying to merge them into a single VDS. I can't find an obvious way to do that: the join method on VariantDataset supports two VDSs with overlapping variants but different samples, whereas I'm looking for the opposite. I could redesign our pipeline to join the VCFs first (which is trivial if both files have the same samples and non-overlapping variants) and then convert to VDS, but I was hoping to keep the pre-VDS processing to a minimum.

Happy to contribute this functionality to open source if it’s lacking and y’all think it’s valuable
Guillermo

Hi Guillermo,
You’re definitely not the first to ask for this! It’s possible to do this somewhat efficiently now by going through disk, because the HailContext.read method accepts a list of files and can do exactly this, but it’s not currently possible to do it directly on two VDS Python objects.

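If you have two VDSs, this should work:

# want to union vds1 and vds2 (same samples, different variants)
# hc is your existing HailContext
vds1.write('file1.vds')
vds2.write('file2.vds')
# reading a list of paths gives back a single VDS
union_vds = hc.read(['file1.vds', 'file2.vds'])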

I agree this is annoying and terrible and we should just add the union method.
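
Until such a method exists, the write-then-read workaround above could be wrapped in a small helper. This is only a sketch: `union_via_disk` and its `tmp_dir` argument are hypothetical names, not part of Hail, and it assumes each VDS exposes `sample_ids` and `write(path)` and that `hc.read` accepts a list of paths as in the snippet above. It checks up front that the sample IDs match, since that is the precondition for this kind of union.

```python
def union_via_disk(hc, vdses, tmp_dir):
    """Hypothetical helper (not a Hail API): union same-sample VDSs by
    writing each one to disk and reading them all back in one hc.read call."""
    vdses = list(vdses)
    if not vdses:
        raise ValueError("need at least one VDS")

    # The union only makes sense if every VDS has the same samples,
    # so fail early before writing anything.
    expected = list(vdses[0].sample_ids)
    for i, vds in enumerate(vdses[1:], start=1):
        if list(vds.sample_ids) != expected:
            raise ValueError("VDS %d has different sample IDs than VDS 0" % i)

    # Write each VDS to its own path under tmp_dir (assumed to be writable).
    paths = []
    for i, vds in enumerate(vdses):
        path = "%s/part%d.vds" % (tmp_dir.rstrip("/"), i)
        vds.write(path)
        paths.append(path)

    # A single read over all the paths returns the combined VDS.
    return hc.read(paths)
```

Usage would look like `union_vds = union_via_disk(hc, [vds1, vds2], 'tmp/union')`. The intermediate files are left in place, so you would clean them up yourself once the combined VDS has been written out somewhere else.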
