VDS to merged sparse MatrixTable

I have been trying to run the following commands:

import hail as hl

vds = hl.vds.read_vds('file:///xxx.vds')
mt = hl.vds.to_merged_sparse_mt(vds, ref_allele_function=None)

This is the error I'm getting. Can anyone help me sort it out?

ValueError: to_merged_sparse_mt: in order to construct a ref allele for reference-only sites, either pass a function to fill in reference alleles (e.g. ref_allele_function=lambda locus: hl.missing('str')) or add a sequence file with 'hl.get_reference(RG_NAME).add_sequence(FASTA_PATH)'.

Thanks in advance

Hi @Rads2512!

Why do you want a merged sparse MT? This format is a legacy one used for the gnomAD v3 analysis. It’s less efficient than the VDS format and not a dense representation like the output of to_dense_mt.

Hi,

I am trying to replicate the gnomAD analysis code, so I do need the merged sparse MT.

Alright, if that's what you need, then you have to provide a ref_allele_function, as described in the error message. A Hail VDS does not necessarily store a reference allele for the start of every reference block, but the merged sparse MT structure requires one at each of those sites.

The error message suggests two options. The first, ref_allele_function=lambda locus: hl.missing('str'), just inserts a missing value (NA), which is fine as long as you don't need to know the reference allele at sites where a reference block begins. The second looks up the reference allele in a FASTA file; I recommend reading the documentation for add_sequence. You can learn the name of your dataset's reference genome by evaluating vds.variant_data.locus.dtype.reference_genome or by looking at the type information printed by vds.variant_data.show(1).

Oh, okay. I'll try that, and hopefully it works.
Thanks