I have been trying to run the following command
vds = hl.vds.read_vds('file:///xxx.vds')
mt = hl.vds.to_merged_sparse_mt(vds, ref_allele_function=None)
This is the error I’m getting. Can anyone help me sort this out?
ValueError: to_merged_sparse_mt: in order to construct a ref allele for reference-only sites, either pass a function to fill in reference alleles (e.g. ref_allele_function=lambda locus: hl.missing('str')) or add a sequence file with 'hl.get_reference(RG_NAME).add_sequence(FASTA_PATH)'.
Thanks in advance
Hi @Rads2512 !
Why do you want a merged sparse MT? This format is a legacy one used for the gnomAD v3 analysis. It’s less efficient than the VDS format and not a dense representation like the output of to_dense_mt.
Hi,
I am trying to replicate the gnomAD code for analysis so I do need the merged_sparse_mt.
Alright, if that’s what you need to do, then you need to provide a ref_allele_function, as described in the error message. The Hail VDS does not necessarily store a reference allele at the start of every reference block, but the merged sparse MT structure requires one. The first suggested option just inserts a missing value (NA), which is fine as long as you don’t need to know the reference allele at the site where a reference block begins. The second suggested option looks up the reference allele in a FASTA file; I recommend looking at the documentation for add_sequence. You can learn the name of the reference genome of your dataset by executing vds.variant_data.locus.dtype.reference_genome or by looking at the type information included in vds.variant_data.show(1).
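For reference, here is a minimal sketch of the two options from the error message wrapped in one helper. The function name merged_sparse_mt and its parameters (vds_path, fasta_path, index_path) are made up for illustration and are not part of Hail; the Hail calls themselves are the ones named in the error message and above. Treat it as a starting point under those assumptions rather than the gnomAD pipeline's own code.

```python
def merged_sparse_mt(vds_path, fasta_path=None, index_path=None):
    """Read a VDS and convert it to the legacy merged sparse MT.

    Hypothetical helper: if fasta_path is None, reference alleles at
    reference-only sites are filled with missing values (option 1);
    otherwise they are looked up in the given FASTA file (option 2).
    """
    import hail as hl  # imported here so the helper can be defined without Hail loaded

    vds = hl.vds.read_vds(vds_path)

    if fasta_path is None:
        # Option 1: no reference sequence needed; the ref allele at the
        # start of each reference block is simply missing.
        return hl.vds.to_merged_sparse_mt(
            vds, ref_allele_function=lambda locus: hl.missing('str')
        )

    # Option 2: attach a FASTA to the dataset's reference genome so Hail
    # can look up the reference allele itself.
    rg_name = vds.variant_data.locus.dtype.reference_genome.name
    rg = hl.get_reference(rg_name)
    if not rg.has_sequence():
        rg.add_sequence(fasta_path, index_path)
    return hl.vds.to_merged_sparse_mt(vds)
```

With the path from the original post, option 1 would be mt = merged_sparse_mt('file:///xxx.vds'). Option 2 additionally needs a FASTA (and usually its .fai index) that matches the reference genome of the dataset.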
Oh okay. I will try that and hopefully it works.
Thanks