Contig 'MT' is not in the reference genome

Hi all, I am trying to import the ClinVar VCF file from NCBI into Hail, but I am getting this error:

Invalid locus 'MT:235' found. Contig 'MT' is not in the reference genome 'GRCh38'

I downloaded the GRCh38 version of the VCF file and used the corresponding reference genome. How can I resolve this? Appreciate any input.

GRCh38 uses chrM for the mitochondrial contig (see reference).

You can fix this on import with hl.import_vcf(...,contig_recoding={'MT': 'chrM'}).

Thank you very much. Is there a way to include it with this line?

recode = {f"{i}":f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y', 'M'])} 

This did not seem to work for me.

Sorry. Above it should have said “contig_recoding” but the autocorrect changed it to “config_recoding”. Your dict comprehension won't work because it maps M to chrM, while the VCF names the contig MT. You need to remap MT.
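A quick way to see the mismatch, using only the comprehension from the question (no Hail needed): the mapping has a key for 'M', but none for the 'MT' that appears in the error message.

```python
# The comprehension from the question, unchanged.
recode = {f"{i}": f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y', 'M'])}

print('M' in recode)   # True: a key exists for 'M'
print('MT' in recode)  # False: 'MT', the name used in the VCF, is never remapped
```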

Oh, that makes sense! However, the VCF file also has the issue where “chr1” is “1”, hence I cannot just do {'MT': 'chrM'}. I am not too familiar with this data type. Is there a way to combine it with the above recode?

You can combine dictionaries like this:

contig_recoding={
    "MT": "chrM",
    **{f"{i}": f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y'])}
}
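Putting it together, here is a minimal sketch of the full import. The helper names build_contig_recoding and import_clinvar and the file name clinvar.vcf.gz are made up for illustration. The force_bgz=True argument assumes the download is a block-gzipped .vcf.gz file, so drop it if your file is uncompressed.

```python
def build_contig_recoding():
    """Map bare contig names (1..22, X, Y, MT) to GRCh38-style names."""
    recoding = {str(i): f"chr{i}" for i in list(range(1, 23)) + ["X", "Y"]}
    recoding["MT"] = "chrM"  # the mitochondrial contig is the one irregular name
    return recoding


def import_clinvar(path):
    # Hypothetical helper. Hail is imported here so that the mapping above
    # can be inspected without Hail installed.
    import hail as hl

    return hl.import_vcf(
        path,
        reference_genome="GRCh38",
        contig_recoding=build_contig_recoding(),
        force_bgz=True,  # assumption: the file is block-gzipped
    )


# Usage (file name is a placeholder):
# mt = import_clinvar("clinvar.vcf.gz")
```

If the import still fails on some other contig, the error message will name it, and you can add that name to the dictionary in the same way.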