Contig '1' is not in the reference genome 'GRCh38' error

Hi Hail Team!

I wanted to open my GRCh38 ClinVar VCF file from NCBI in Hail.

I converted the file from gzip to bgzip format with:

gunzip -c file.vcf.gz | bgzip  > file.vcf.bgz

And then ran

import hail as hl

mt = hl.import_vcf('GRCh38_latest_clinvar.vcf.bgz', reference_genome='GRCh38')
mt.show()

However, this message shows up:

Hail version: 0.2.60-de1845e1c2f6
Error summary: HailException: Invalid locus '1:930188' found. Contig '1' is not in the reference genome 'GRCh38'.

What should I do to fix this?

Thank you! :slight_smile:

Chromosome 1 is denoted as chr1 in GRCh38, not 1. You can use the contig_recoding argument of import_vcf to provide a mapping from the contig names in the VCF to the contig names in the reference genome:

contig_recoding={'1': 'chr1', ...}
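In case it helps, here is a minimal sketch of building the full mapping rather than typing it out by hand. It assumes the VCF uses the bare names 1 to 22, X, Y and MT, and that the mitochondrial contig in Hail's GRCh38 is chrM; check both against your file's header. The helper names build_contig_recoding and import_clinvar are hypothetical, made up for this example.

```python
def build_contig_recoding():
    """Map bare contig names ('1', 'X', 'MT') to 'chr'-prefixed names.

    Assumption: the VCF uses 1-22, X, Y and MT, and the reference
    genome uses chr1-chr22, chrX, chrY and chrM.
    """
    recoding = {str(i): f"chr{i}" for i in range(1, 23)}
    recoding.update({"X": "chrX", "Y": "chrY", "MT": "chrM"})
    return recoding


contig_recoding = build_contig_recoding()


def import_clinvar(path, recoding):
    """Hypothetical wrapper: import a VCF with the contigs recoded."""
    import hail as hl  # imported here so the mapping can be built without Hail

    return hl.import_vcf(
        path,
        reference_genome="GRCh38",
        contig_recoding=recoding,
    )
```

You would then call something like import_clinvar('GRCh38_latest_clinvar.vcf.bgz', contig_recoding) in place of the original import_vcf line. Any contig in the file that is not covered by the mapping and not in the reference genome would still raise the same error.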

OK, I see.

Thank you so much!

Best,
Min :slight_smile: