Hail doesn’t try to detect the reference genome of a VCF, but you can specify it explicitly:
hl.import_vcf(..., reference_genome='GRCh38')
I think you’ll also run into the problem that GRCh38 uses “chr” prefixes on contig names, but a lot of folks leave them out of their VCFs. You can fix that with contig_recoding:
hl.import_vcf(
    ...,
    contig_recoding={
        **{str(i): 'chr' + str(i) for i in range(1, 23)},
        **{'X': 'chrX', 'Y': 'chrY', 'MT': 'chrM'}
    },
    reference_genome='GRCh38'
)
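If you import several files, it may be handy to wrap the recoding in a small helper. This is only a sketch: `grch38_contig_recoding` and `import_grch38_vcf` are names I made up for illustration, not part of Hail, and it assumes your VCFs use the un-prefixed names 1 to 22, X, Y and MT.

```python
def grch38_contig_recoding():
    """Map un-prefixed contig names (1..22, X, Y, MT) to GRCh38-style names.

    Hypothetical helper, not a Hail function.
    """
    recoding = {str(i): 'chr' + str(i) for i in range(1, 23)}
    recoding.update({'X': 'chrX', 'Y': 'chrY', 'MT': 'chrM'})
    return recoding


def import_grch38_vcf(path):
    """Import a VCF as GRCh38, renaming contigs on the way in.

    Hypothetical wrapper around hl.import_vcf; `path` is whatever you
    would normally pass as the first argument.
    """
    # Imported here so the helper above can be inspected without Hail installed.
    import hail as hl

    return hl.import_vcf(
        path,
        reference_genome='GRCh38',
        contig_recoding=grch38_contig_recoding(),
    )
```

Printing `grch38_contig_recoding()` before the import is a quick way to check that the keys match the contig names that actually appear in your file.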
Is this error coming from importing the EVE data? That suggests that the EVE data is keyed by locus & alleles, not gene name, which means my suggestion above won’t work as expected. Can you share eve.describe() and mt.describe()?