MSG: Could not open index file: No such file or directory

Dear all,

I am trying to run VEP in Hail, but I keep running into this exception:

MSG: Could not open index file /path/Homo_sapiens.GRCh38.dna.primary_assembly.fa.index: No such file or directory
STACK Bio::DB::IndexedBase::_open_index /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/DB/IndexedBase.pm:666
STACK Bio::DB::IndexedBase::_index_files /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/DB/IndexedBase.pm:643
STACK Bio::DB::IndexedBase::index_file /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/DB/IndexedBase.pm:484
STACK Bio::DB::IndexedBase::new /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/DB/IndexedBase.pm:364
STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::_get_fasta_db /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:314
STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /path/tools/vep/ensembl-tools-release-85/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:196
STACK main::configure /path/tools/vep/variant_effect_predictor/variant_effect_predictor.pl:835
STACK toplevel /path/tools/vep/variant_effect_predictor/variant_effect_predictor.pl:146

I hope this is not a naive question. I have confirmed that the index file Homo_sapiens.GRCh38.dna.primary_assembly.fa.index is at the correct path, but I am not sure why VEP cannot find it. Could I get advice on how to troubleshoot this? Thank you very much!

Are you sure it’s on every worker node at that local path? Putting it in HDFS is not sufficient.

Thank you for your response, Dan. I am not sure how to check that. Is there a way to initialize that in Hail? My institution just moved to a new computing cluster, and the exact same VEP code worked previously.

Are you using a Spark cluster or are you just submitting your Hail script as a single HPC job?

This error means that the file doesn’t exist at that path on the machine where Hail/VEP is executing.
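A quick way to check this is to run something like the following from the same session that calls VEP (the notebook or ipython on the interactive node). This is only a minimal sketch using the Python standard library. `check_local_file` is a hypothetical helper name, not part of Hail or VEP, and the path is the placeholder from the error message above.

```python
import os

def check_local_file(path):
    """Report whether `path` is visible and readable on the machine running this code."""
    return {
        "path": path,
        "exists": os.path.exists(path),        # False if the path is missing on this node
        "is_file": os.path.isfile(path),       # False if it is a directory or missing
        "readable": os.access(path, os.R_OK),  # False if permissions block reading
    }

# Placeholder path copied from the error message; substitute the real one.
index_path = "/path/Homo_sapiens.GRCh38.dna.primary_assembly.fa.index"
print(check_local_file(index_path))
```

If `exists` is False here but the file shows up from a login node, the filesystem is probably mounted differently on the node where the job runs.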

Pardon my poor description. I am starting a Jupyter Notebook session with allocated resources and initializing Hail as usual: hl.init(log='./log.log')
I also tried running it interactively on the command line with ipython and got a similar error.

It sounds like you’re on an interactive node of an HPC cluster, is that right?

Are you setting VEP_CONFIG_URI? What is it set to, and what are the contents of that file?
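To answer both questions at once, something like this can be run in the same Python session. It is a sketch under two assumptions: that VEP_CONFIG_URI, if set, points at a file on the local filesystem (optionally with a `file://` prefix) rather than at HDFS or cloud storage, and that the config is a JSON file. `load_vep_config` is a hypothetical helper name, not a Hail function.

```python
import json
import os

def load_vep_config(env=os.environ):
    """Return the parsed VEP config named by VEP_CONFIG_URI, or None if it is unset."""
    uri = env.get("VEP_CONFIG_URI")
    if uri is None:
        return None
    # Assumption: the URI refers to a local file; strip an optional file:// prefix.
    local_path = uri[len("file://"):] if uri.startswith("file://") else uri
    with open(local_path) as f:
        return json.load(f)

if __name__ == "__main__":
    print("VEP_CONFIG_URI =", os.environ.get("VEP_CONFIG_URI"))
    print(json.dumps(load_vep_config(), indent=2))
```

Any paths that appear in the printed config (the FASTA, the cache directory) are the ones worth checking on the new cluster.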

Yes, that is right, I am on an interactive node of an HPC cluster. I do not recall anything named VEP_CONFIG_URI, so I may not have set it.

My main suspicion is that it is due to the VEP cache files not being transferred properly during the move to the new cluster.