Can't locate DBI.pm in @INC (you may need to install the DBI module)

Hi,

I am unable to run VEP through Hail on a Spark cluster on the DNAnexus platform.

The error stack trace:

Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /cluster/vep/modules /cluster/vep /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /cluster/vep/Bio/EnsEMBL/DBSQL/DBConnection.pm line 74.
BEGIN failed--compilation aborted at /cluster/vep/Bio/EnsEMBL/DBSQL/DBConnection.pm line 74.
Compilation failed in require at /cluster/vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm line 69.
BEGIN failed--compilation aborted at /cluster/vep/Bio/EnsEMBL/DBSQL/DBAdaptor.pm line 69.
Compilation failed in require at /cluster/vep/Bio/EnsEMBL/Registry.pm line 137.
BEGIN failed--compilation aborted at /cluster/vep/Bio/EnsEMBL/Registry.pm line 137.
Compilation failed in require at /cluster/vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 58.
BEGIN failed--compilation aborted at /cluster/vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 58.
Compilation failed in require at /usr/share/perl/5.30/base.pm line 137.
	...propagated at /usr/share/perl/5.30/base.pm line 159.
BEGIN failed--compilation aborted at /cluster/vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm line 56.
Compilation failed in require at /usr/share/perl/5.30/base.pm line 137.
	...propagated at /usr/share/perl/5.30/base.pm line 159.
BEGIN failed--compilation aborted at /cluster/vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 71.
Compilation failed in require at /cluster/vep/vep line 20.
BEGIN failed--compilation aborted at /cluster/vep/vep line 20.

	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:11)
	at is.hail.utils.package$.fatal(package.scala:78)
	at is.hail.methods.VEP$.waitFor(VEP.scala:72)

However, I am able to run VEP successfully from the command line:

vep --format vcf --cache --offline -i VEP_input.vcf -o ouput.txt --dir /mnt/project/data_vep/103/ --assembly GRCh38

After searching the Hail posts and Stack Overflow, the suggestion I found was to check whether the PERL5LIB environment variable is set correctly. I have checked, and the relevant variables are set as follows:

PERL5LIB=/cluster/vep
PERL_BASE=/cluster
PERL_HOME=/cluster/perl-5.24.1/bin
PATH=/cluster/vep:/cluster/perl-5.24.1/bin:/cluster/phantomjs-2.1.1-linux-x86_64/bin:/usr/lib/jvm/java-8-openjdk-amd64:/usr/lib/jvm/java-8-openjdk-amd64/bin:/cluster/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/cluster:/cluster/hadoop/bin:/cluster/spark/bin:/cluster/dnax/bin:/cluster:/cluster/hadoop/bin:/cluster/spark/bin:/cluster/dnax/bin:/cluster:/cluster/hadoop/bin:/cluster/spark/bin:/cluster/dnax/bin:/cluster/miniconda/bin:/home

I have installed my own Perl, version 5.24.1, but Hail's VEP wrapper seems to be picking up version 5.30 (as you can see from the @INC paths in the stack trace above).
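One way to narrow this down is to check, from Python, which perl a child process would resolve under a given environment and whether that perl can load DBI. The sketch below is only a diagnostic under stated assumptions: `probe_perl` is a hypothetical helper name, and it assumes the `vep` script locates its interpreter through `PATH` (for example via an `env perl` shebang), which is not confirmed in this thread. Note that it only inspects the machine it runs on; if Hail launches VEP on the Spark workers, the same check would have to be run there.

```python
import os
import shutil
import subprocess


def probe_perl(extra_env=None, path=None):
    """Report which perl a child process would pick up and whether it loads DBI.

    extra_env: variables layered on top of the current environment.
    path:      optional PATH override, to mimic a different process environment.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    if path is not None:
        env["PATH"] = path

    perl = shutil.which("perl", path=env.get("PATH"))
    if perl is None:
        return {"perl": None, "version": None, "has_dbi": False}

    # $^V is the version of the perl interpreter that is actually running.
    version = subprocess.run(
        [perl, "-e", "print $^V"], env=env, capture_output=True, text=True
    ).stdout.strip()

    # -MDBI fails with "Can't locate DBI.pm in @INC" if DBI is not installed
    # for this particular interpreter.
    dbi = subprocess.run(
        [perl, "-MDBI", "-e", "print $DBI::VERSION"],
        env=env, capture_output=True, text=True,
    )
    return {"perl": perl, "version": version, "has_dbi": dbi.returncode == 0}


if __name__ == "__main__":
    # Values copied from the environment variables listed in this post.
    hail_env = {
        "PERL5LIB": "/cluster/vep",
        "PERL_BASE": "/cluster",
        "PERL_HOME": "/cluster/perl-5.24.1/bin",
    }
    print(probe_perl(hail_env))
```

If the reported interpreter is the system perl rather than the one under `/cluster/perl-5.24.1/bin`, and `has_dbi` is false, that would be consistent with the 5.30 paths in the stack trace.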

PS: I have checked the permissions, and all files have the required permissions.

Here is the corresponding VEP configuration file:

{
  "command": [
    "/cluster/vep/vep",
    "--format", "vcf",
    "--offline",
    "--cache",
    "--fork", "4",
    "--dir", "/mnt/project/data_vep/103/",
    "--dir_cache", "/mnt/project/data_vep/103/",
    "--assembly", "GRCh38",
    "--no_stats",
    "--everything",
    "--minimal",
    "--allele_number",
    "--fasta", "/mnt/project/data_vep/103/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
    "--dir_plugins", "/mnt/project/data_vep/103/Plugins.GRCh38",
    "--plugin", "LoFtool",
    "--plugin", "LoF,loftee_path:/mnt/project/data_vep/103/Plugins.GRCh38/loftee,human_ancestor_fa:/mnt/project/data_vep/103/Plugins.GRCh38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/mnt/project/data_vep/103/Plugins.GRCh38/loftee.sql.gz,gerp_bigwig:/mnt/project/data_vep/103/Plugins.GRCh38/gerp_conservation_scores.homo_sapiens.GRCh38.bw",
    "--plugin", "CADD,/mnt/project/data_vep/103/custom/CADD_GRCh38_1_5_whole_genome_SNVs.tsv.gz",
    "--custom", "/mnt/project/data_vep/103/custom/clinvar_20210110.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN",
    "__OUTPUT_FORMAT_FLAG__", "-o", "STDOUT"
  ],
    "env": {"PERL5LIB": "/cluster/vep",
    "PERL_BASE":"/cluster",
    "PERL_HOME":"/cluster/perl-5.24.1/bin"},
    "vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],start:Int32,strand:Int32,end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele: String,exac_fin_maf: Float64,exac_maf: Float64,exac_nfe_allele: String,exac_nfe_maf:Float64,exac_oth_allele: String,exac_oth_maf: Float64,exac_sas_allele: String,exac_sas_maf: Float64,id:String,minor_allele: String,minor_allele_freq: Float64,phenotype_or_disease: Int32,pubmed: Array[Int32],frequencies: Dict[String,Struct{sas: Float64,afr: Float64,gnomad_nfe: Float64,gnomad: Float64,gnomad_fin: Float64,gnomad_eas: Float64,gnomad_afr: Float64,amr: Float64,gnomad_oth: Float64,ea: Float64,eur: Float64,gnomad_asj: Float64,eas: Float64,gnomad_amr: Float64,gnomad_sas: Float64,aa: Float64}],sas_allele: String,sas_maf: Float64,somatic: Int32}],context: String,end: Int32,id: String,input: String,intergenic_consequences: Array[Struct{allele_num: Int32, consequence_terms: Array[String], impact: String, minimised: Int32, variant_allele: String}],most_severe_consequence: String,motif_feature_consequences: Array[Struct{allele_num: Int32, consequence_terms: Array[String], high_inf_pos: String, impact: String, minimised: Int32, motif_feature_id: String, motif_name: String, motif_pos: Int32, motif_score_change: Float64, strand: Int32, variant_allele: String}],regulatory_feature_consequences: Array[Struct{allele_num: Int32, biotype: String,consequence_terms: Array[String], impact: String, minimised: Int32, regulatory_feature_id: String, variant_allele: String}],seq_region_name: 
String,start: Int32,strand: Int32,transcript_consequences: Array[Struct{allele_num: Int32, amino_acids: String,appris: String, biotype: String, canonical:Int32, ccds: String, cdna_start: Int32,cdna_end: Int32, cds_end: Int32, cds_start:Int32, codons: String, consequence_terms: Array[String], distance: Int32, domains: Array[Struct{db: String, name: String}], exon: String, gene_id: String, gene_pheno: Int32, gene_symbol: String, gene_symbol_source: String, hgnc_id: String, hgvsc: String, hgvsp: String, hgvs_offset: Int32, impact: String, intron: String, lof: String, lof_flags: String, lof_filter: String, lof_info: String, minimised: Int32, polyphen_prediction: String, polyphen_score: Float64, protein_end: Int32, protein_start: Int32, protein_id: String, sift_prediction: String, sift_score: Float64, strand: Int32, swissprot: String, transcript_id: String, trembl: String, tsl: Int32, uniparc: String, variant_allele: String}],variant_class: String}"
}
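The config above sets `PERL5LIB` but leaves the choice of interpreter to whatever the `vep` script resolves at launch. A possible workaround, sketched here under assumptions, is to put an explicit perl binary at the front of `command` so the interpreter no longer depends on `PATH`. `pin_perl` is a hypothetical helper, the binary path `/cluster/perl-5.24.1/bin/perl` is inferred from the `PERL_HOME` value and may not match the actual install, and I have not verified that Hail accepts a command rewritten this way.

```python
import copy
import json


def pin_perl(config, perl_bin):
    """Return a copy of a VEP config whose command starts with an explicit perl.

    The input config is left untouched, and calling this twice with the same
    interpreter does not prepend it a second time.
    """
    pinned = copy.deepcopy(config)
    command = pinned["command"]
    if command and command[0] != perl_bin:
        pinned["command"] = [perl_bin] + command
    return pinned


if __name__ == "__main__":
    # Shortened version of the config in this post, for illustration only.
    config = {
        "command": ["/cluster/vep/vep", "--format", "vcf", "--offline", "--cache"],
        "env": {"PERL5LIB": "/cluster/vep"},
    }
    # Assumed location of the custom perl binary.
    print(json.dumps(pin_perl(config, "/cluster/perl-5.24.1/bin/perl"), indent=2))
```

This only helps if the pinned interpreter itself has DBI installed, so it is worth confirming that first with `perl -MDBI -e 1` using the same binary.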

PS: I am invoking Hail from a Jupyter notebook on the driver node of a custom Spark cluster (version 2.4.4).

Seems like it’s this Stack Overflow question: "perl - Can't locate DBI.pm"

If you can’t install stuff yourself on your clusters, you should contact DNAnexus support about this.

I have already gone through the Stack Overflow link above.
DNAnexus doesn’t support custom Spark cluster installations. I am currently trying to install my own version to support my custom libraries (a wrapper library on top of Hail).

It looks like I installed my own Perl version as the root user, which might be interfering with the system-installed Perl.
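To see whether two Perl installs are shadowing each other, it can help to list every `perl` executable on a given `PATH` in lookup order. This is a minimal sketch, not a fix: `perls_on_path` is a hypothetical helper, and it only reports what is visible to the process that runs it, so the result from a notebook kernel may differ from a login shell or from the Spark workers.

```python
import os


def perls_on_path(path=None, name="perl"):
    """List every executable called `name` on PATH, in the order it is searched.

    The first entry is the one a plain `perl` invocation would run; later
    entries are shadowed installs.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    for position, perl in enumerate(perls_on_path(), start=1):
        print(position, perl)
```

Running `-MDBI -e 1` against each listed binary then shows which install has DBI and which does not.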