No VEP debug output

I am trying to set up a new VEP 99 installation to run with Hail 0.2, and I am having trouble debugging it: either VEP fails without any debug output, or its output is redirected somewhere I cannot find. I am running a Luigi pipeline like this:

LUIGI_CONFIG_PATH=luigi_pipeline/configs/GRCh38.cfg nohup python -u submit.py --cpu-limit 4 --num-executors 3 --driver-memory 2g --executor-memory=4g --hail-version 0.2 --run-locally luigi_pipeline/seqr_loading.py SeqrMTToESTask --local-scheduler --spark-home $SPARK_HOME --project-guid batch1 &

And the error output I am getting is completely uninformative:

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 474 in stage 7.0 failed 4 times, most recent failure: Lost task 474.3 in stage 7.0 (TID 3007, 137.187.60.63, executor 1): is.hail.utils.HailException: VEP command '/vep/ensembl-tools-release-95/vep --format vcf --json --everything --allele_number --no_stats --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/95_GRCh38/hg38.fa --plugin LoF,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT' failed with non-zero exit status 2
  VEP Error output:

        at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
        at is.hail.utils.package$.fatal(package.scala:74)
        at is.hail.methods.VEP$.waitFor(VEP.scala:76)
        at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:214)
        at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:157)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at is.hail.io.RichContextRDDRegionValue$$anonfun$boundary$extension$1$$anon$1.hasNext(RichContextRDDRegionValue.scala:185)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:762)
        at scala.collection.Iterator$$anon$16.hasNext(Iterator.scala:598)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        ...
        ...

Up to a certain point I was able to run the VEP command separately, check its output, and debug from there, but now the command just hangs with no output, either because the computation is long or because something is going wrong:

/vep/ensembl-tools-release-95/vep --format vcf --json --everything --allele_number --no_stats --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/95_GRCh38/hg38.fa --plugin LoF,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT

I tried changing --offline to --cache, and also supplying both, but the result is the same: the program just hangs with no output.
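To tell a genuine hang apart from a silent failure, one option is to run the same command with a hard timeout and with stderr captured. A minimal sketch (VEP_COMMAND stands for the full command string above; the single VCF line fed on stdin is a made-up illustration, on the assumption that Hail pipes VCF lines to VEP's stdin when run with -o STDOUT):

```python
import shlex
import subprocess

def run_for_stderr(command, stdin_text="", timeout=300):
    """Run a shell command string, feed it stdin_text, and return
    (exit_status, stderr_text), or (None, None) on timeout."""
    try:
        result = subprocess.run(
            shlex.split(command),
            input=stdin_text,
            capture_output=True,  # stderr is where VEP reports errors
            text=True,
            timeout=timeout,      # fail fast instead of hanging forever
        )
        return result.returncode, result.stderr
    except subprocess.TimeoutExpired:
        return None, None

# e.g., with VEP_COMMAND set to the command above and one test variant
# (the variant line is a placeholder, substitute a real one):
# status, err = run_for_stderr(VEP_COMMAND, "chr1\t12345\t.\tA\tG\t.\t.\t.\n")
# print(status, err)
```

A (None, None) result means VEP really is stuck; a nonzero status with stderr text is the error message that the Hail exception omits.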

I also tried writing a Python script that uses the subprocess module to launch the shell VEP command, and submitting that script with spark-submit, but it does not appear to be submitted to Spark (no task shows up in the Spark web UI); it just hangs without any output again:

import subprocess
import shlex

# VEP_COMMAND is the full command string shown above
split_comm = shlex.split(VEP_COMMAND)

# Merge stderr into stdout: VEP writes its errors to stderr, and
# without this the error text is silently discarded.
process = subprocess.Popen(split_comm,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,
                           text=True)

# Note: calling process.communicate() here would block until the
# process exits and then close the pipe, so the loop below would
# never see any output.
while True:
    output = process.stdout.readline()
    if output == '' and process.poll() is not None:
        break
    if output:
        print(output.strip())
rc = process.poll()
print('exit status:', rc)

Is there a way either to see VEP's error output, or to run it separately in the correct way so that I can quickly see the output? How would you debug this?

Also asked here:

https://www.biostars.org/p/423083/

If I test VEP by itself, it seems to work just fine:

./vep -i examples/homo_sapiens_GRCh38.vcf --cache --dir_cache /vep/vep_cache

It generates output files that look right.

failed with non-zero exit status 2

This happened to me once; in my case the problem was that the Spark system user didn't have execute permissions on the files in my VEP folder (normally my bootstrap script takes care of that, and I still don't know exactly what happened). I would check that first if you haven't already ruled it out. Exit code 2 typically means either a malformed shell command or a permissions problem: http://www.tldp.org/LDP/abs/html/exitcodes.html Hail 0.2 doesn't handle this case very gracefully (ideally it would capture the system process's error message), but my guess is that VEP was never actually run, so there was nothing to capture. There may be other things going on, but if the error is on the Perl side, I think Hail is pretty good about capturing it.
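To test the permissions theory quickly, here is a sketch that walks an install tree and reports anything the current user cannot traverse or read (the /vep path is an assumption; to see the Spark user's view, run it as that user, e.g. via sudo -u):

```python
import os

def find_inaccessible(root):
    """Walk root and return paths the current user cannot use:
    directories without execute (traverse) permission, and files
    without read permission."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not os.access(dirpath, os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems

# e.g. (the path is an assumption, point it at your VEP install):
# for p in find_inaccessible("/vep"):
#     print(p)
```

The VEP executable and plugin scripts additionally need the execute bit, which a similar os.access(path, os.X_OK) check would cover.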

Exit code 2 can also mean a badly formed command. Your VEP command looks fine, but I will add that on AWS Linux 2 I had problems with the syntax where VEP is executed as a folder; these problems might be pervasive on CentOS-derived Linuxes. I don't remember the specific error message, but I have been running my pipeline like this:

"/vep/variant_effect_predictor/variant_effect_predictor.pl"

instead of the /vep/ symlink. I could execute /vep in a shell session, but for some reason Spark always complained, and everything worked reliably once I passed the script path directly in my vep.json.

The way we run VEP seems to depend on the VEP version. For instance, VEP 85 ships a variant_effect_predictor.pl that we can run, but VEP 95 has no such script; there, vep is not a folder but an executable file.

Hmm, I see, it's a Perl script: https://github.com/Ensembl/ensembl-vep/blob/release/99/vep It's been a while since I set this up from scratch, but I seem to remember that the Dataproc installation just had /vep as a symlink to variant_effect_predictor.pl.

Regardless, this still looks like a permissions issue, or some other configuration problem that prevents VEP from ever being initialized.

Unfortunately I am still struggling with this. I verified that VEP works: I ran it separately without problems, and I was also able to run the whole pipeline on a tiny 800 KB sample VCF. So what could it be? Permissions on the actual BATCH1.vcf GRCh38 file that I need to run the pipeline on? Or is that file malformed, and I was provided with a bad VCF? I checked that the Hadoop permissions on BATCH1.vcf are fine. What disturbs me most is that VEP fails and the only information provided is exit code 2, so I cannot tell what is actually going wrong inside VEP, which complicates everything immensely. If Hail could redirect this output to a file, that would be very useful.
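One workaround I am considering for the missing error output: point the command entry in the vep config JSON at a small wrapper instead of the vep binary itself, so that VEP's stderr gets persisted on each executor. A hypothetical sketch (VEP_BIN and LOG_PATH are assumptions for my install; stdout must be passed through untouched, since Hail parses VEP's JSON from it):

```python
#!/usr/bin/env python3
import subprocess
import sys

VEP_BIN = "/vep/ensembl-tools-release-95/vep"   # assumption: adjust per install
LOG_PATH = "/tmp/vep_stderr.log"                # assumption: per-executor log

def tee_stderr(command, log_path):
    """Run command, copy its stderr both to our own stderr and to
    log_path, and return the child's exit status."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
        for line in proc.stderr:
            sys.stderr.buffer.write(line)  # keep it visible to the caller
            log.write(line)                # ...and persisted for debugging
    return proc.wait()

if __name__ == "__main__":
    # Forward whatever arguments Hail passes straight to the real VEP.
    sys.exit(tee_stderr([VEP_BIN] + sys.argv[1:], LOG_PATH))
```

After a failed run, the log file on the executor that lost the task should contain whatever VEP printed before exiting with status 2.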

I verified that VEP works perfectly when called standalone. However, running it from Hail somehow yields VEP exit code 2, with essentially no debug output, and I am not sure where to look next. I set permissions on all of the relevant folders and files to 777: the Hadoop file permissions, VEP, and loftee.

See the relevant info here: Redirect (or find) VEP (or other) error output from a Hail pipeline