Error running miniconda-installed VEP

I encountered an error when running VEP. VEP was installed via miniconda on all nodes, and I verified that '/home/hadoop/miniconda2/bin/vep' works on the slave nodes. Below is the error message.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 9, ip-172-31-7-21.ap-southeast-1.compute.internal, executor 1): is.hail.utils.HailException: VEP command '/home/hadoop/miniconda2/bin/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --assembly GRCh38 --plugin LoF,human_ancestor_fa:/home/hadoop/vep/loftee_data/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/home/hadoop/vep/loftee_data/phylocsf_gerp.sql,gerp_file:/home/hadoop/vep/loftee_data/GERP_scores.final.sorted.txt.gz -o STDOUT' failed with non-zero exit status 2

Do you have the executor logs? There should be more information printed there that details what went wrong.
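If the cluster runs on YARN (the EMR-style hostnames suggest it does), something like the following should retrieve them. Note that the application ID below is a placeholder; substitute the ID of your failed job:

# List applications to find the ID of the failed job.
yarn application -list -appStates ALL

# Fetch the aggregated executor logs and search around the VEP failure.
yarn logs -applicationId application_1551000000000_0001 | grep -B 5 -A 20 'VEP command'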

2019-02-26 09:06:36 TaskSetManager: WARN: Lost task 1.0 in stage 2.0 (TID 4, ip-172-31-7-21.ap-southeast-1.compute.internal, executor 1): is.hail.utils.HailException: VEP command '/home/hadoop/miniconda2/bin/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --assembly GRCh38 --plugin LoF,human_ancestor_fa:/home/hadoop/vep/loftee_data/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/home/hadoop/vep/loftee_data/phylocsf_gerp.sql,gerp_file:/home/hadoop/vep/loftee_data/GERP_scores.final.sorted.txt.gz -o STDOUT' failed with non-zero exit status 2
VEP Error output:

at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:26)
at is.hail.methods.VEP$.waitFor(VEP.scala:75)
at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:212)
at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:155)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1094)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1020)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:811)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at is.hail.sparkextras.ContextRDD.iterator(ContextRDD.scala:599)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anonfun$compute$1$$anonfun$apply$1.apply(RepartitionedOrderedRDD2.scala:60)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anonfun$compute$1$$anonfun$apply$1.apply(RepartitionedOrderedRDD2.scala:59)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:764)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.io.RichContextRDDRegionValue$$anonfun$boundary$extension$1$$anon$1.hasNext(RowStore.scala:1604)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1004)
at is.hail.utils.richUtils.RichIterator$$anon$5.isValid(RichIterator.scala:22)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:48)
at is.hail.utils.FlipbookIterator$$anon$9.setValue(FlipbookIterator.scala:331)
at is.hail.utils.FlipbookIterator$$anon$9.<init>(FlipbookIterator.scala:344)
at is.hail.utils.FlipbookIterator.leftJoinDistinct(FlipbookIterator.scala:323)
at is.hail.annotations.OrderedRVIterator.leftJoinDistinct(OrderedRVIterator.scala:62)
at is.hail.rvd.KeyedRVD$$anonfun$6.apply(KeyedRVD.scala:88)
at is.hail.rvd.KeyedRVD$$anonfun$6.apply(KeyedRVD.scala:88)
at is.hail.rvd.KeyedRVD$$anonfun$orderedJoinDistinct$1.apply(KeyedRVD.scala:98)
at is.hail.rvd.KeyedRVD$$anonfun$orderedJoinDistinct$1.apply(KeyedRVD.scala:95)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$36.apply(ContextRDD.scala:469)
at is.hail.sparkextras.ContextRDD$$anonfun$czipPartitions$1$$anonfun$apply$36.apply(ContextRDD.scala:469)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32$$anonfun$apply$33.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32$$anonfun$apply$33.apply(ContextRDD.scala:422)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:1790)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:1784)
at is.hail.utils.package$.using(package.scala:587)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:1784)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:1782)
at is.hail.utils.package$.using(package.scala:587)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:296)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2.apply(RowStore.scala:1782)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26$$anonfun$apply$2.apply(RowStore.scala:1780)
at is.hail.utils.package$.using(package.scala:587)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26.apply(RowStore.scala:1780)
at is.hail.io.RichContextRDDRegionValue$$anonfun$26.apply(RowStore.scala:1778)
at is.hail.utils.package$.using(package.scala:587)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:296)
at is.hail.io.RichContextRDDRegionValue$.is$hail$io$RichContextRDDRegionValue$$writeSplitRegion$extension(RowStore.scala:1778)
at is.hail.io.RichContextRDDRegionValue$$anonfun$24.apply(RowStore.scala:1731)
at is.hail.io.RichContextRDDRegionValue$$anonfun$24.apply(RowStore.scala:1719)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Hi, any updates? Thanks in advance.

Sorry for the delay.

Do the logs contain any other suggestive messages? I would have expected some more information about what caused VEP to fail.
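If you still have the driver-side Hail log (the file name below is a guess; Hail prints the actual path when it starts up), grepping around the exception may surface VEP's stderr:

# Search the Hail log for context around the VEP failure.
grep -i -B 2 -A 20 'vep' hail-*.log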

Without other information, my next suggestion is to execute

/home/hadoop/miniconda2/bin/vep \
 --format vcf --json --everything --allele_number --no_stats --cache \
 --offline --minimal --assembly GRCh38 \
 --plugin LoF,human_ancestor_fa:/home/hadoop/vep/loftee_data/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/home/hadoop/vep/loftee_data/phylocsf_gerp.sql,gerp_file:/home/hadoop/vep/loftee_data/GERP_scores.final.sorted.txt.gz \
 -o STDOUT

with a small VCF on a worker node (perhaps ip-172-31-7-21.ap-southeast-1.compute.internal, if you still have access to that one). The error output will hopefully point to the root cause of this issue.
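For example (a sketch: the test file and its single site are made up, and any small, valid VCF will do). VEP reads from STDIN when no input file is given, which is how Hail feeds it data, so piping reproduces the failing invocation:

# Write a minimal single-site VCF (made-up example site; use real data if handy).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t69511\t.\tA\tG\t.\t.\t.\n' > test.vcf

# Pipe it through VEP the same way Hail does, and watch stderr for the real error.
cat test.vcf | /home/hadoop/miniconda2/bin/vep \
 --format vcf --json --everything --allele_number --no_stats --cache \
 --offline --minimal --assembly GRCh38 \
 --plugin LoF,human_ancestor_fa:/home/hadoop/vep/loftee_data/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/home/hadoop/vep/loftee_data/phylocsf_gerp.sql,gerp_file:/home/hadoop/vep/loftee_data/GERP_scores.final.sorted.txt.gz \
 -o STDOUT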