Hi all,
I’ve gotten VEP annotations working well with Hail, and now we want to use it to consistently annotate all of our variant datasets. However, I’m running into errors trying to annotate the public ClinVar VCF file (version 20171029). I’m able to import it to a VDS and run split_multi, variant_qc, and deduplicate with no problems, and some chromosomes are annotated just fine:
vds_annot = (vds
.filter_variants_expr('v.contig == "21"')
.vep('vep.properties')
)
2017-12-01 18:40:42 Hail: INFO: vep: annotated 22284 variants
But on other chromosomes (chromosome 19 in the traceback below), some variant causes the annotation to fail:
FatalError: NumberFormatException: For input string: "0.09848,-:0.01515,-:0"
I’m not sure how to debug this: grep finds no part of this string in the original VCF, and the error message never names the offending variant.
Any advice on what could be going wrong, or even how to begin to debug this?
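One way to narrow this down might be to bisect by position until a single failing variant is left. The trace fails at toDouble inside JSONAnnotationImpex.importAnnotation, called from VEP.scala, which suggests the string comes from VEP’s JSON output rather than from the VCF itself, so isolating the variant would at least make it possible to run VEP on it by hand. The following is only a minimal sketch of that idea, not something taken from the Hail docs: find_failing_position and the fails callback are hypothetical names, and the commented-out vep_fails predicate assumes that v.start and && are valid in the Hail 0.1 expression language and that .vep() raises as soon as it is called.

```python
def find_failing_position(positions, fails):
    """Binary-search a sorted list of positions for one that triggers the error.

    `fails(start, end)` is a hypothetical callback that returns True when
    annotating the variants with start <= position <= end raises the error.
    Returns a failing position, or None if the whole range annotates cleanly.
    """
    if not positions:
        return None
    lo, hi = 0, len(positions) - 1
    if not fails(positions[lo], positions[hi]):
        return None
    # Invariant: at least one failing position lies in positions[lo..hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(positions[lo], positions[mid]):
            hi = mid          # a failure is in the left half
        else:
            lo = mid + 1      # otherwise it must be in the right half
    return positions[lo]


# Assumed shape of the real predicate (untested, Hail 0.1 syntax is a guess):
# def vep_fails(start, end):
#     expr = 'v.contig == "19" && v.start >= %d && v.start <= %d' % (start, end)
#     try:
#         vds.filter_variants_expr(expr).vep('vep.properties')
#         return False
#     except Exception:
#         return True

# Stand-in predicate so the sketch runs without Hail: pretend 1500 is bad.
bad = {1500}
positions = list(range(1000, 2000, 100))


def fake_fails(start, end):
    return any(start <= p <= end for p in bad)


print(find_failing_position(positions, fake_fails))  # prints 1500
```

Each call to the real predicate would rerun VEP on a subset, so this costs roughly log2(number of positions) annotation runs. If several variants are bad, it only finds one of them per pass.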
Thanks!
Jake
FatalError Traceback (most recent call last)
in ()
----> 1 vds_annot = vds.filter_variants_expr('v.contig == "19"').vep('…/vdstools/vep.properties')
in vep(self, config, block_size, root, csq)
/home/hadoop/hail/python/hail/java.pyc in handle_py4j(func, *args, **kwargs)
119 raise FatalError('%s\n\nJava stack trace:\n%s\n'
120 'Hail version: %s\n'
--> 121 'Error summary: %s' % (deepest, full, Env.hc().version, deepest))
122 except py4j.protocol.Py4JError as e:
123 if e.args[0].startswith('An error occurred while calling'):
FatalError: NumberFormatException: For input string: "0.09848,-:0.01515,-:0"
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 268 in stage 18.0 failed 4 times, most recent failure: Lost task 268.3 in stage 18.0 (TID 2138, ip-10-10-112-170.ec2.internal, executor 17): java.lang.NumberFormatException: For input string: "0.09848,-:0.01515,-:0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:324)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:363)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:360)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:360)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$16.apply(AnnotationImpex.scala:385)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$16.apply(AnnotationImpex.scala:385)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:385)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:363)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:360)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:360)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:302)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2$$anonfun$14.apply(VEP.scala:353)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2$$anonfun$14.apply(VEP.scala:322)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2.apply(VEP.scala:377)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2.apply(VEP.scala:310)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at is.hail.methods.VEP$.annotate(VEP.scala:389)
at is.hail.variant.VariantSampleMatrix.vep(VariantSampleMatrix.scala:2057)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
java.lang.NumberFormatException: For input string: "0.09848,-:0.01515,-:0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:324)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:363)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:360)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:360)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$16.apply(AnnotationImpex.scala:385)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$16.apply(AnnotationImpex.scala:385)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:385)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:363)
at is.hail.expr.JSONAnnotationImpex$$anonfun$importAnnotation$15.apply(AnnotationImpex.scala:360)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:360)
at is.hail.expr.JSONAnnotationImpex$.importAnnotation(AnnotationImpex.scala:302)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2$$anonfun$14.apply(VEP.scala:353)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2$$anonfun$14.apply(VEP.scala:322)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2.apply(VEP.scala:377)
at is.hail.methods.VEP$$anonfun$6$$anonfun$apply$2.apply(VEP.scala:310)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Hail version: 0.1-e8d0f38
Error summary: NumberFormatException: For input string: "0.09848,-:0.01515,-:0"