Caught scala.MatchError

I was trying to convert a VCF to a VDS, and after running for some time I got the error below. It seems to be caused by the variant having more than one rsID. Is there a way to work around this error?

2017-09-17 14:25:10 CoarseGrainedSchedulerBackend$DriverEndpoint: INFO: Launching task 415919 on executor id: 2 hostname:
2017-09-17 14:25:10 TaskSetManager: WARN: Lost task 914.0 in stage 4.0 (TID 415833, is.hail.utils.HailException: QGP2935.JC_GATK37.dbSnp_GRCh37.vcf: caught scala.MatchError: ([369037667, 6678753],Int32) (of class scala.Tuple2)
offending line: 1 18366031 rs6678753;rs369037667 G A 129.08 PASS AC=16;AF=0…
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:17)
at is.hail.utils.package$.fatal(package.scala:27)
at is.hail.utils.TextContext.wrapException(Context.scala:15)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

Right now our support for malformed VCFs is pretty limited. This is a terrible error message, though – I’ve made an issue to track it (

I don’t think this is an rsID issue. The type in the error says “Int32”, which leads me to believe this is an INFO field whose value on this line contains two values even though its header line says there should be just one. Once we fix the error message, it should be much easier to see what’s going on.
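Until the error message is improved, one option is to find such lines yourself before importing: scan the VCF for INFO fields declared with `Number=1` in the header that carry a comma-separated list on a data line. Below is a minimal Python sketch that is not part of Hail. The names `find_violations` and `scan_vcf` are hypothetical, and it assumes a tab-delimited VCF whose `##INFO` header lines list `ID`, `Number` and `Type` in that order (the usual layout).

```python
import gzip
import re
import sys
from typing import Iterable, Iterator, Tuple

# Assumes the conventional key order, e.g.
# ##INFO=<ID=XX,Number=1,Type=Integer,Description="...">
INFO_HEADER = re.compile(r'^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)')


def find_violations(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, INFO key, raw value) for every INFO entry whose
    header declares Number=1 but whose value contains a comma-separated list."""
    single_valued = {}  # INFO ID -> declared Type, only for Number=1 fields
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            m = INFO_HEADER.match(line)
            if m and m.group(2) == "1":
                single_valued[m.group(1)] = m.group(3)
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and the #CHROM column header
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete data line
        # Column 8 is INFO: semicolon-separated KEY=VALUE entries (or bare flags)
        for entry in cols[7].split(";"):
            key, sep, value = entry.partition("=")
            if sep and key in single_valued and "," in value:
                yield lineno, key, value


def scan_vcf(path: str) -> Iterator[Tuple[int, str, str]]:
    """Open a plain or gzip/bgzip-compressed VCF and scan it."""
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    with opener(path, "rt") as f:
        yield from find_violations(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, key, value in scan_vcf(sys.argv[1]):
        print(f"line {lineno}: INFO {key}={value} has several values "
              f"but the header declares Number=1")
```

Running it as `python check_info.py QGP2935.JC_GATK37.dbSnp_GRCh37.vcf` (the script name is arbitrary) prints each offending field. One possible fix is then to change that field's `Number` in the header (for example to `.`) or to drop the field before importing.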

As an aside, the fact that I’m seeing “Int32” and not “Int” means you’re using the devel version of Hail. I would strongly recommend that you use the 0.1 version, partly because we’re not going to answer devel-specific questions, and partly because there are bugs in the development version that aren’t in 0.1!