Caught scala.MatchError on field MQ0


#1

Hi All,
I got the following error when reading a VCF file with import_vcf. Could you please give some suggestions?

[task-result-getter-2] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 992.0 in stage 4.0 (TID 75210, xxxxx.xxxx.xxxx.xxx.xxx, executor 10): is.hail.utils.HailException: xxxxx.vcf.bgz: variant chr1:118938389:C:T,A: INFO field MQ0:
unable to convert [0, 0] (of class java.util.ArrayList) to Int:
caught scala.MatchError: ([0, 0],Int) (of class scala.Tuple2)
offending line: chr1 118938389 rs115277817 C T,A 4580.04 PASS AC=9,1;AF=0.00…
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:10)
at is.hail.utils.package$.fatal(package.scala:27)
at is.hail.utils.TextContext.wrapException(Context.scala:11)
at is.hail.utils.WithContext.map(Context.scala:27)
at is.hail.io.vcf.LoadVCF$$anonfun$15$$anonfun$apply$7.apply(LoadVCF.scala:310)
at is.hail.io.vcf.LoadVCF$$anonfun$15$$anonfun$apply$7.apply(LoadVCF.scala:309)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Kind regards,

Jiaowei


#2

Hi Jiaowei,
This looks like a misformatted VCF - the value “[0, 0]” was found in the INFO field, which is a violation of the VCF spec for integer arrays.


#3

Hi Tpoterba,
Thank you. Is it possible to skip these lines with Hail?

Kind regards,

Jiaowei


#4

Hi Jiaowei,
Unfortunately our support for malformed VCFs is pretty bad – we don’t have a way to skip failing lines, etc. This is a terrible error message, though, and needs to print the offending field. I’ve made an issue to track it (https://github.com/hail-is/hail/issues/2232).

I think the best thing to do right now is to reheader the VCF, fixing the header line for the problematic INFO field. Probably the necessary fix is to change the Number of the affected field from 1 to "." (a single dot, which the VCF spec uses when the number of values varies or is unknown).
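
Here is a rough sketch of that header fix in Python. It assumes that MQ0 (the field named in the error) is the one at fault and that its header line uses the usual key order, ##INFO=<ID=...,Number=...,...>. The helper names relax_info_number and fix_header are made up for illustration and are not part of Hail.

```python
import re


def relax_info_number(header_line, field_id="MQ0"):
    """Set Number to '.' on the ##INFO header line that declares field_id.

    Assumes the usual key order, ##INFO=<ID=...,Number=...,...>.
    Any other line is returned unchanged.
    """
    pattern = re.compile(
        r"^(##INFO=<ID=" + re.escape(field_id) + r",Number=)[^,>]+"
    )
    # Keep the captured prefix and replace only the Number value with '.'
    return pattern.sub(lambda m: m.group(1) + ".", header_line)


def fix_header(lines, field_id="MQ0"):
    """Apply relax_info_number to header lines (those starting with '#')."""
    return [
        relax_info_number(line, field_id) if line.startswith("#") else line
        for line in lines
    ]
```

The corrected header lines still have to be written back to the file, for example with whatever reheader tool you normally use.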

Once we’ve fixed the error message, it should be a bit easier to see what’s going on. For now, though, you can grep for 0,0 in the INFO field of the variant printed.
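
If grep turns out to be too noisy (0,0 can also show up in other INFO keys), a small check along these lines may help. It is only a sketch: it assumes tab-separated VCF data lines with INFO in the eighth column, and info_values and has_multiple_values are hypothetical helper names, not Hail functions.

```python
def info_values(vcf_line, field_id="MQ0"):
    """Return the comma-separated values of field_id in the INFO column.

    Returns None for header lines, short lines, or lines where the key
    is absent or is a flag without '='.
    """
    if vcf_line.startswith("#"):
        return None
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return None
    for entry in cols[7].split(";"):  # INFO is the 8th column
        key, sep, value = entry.partition("=")
        if key == field_id and sep:
            return value.split(",")
    return None


def has_multiple_values(vcf_line, field_id="MQ0"):
    """True if field_id carries more than one value on this line."""
    values = info_values(vcf_line, field_id)
    return values is not None and len(values) > 1
```

Running has_multiple_values over each line of the file would list the variants where the field holds more values than a header with Number=1 allows.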


#5

A similar scala.MatchError has been discussed here: "Empty field causes scala.MatchError?" Link: https://github.com/databricks/spark-xml/issues/80


#6

Same type of error (unhandled unexpected input cases), but this one is coming from Hail code. It’s not exceptionally difficult to fix, but I think we’ll leave it for now and plan to fix it in 0.2.