Assertion failed

Hi, everyone.

I’m trying to follow the GWAS tutorial, and everything works perfectly with the tutorial data.
But when I run it on production data, for some chromosomes I can no longer work with my MatrixTable after calling filter_rows for QC purposes.

Filter process:

mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)
mt = mt.filter_rows(mt.variant_qc.p_value_hwe > 1e-6)

After this step, calling the MatrixTable’s count method gives this error message:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 442 in stage 2.0 failed 1 times, most recent failure: Lost task 442.0 in stage 2.0 (TID 570, localhost, executor driver): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)

Hail version: 0.2.16-6da0d3571629
Error summary: AssertionError: assertion failed

Can anyone help me?

Thanks in advance.

What’s the full stack trace? Getting an assertion error pretty much means we have a bug, and should either fix something wrong or throw a meaningful error message.

Hi, @tpoterba.
Thanks for the quick response.
This is the full stacktrace:

Executor: ERROR: Exception in task 442.0 in stage 101.0 (TID 24071)
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at is.hail.io.AbstractBinaryReader.readBytes(AbstractBinaryReader.scala:17)
at is.hail.io.AbstractBinaryReader.readBytes(AbstractBinaryReader.scala:26)
at is.hail.codegen.generated.C355.apply(Unknown Source)
at is.hail.codegen.generated.C355.apply(Unknown Source)
at is.hail.io.bgen.BgenRecordIteratorWithoutFilter.next(BgenRDD.scala:161)
at is.hail.io.bgen.BgenRecordIteratorWithoutFilter.next(BgenRDD.scala:145)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:464)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at is.hail.io.RichContextRDDRegionValue$$anonfun$boundary$extension$1$$anon$1.hasNext(RowStore.scala:1981)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1002)
at is.hail.utils.richUtils.RichIterator$$anon$5.isValid(RichIterator.scala:22)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:48)
at is.hail.utils.FlipbookIterator$$anon$9.setValue(FlipbookIterator.scala:327)
at is.hail.utils.FlipbookIterator$$anon$9.advance(FlipbookIterator.scala:341)
at is.hail.utils.StagingIterator.advance(FlipbookIterator.scala:53)
at is.hail.utils.FlipbookIterator$$anon$5.advance(FlipbookIterator.scala:179)
at is.hail.utils.StagingIterator.stage(FlipbookIterator.scala:61)
at is.hail.utils.StagingIterator.hasNext(FlipbookIterator.scala:71)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at is.hail.rvd.RVD$$anonfun$count$2.apply(RVD.scala:655)
at is.hail.rvd.RVD$$anonfun$count$2.apply(RVD.scala:653)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitions$1$$anonfun$apply$28.apply(ContextRDD.scala:405)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitions$1$$anonfun$apply$28.apply(ContextRDD.scala:405)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.fold(TraversableOnce.scala:212)
at scala.collection.AbstractIterator.fold(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1096)
at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1096)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-07-03 09:55:56 TaskSetManager: WARN: Lost task 442.0 in stage 101.0 (TID 24071, localhost, executor driver): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at is.hail.io.AbstractBinaryReader.readBytes(AbstractBinaryReader.scala:17)
at is.hail.io.AbstractBinaryReader.readBytes(AbstractBinaryReader.scala:26)
at is.hail.codegen.generated.C355.apply(Unknown Source)
at is.hail.codegen.generated.C355.apply(Unknown Source)
at is.hail.io.bgen.BgenRecordIteratorWithoutFilter.next(BgenRDD.scala:161)
at is.hail.io.bgen.BgenRecordIteratorWithoutFilter.next(BgenRDD.scala:145)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:464)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at is.hail.io.RichContextRDDRegionValue$$anonfun$boundary$extension$1$$anon$1.hasNext(RowStore.scala:1981)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:1002)
at is.hail.utils.richUtils.RichIterator$$anon$5.isValid(RichIterator.scala:22)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:48)
at is.hail.utils.FlipbookIterator$$anon$9.setValue(FlipbookIterator.scala:327)
at is.hail.utils.FlipbookIterator$$anon$9.advance(FlipbookIterator.scala:341)
at is.hail.utils.StagingIterator.advance(FlipbookIterator.scala:53)
at is.hail.utils.FlipbookIterator$$anon$5.advance(FlipbookIterator.scala:179)
at is.hail.utils.StagingIterator.stage(FlipbookIterator.scala:61)
at is.hail.utils.StagingIterator.hasNext(FlipbookIterator.scala:71)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at is.hail.rvd.RVD$$anonfun$count$2.apply(RVD.scala:655)
at is.hail.rvd.RVD$$anonfun$count$2.apply(RVD.scala:653)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitions$1$$anonfun$apply$28.apply(ContextRDD.scala:405)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitions$1$$anonfun$apply$28.apply(ContextRDD.scala:405)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.fold(TraversableOnce.scala:212)
at scala.collection.AbstractIterator.fold(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1096)
at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1096)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

What file system are you running against? This is a weird assertion to hit.

The data is in bgen format.
I’m running the analysis on Linux (Ubuntu 14.04) with an ext4 file system.

I suspect your BGEN file violates one of our assumptions. We only support UKBB-style BGEN files (primarily because there aren’t local users who have asked for other formats). See the note in import_bgen:

Hail supports importing data from v1.2 of the BGEN file format. Genotypes must be unphased and diploid, genotype probabilities must be stored with 8 bits, and genotype probability blocks must be compressed with zlib or uncompressed. All variants must be bi-allelic.
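
Two of the requirements in that note (format version and compression) are recorded in the BGEN header, so they can be checked without Hail. Below is a minimal sketch in plain Python, assuming the header layout described in the public BGEN specification (little-endian fields, with the flags word in the last 4 bytes of the header block) and assuming that "v1.2" corresponds to layout 2. The function names are made up for illustration and are not part of Hail. Bit depth, ploidy and phasing are stored per variant, so this check cannot confirm them.

```python
import struct

# Compression codes from the low two bits of the BGEN flags word
# (per the BGEN specification; treat as an assumption here).
COMPRESSION = {0: "uncompressed", 1: "zlib", 2: "zstd"}


def read_bgen_header(stream):
    """Parse the fixed part of a BGEN header from a binary stream.

    Hypothetical helper for illustration; not a Hail function.
    """
    first = stream.read(8)
    if len(first) < 8:
        raise EOFError("file too short to contain a BGEN header")
    offset, header_len = struct.unpack("<II", first)
    if header_len < 20:
        raise ValueError(f"implausible header length: {header_len}")

    # header_len counts its own 4 bytes, so header_len - 4 bytes remain.
    rest = stream.read(header_len - 4)
    if len(rest) < header_len - 4:
        raise EOFError("header block is truncated")

    n_variants, n_samples = struct.unpack("<II", rest[:8])
    flags = struct.unpack("<I", rest[-4:])[0]
    return {
        "offset": offset,
        "n_variants": n_variants,
        "n_samples": n_samples,
        "magic": rest[8:12],
        "compression": COMPRESSION.get(flags & 0b11, "unknown"),
        "layout": (flags >> 2) & 0b1111,
        "has_sample_ids": bool(flags >> 31),
    }


def header_matches_note(info):
    """True if layout and compression agree with the quoted note."""
    return info["layout"] == 2 and info["compression"] in ("uncompressed", "zlib")
```

To try it on a real file, open the file in binary mode (for example `open(path, "rb")`, where `path` is whichever chromosome fails) and pass the file object to `read_bgen_header`. A result of `zstd` or a layout other than 2 would point to a format mismatch rather than a damaged file.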

I suppose we should check if the BGEN is 8 bit or not and raise a sensible exception if it is not.

I think we do…

Here’s the assertion hit:

  def readBytes(byteArray: Array[Byte], offset: Int, length: Int): Int = {
    var hasRead = 0
    var toRead = length
    while (toRead > 0) {
      val result = read(byteArray, hasRead, toRead)
      assert(result >= 0)  // <-- the assertion that fails
      hasRead += result
      toRead -= result
    }
    hasRead
  }

The only explanation I have for that is a premature EOF. Maybe it’s an unexpected compression scheme?
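
To illustrate the premature-EOF theory, here is a minimal Python sketch of the same kind of read loop. It is not Hail's code: it is an analogue, assuming a stream whose `read` signals end-of-data (an empty `bytes` object in Python, where JVM streams return a negative count). Instead of a bare assertion, it raises an error that says how many bytes were expected and how many arrived.

```python
import io


def read_exactly(stream, length):
    """Read exactly `length` bytes from `stream`, or raise EOFError.

    Illustrative stand-in for a readBytes-style loop; not part of Hail.
    """
    buf = bytearray()
    while len(buf) < length:
        chunk = stream.read(length - len(buf))
        if not chunk:
            # End of stream before the requested bytes arrived: this is
            # the situation in which a bare assertion would fail.
            raise EOFError(
                f"premature end of stream: wanted {length} bytes, "
                f"got {len(buf)}"
            )
        buf.extend(chunk)
    return bytes(buf)


# A block that claims 10 bytes but holds only 6 triggers the error.
truncated = io.BytesIO(b"abcdef")
try:
    read_exactly(truncated, 10)
except EOFError as err:
    print(err)
```

A record whose declared length runs past the end of the file, or whose length was decoded under the wrong assumptions, would fail in this way.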

Can the BGEN be read using other tools?