Hello,
Has there been a major change in how I should be setting up the cluster or running Hail 0.2?
My pipeline, which had been working well previously, is now broken in multiple places.
Here are some examples, all of which were working as of Friday:
Example #1:
expr_tp_ht = hl.import_table(expr_path + tissue + suffix, impute = True).key_by('gene_id')
expr_tp_ht = expr_tp_ht.join(gene_anno_ht.select(gene_anno_ht.gene_id, gene_anno_ht.mappability).key_by('gene_id'), how = 'left')
expr_tp_ht = expr_tp_ht.filter(expr_tp_ht.mappability >= 0.8)
expr_tp_ht = expr_tp_ht.transmute(chrom = hl.int(expr_tp_ht['#chr'][3:]))
expr_tp_ht = expr_tp_ht.transmute(TSS = expr_tp_ht.end).drop('start')
>> expr_tp_ht.write(expr_path + 'hail_tables/' + tissue + '_expression.ht', overwrite = True)
FatalError: ClassCastException: is.hail.codegen.generated.C15 cannot be cast to is.hail.asm4s.AsmFunction5
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 20 times, most recent failure: Lost task 0.19 in stage 22.0 (TID 69, hail-debug-w-1.c.gtex-v8.internal, executor 4): java.lang.ClassCastException: is.hail.codegen.generated.C15 cannot be cast to is.hail.asm4s.AsmFunction5
at is.hail.expr.TableFilter$$anonfun$execute$5.apply(Relational.scala:1505)
at is.hail.expr.TableFilter$$anonfun$execute$5.apply(Relational.scala:1505)
at is.hail.expr.TableValue$$anonfun$40$$anonfun$apply$23.apply(Relational.scala:1228)
at is.hail.expr.TableValue$$anonfun$40$$anonfun$apply$23.apply(Relational.scala:1223)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
...
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at is.hail.io.RichContextRDDRegionValue$.writeRowsPartition(RowStore.scala:861)
at is.hail.io.RichContextRDDRegionValue$$anonfun$5.apply(RowStore.scala:878)
at is.hail.io.RichContextRDDRegionValue$$anonfun$5.apply(RowStore.scala:878)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:40)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:35)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:132)
at is.hail.utils.richUtils.RichContextRDD.writePartitions(RichContextRDD.scala:44)
at is.hail.io.RichContextRDDRegionValue$.writeRows$extension(RowStore.scala:878)
at is.hail.rvd.RVD$class.write(RVD.scala:385)
at is.hail.rvd.UnpartitionedRVD.write(UnpartitionedRVD.scala:17)
at is.hail.expr.TableValue.write(Relational.scala:1255)
at is.hail.expr.ir.Interpret$.is$hail$expr$ir$Interpret$$interpret(Interpret.scala:397)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:36)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:15)
at is.hail.table.Table.write(Table.scala:934)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)java.lang.ClassCastException: is.hail.codegen.generated.C15 cannot be cast to is.hail.asm4s.AsmFunction5
at is.hail.expr.TableFilter$$anonfun$execute$5.apply(Relational.scala:1505)
at is.hail.expr.TableFilter$$anonfun$execute$5.apply(Relational.scala:1505)
at is.hail.expr.TableValue$$anonfun$40$$anonfun$apply$23.apply(Relational.scala:1228)
at is.hail.expr.TableValue$$anonfun$40$$anonfun$apply$23.apply(Relational.scala:1223)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
...
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at is.hail.io.RichContextRDDRegionValue$.writeRowsPartition(RowStore.scala:861)
at is.hail.io.RichContextRDDRegionValue$$anonfun$5.apply(RowStore.scala:878)
at is.hail.io.RichContextRDDRegionValue$$anonfun$5.apply(RowStore.scala:878)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:40)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:35)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hail version: devel-802bdce
Error summary: ClassCastException: is.hail.codegen.generated.C15 cannot be cast to is.hail.asm4s.AsmFunction5
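In case a reproduction helps: below is a minimal, self-contained sketch of the same sequence of calls (import_table, join, filter, transmute, write) on tiny synthetic files. The paths, column names, and values are placeholders standing in for the real GTEx inputs, and it is meant to be run locally (or with the paths adjusted for the cluster).

import hail as hl

hl.init()

# Tiny synthetic stand-ins for the expression and gene-annotation tables.
with open('/tmp/expr.tsv', 'w') as f:
    f.write('#chr\tstart\tend\tgene_id\tvalue\n')
    f.write('chr1\t100\t200\tENSG00000000001\t1.5\n')
    f.write('chr2\t300\t400\tENSG00000000002\t2.5\n')

with open('/tmp/anno.tsv', 'w') as f:
    f.write('gene_id\tmappability\n')
    f.write('ENSG00000000001\t0.95\n')
    f.write('ENSG00000000002\t0.40\n')

expr_ht = hl.import_table('/tmp/expr.tsv', impute=True).key_by('gene_id')
anno_ht = hl.import_table('/tmp/anno.tsv', impute=True).key_by('gene_id')

# Same chain of operations as the failing pipeline above.
expr_ht = expr_ht.join(anno_ht, how='left')
expr_ht = expr_ht.filter(expr_ht.mappability >= 0.8)
expr_ht = expr_ht.transmute(chrom=hl.int(expr_ht['#chr'][3:]))
expr_ht = expr_ht.transmute(TSS=expr_ht.end).drop('start')
expr_ht.write('/tmp/expr_test.ht', overwrite=True)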
Example #2:
>> expr_tp_ht.count()
FatalError: ClassCastException: null
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 20 times, most recent failure: Lost task 0.19 in stage 26.0 (TID 128, hail-debug-w-1.c.gtex-v8.internal, executor 4): java.lang.ClassCastException
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1083)
at is.hail.rvd.RVD$class.count(RVD.scala:336)
at is.hail.rvd.UnpartitionedRVD.count(UnpartitionedRVD.scala:17)
at is.hail.expr.ir.Interpret$$anonfun$is$hail$expr$ir$Interpret$$interpret$1.apply$mcJ$sp(Interpret.scala:390)
at is.hail.expr.ir.Interpret$$anonfun$is$hail$expr$ir$Interpret$$interpret$1.apply(Interpret.scala:390)
at is.hail.expr.ir.Interpret$$anonfun$is$hail$expr$ir$Interpret$$interpret$1.apply(Interpret.scala:390)
at scala.Option.getOrElse(Option.scala:121)
at is.hail.expr.ir.Interpret$.is$hail$expr$ir$Interpret$$interpret(Interpret.scala:390)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:36)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:15)
at is.hail.table.Table.count(Table.scala:299)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)java.lang.ClassCastException: null
at
Hail version: devel-802bdce
Error summary: ClassCastException: null
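To help isolate this one, here is a tiny check that leaves out the join/transmute chain; range_table is used purely as a stand-in for the real data, and a Hail session is assumed to be initialized as in the sketch above. If code generation itself is broken, I would expect even this to fail with the same ClassCastException.

import hail as hl

# Does a codegen-backed filter followed by count() fail on its own,
# or only on the imported and joined table from Example #1?
ht = hl.utils.range_table(100)  # 100-row table with a single int field 'idx'
print(ht.filter(ht.idx % 2 == 0).count())  # expect 50 if codegen is healthy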
Example #3:
covs_tp_ht = hl.import_table(fs + wd + 'Data/Covariates/gtex_v8/' + tissue + '.v8.covariates.txt', impute = True)
>> covs_df = covs_tp_ht.to_pandas()
FatalError: ClassCastException: is.hail.codegen.generated.C19 cannot be cast to is.hail.asm4s.AsmFunction5
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 28.0 failed 20 times, most recent failure: Lost task 0.19 in stage 28.0 (TID 150, hail-debug-w-1.c.gtex-v8.internal, executor 4): java.lang.ClassCastException: is.hail.codegen.generated.C19 cannot be cast to is.hail.asm4s.AsmFunction5
at is.hail.expr.TableMapRows$$anonfun$52.apply(Relational.scala:1642)
at is.hail.expr.TableMapRows$$anonfun$52.apply(Relational.scala:1638)
... (similar stack trace to example #1)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hail version: devel-802bdce
Error summary: ClassCastException: is.hail.codegen.generated.C19 cannot be cast to is.hail.asm4s.AsmFunction5
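Again, a minimal sketch of the same path (import_table followed by to_pandas()) on a tiny synthetic covariates file; the path and column names are placeholders for the real inputs.

import hail as hl

# Tiny synthetic stand-in for the covariates file.
with open('/tmp/covariates.txt', 'w') as f:
    f.write('ID\tPC1\tPC2\n')
    f.write('sample1\t0.01\t-0.02\n')
    f.write('sample2\t0.03\t0.04\n')

covs_ht = hl.import_table('/tmp/covariates.txt', impute=True)
covs_df = covs_ht.to_pandas()  # the call that dies with the ClassCastException
print(covs_df.head())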
Example #4:
tissue_ds = tissue_ds.annotate_entries(AC = tissue_ds.GT.n_alt_alleles()).drop('GT')
>> tissue_ds.write(genotype_path + 'tissues/v8_WGS_838_phased.SNP_filter.' + tissue + '.mt', overwrite = True)
File "/tmp/978e504087af41e0961a1e46355c8d33/GTEx_v8_eQTL_pipeline_combined.py", line 144, in <module>
tissue_ds.write(genotype_path + 'tissues/v8_WGS_838_phased.SNP_filter.' + tissue + '.mt', overwrite = True)
File "<decorator-gen-556>", line 2, in write
File "/home/hail/hail.zip/hail/typecheck/check.py", line 486, in _typecheck
File "/home/hail/hail.zip/hail/matrixtable.py", line 2027, in write
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/home/hail/hail.zip/hail/utils/java.py", line 196, in deco
hail.utils.java.FatalError: NegativeArraySizeException: null
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 53.0 failed 20 times, most recent failure: Lost task 15.19 in stage 53.0 (TID 384, hail-3-w-2.c.gtex-v8.internal, executor 4): java.lang.NegativeArraySizeException
at java.util.Arrays.copyOf(Arrays.java:3236)
at is.hail.annotations.Region.ensure(Region.scala:139)
at is.hail.annotations.Region.allocate(Region.scala:152)
at is.hail.annotations.Region.allocate(Region.scala:159)
at is.hail.annotations.Region.appendInt(Region.scala:193)
at is.hail.annotations.Region.appendArrayInt(Region.scala:252)
at is.hail.expr.MatrixIR$$anonfun$9$$anonfun$10$$anonfun$apply$8.apply(Relational.scala:282)
at is.hail.expr.MatrixIR$$anonfun$9$$anonfun$10$$anonfun$apply$8.apply(Relational.scala:279)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.next(OrderedRVD.scala:914)
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.next(OrderedRVD.scala:908)
...
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.hasNext(OrderedRVD.scala:911)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:922)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:915)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:915)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:914)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:265)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2.apply(RowStore.scala:914)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2.apply(RowStore.scala:911)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1.apply(RowStore.scala:911)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1.apply(RowStore.scala:910)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:265)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6.apply(RowStore.scala:910)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6.apply(RowStore.scala:904)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:132)
at is.hail.io.RichContextRDDRegionValue$.writeRowsSplit$extension(RowStore.scala:955)
at is.hail.rvd.OrderedRVD.writeRowsSplit(OrderedRVD.scala:449)
at is.hail.expr.MatrixValue.write(Relational.scala:119)
at is.hail.variant.MatrixTable$$anonfun$write$2.apply(MatrixTable.scala:2182)
at is.hail.variant.MatrixTable$$anonfun$write$2.apply(MatrixTable.scala:2182)
at is.hail.expr.ir.Interpret$.is$hail$expr$ir$Interpret$$interpret(Interpret.scala:393)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:36)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:15)
at is.hail.variant.MatrixTable.write(MatrixTable.scala:2182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)java.lang.NegativeArraySizeException: null
at java.util.Arrays.copyOf(Arrays.java:3236)
at is.hail.annotations.Region.ensure(Region.scala:139)
at is.hail.annotations.Region.allocate(Region.scala:152)
at is.hail.annotations.Region.allocate(Region.scala:159)
at is.hail.annotations.Region.appendInt(Region.scala:193)
at is.hail.annotations.Region.appendArrayInt(Region.scala:252)
at is.hail.expr.MatrixIR$$anonfun$9$$anonfun$10$$anonfun$apply$8.apply(Relational.scala:282)
at is.hail.expr.MatrixIR$$anonfun$9$$anonfun$10$$anonfun$apply$8.apply(Relational.scala:279)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.next(OrderedRVD.scala:914)
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.next(OrderedRVD.scala:908)
...
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at is.hail.rvd.OrderedRVD$$anonfun$apply$16$$anon$3.hasNext(OrderedRVD.scala:911)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:922)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3$$anonfun$apply$4.apply(RowStore.scala:915)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:915)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(RowStore.scala:914)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:265)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2.apply(RowStore.scala:914)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1$$anonfun$apply$2.apply(RowStore.scala:911)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1.apply(RowStore.scala:911)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6$$anonfun$apply$1.apply(RowStore.scala:910)
at is.hail.utils.package$.using(package.scala:577)
at is.hail.utils.richUtils.RichHadoopConfiguration$.writeFile$extension(RichHadoopConfiguration.scala:265)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6.apply(RowStore.scala:910)
at is.hail.io.RichContextRDDRegionValue$$anonfun$6.apply(RowStore.scala:904)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$23.apply(ContextRDD.scala:299)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$5$$anonfun$apply$6.apply(ContextRDD.scala:129)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hail version: devel-4f391c7
Error summary: NegativeArraySizeException: null
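And a sketch of the Example #4 path (annotate entries with allele counts, drop GT, write the MatrixTable) on a small simulated dataset; balding_nichols_model is used here only as a stand-in for the real WGS MatrixTable, and the output path is a placeholder.

import hail as hl

hl.init()

# Small simulated MatrixTable with a GT entry field.
mt = hl.balding_nichols_model(n_populations=3, n_samples=10, n_variants=100)

# Same transformation as the failing step: keep allele counts, drop GT, write.
mt = mt.annotate_entries(AC=mt.GT.n_alt_alleles()).drop('GT')
mt.write('/tmp/ac_only_test.mt', overwrite=True)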
It looks like there has been a significant change to the Table and MatrixTable data structures. Please let me know whether this is a bug that will be fixed, or whether I should be setting up the cluster differently.
Thanks,