I have some data in a matrix table stored in S3 in the US-west region. I'd like to merge this data with the 1000 Genomes HighCov autosomes data. Anticipating (correctly) that this would not be a straightforward, one-time thing, I made a US-west copy of all the objects that make up the matrix table at s3://hail-datasets-us-east-1/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt.
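For context, the copy itself was nothing exotic; it amounted to something like the following (a minimal boto3 sketch, roughly equivalent to an aws s3 sync, assuming the source objects are publicly readable and that default credentials can write to my_bucket):

import boto3

SRC_BUCKET = 'hail-datasets-us-east-1'
DST_BUCKET = 'my_bucket'  # the US-west bucket
PREFIX = '1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/'

s3 = boto3.resource('s3')
paginator = s3.meta.client.get_paginator('list_objects_v2')

# Copy every object under the matrix table prefix, preserving keys.
for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        s3.Object(DST_BUCKET, obj['Key']).copy(
            {'Bucket': SRC_BUCKET, 'Key': obj['Key']})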
My attempt to merge these datasets looks more or less like this (I'm using version 0.2.72-cfce5e858cab):
import hail as hl

my_mt = hl.read_matrix_table('s3a://my_bucket/my_cohort.mt')
tgp_mt = hl.read_matrix_table('s3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt')

# Keep only the sample IDs on the 1000 Genomes columns.
tgp_mt = tgp_mt.select_cols()

# Restrict both tables to their shared entry fields, in a consistent order.
my_tgp_ekeys = sorted(my_mt.entry.keys() & tgp_mt.entry.keys())
my_mt = my_mt.select_entries(*my_tgp_ekeys)
tgp_mt = tgp_mt.select_entries(*my_tgp_ekeys)

# Join the cohorts column-wise, run sample QC, and write the result.
my_tgp_mt = tgp_mt.union_cols(my_mt)
my_tgp_mt = hl.sample_qc(my_tgp_mt, name="sample_qc")
my_tgp_mt.write('s3a://my_bucket/my_tgp_cohort.mt')
What ends up happening is that I get a FileNotFoundException:

No such file or directory: s3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/parts/part-00311-7-311-0-ae371ed3-1c91-eca9-9251-acc1b0de3620

If I run

aws s3 ls --no-sign-request s3://hail-datasets-us-east-1/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/parts/

I can confirm that part-00311-7-311-0-ae371ed3-1c91-eca9-9251-acc1b0de3620 is missing from the source bucket as well. But when I look at the _partFile entries in the corresponding metadata.json.gz, the missing part is listed there. So there appear to be both S3 objects that are not listed in metadata.json.gz and entries in metadata.json.gz with no corresponding object in S3.
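Here is roughly how I did the comparison (a sketch using boto3; I'm assuming here that the part-file list lives under a key named "_partFiles" in the metadata.json.gz next to the parts directory):

import gzip
import json
import boto3

s3 = boto3.client('s3')
BUCKET = 'my_bucket'
PREFIX = '1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/'

# Part files the metadata says should exist.
meta = s3.get_object(Bucket=BUCKET, Key=PREFIX + 'metadata.json.gz')
expected = set(json.loads(gzip.decompress(meta['Body'].read()))['_partFiles'])

# Part files actually present in S3 (paginated: there are >1000 of them).
actual = set()
for page in s3.get_paginator('list_objects_v2').paginate(
        Bucket=BUCKET, Prefix=PREFIX + 'parts/'):
    for obj in page.get('Contents', []):
        actual.add(obj['Key'].rsplit('/', 1)[-1])

print('listed in metadata, missing from S3:', sorted(expected - actual))
print('present in S3, not in metadata:', sorted(actual - expected))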
I guess my question is: what's going on here? Is this the right way to merge two cohorts from two matrix tables? Are there actually parts missing from the GRCh38 30x 1000 Genomes matrix table?
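For what it's worth, the failure should be reproducible without the join or the write: forcing a full scan of the copied table's entries ought to hit the same missing part file. A minimal sketch:

tgp_mt = hl.read_matrix_table(
    's3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt')

# Aggregating over entries reads every entries partition, so a missing
# part file fails fast here rather than deep inside the union_cols job.
print(tgp_mt.aggregate_entries(hl.agg.count()))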
The full backtrace is:
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 947 in stage 3.0 failed 10 times, most recent failure: Lost task 947.9 in stage 3.0 (TID 2149) (172.18.6.11 executor 27): java.io.FileNotFoundException: No such file or directory: s3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/parts/part-00311-7-311-0-ae371ed3-1c91-eca9-9251-acc1b0de3620
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2269)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:702)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
at is.hail.io.fs.HadoopFS.openNoCompression(HadoopFS.scala:83)
at is.hail.io.fs.FS.open(FS.scala:139)
at is.hail.io.fs.FS.open$(FS.scala:138)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.io.fs.FS.open(FS.scala:151)
at is.hail.io.fs.FS.open$(FS.scala:150)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.io.fs.FS.open(FS.scala:148)
at is.hail.io.fs.FS.open$(FS.scala:147)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.HailContext$.$anonfun$readRowsSplit$5(HailContext.scala:383)
at is.hail.sparkextras.IndexReadRDD.compute(IndexReadRDD.scala:25)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at is.hail.sparkextras.ContextRDD.iterator(ContextRDD.scala:390)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.$anonfun$parentIterator$1(RepartitionedOrderedRDD2.scala:66)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.dropLeft(RepartitionedOrderedRDD2.scala:76)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.<init>(RepartitionedOrderedRDD2.scala:73)
at is.hail.sparkextras.RepartitionedOrderedRDD2.$anonfun$compute$1(RepartitionedOrderedRDD2.scala:62)
at is.hail.io.RichContextRDDLong$.$anonfun$boundary$4(RichContextRDDRegionValue.scala:188)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at is.hail.io.RichContextRDDLong$$anon$3.hasNext(RichContextRDDRegionValue.scala:197)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:1087)
at is.hail.utils.richUtils.RichIterator$$anon$1.isValid(RichIterator.scala:30)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:48)
at is.hail.utils.FlipbookIterator$$anon$6.calculateValidity(FlipbookIterator.scala:221)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.refreshValidity(FlipbookIterator.scala:210)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.refreshValidity$(FlipbookIterator.scala:209)
at is.hail.utils.FlipbookIterator$$anon$6.refreshValidity(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.$init$(FlipbookIterator.scala:214)
at is.hail.utils.FlipbookIterator$$anon$6.<init>(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator.staircased(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator.cogroup(FlipbookIterator.scala:254)
at is.hail.utils.FlipbookIterator.innerJoin(FlipbookIterator.scala:360)
at is.hail.annotations.OrderedRVIterator.innerJoin(OrderedRVIterator.scala:116)
at is.hail.rvd.KeyedRVD.$anonfun$orderedJoin$1(KeyedRVD.scala:66)
at is.hail.rvd.KeyedRVD.$anonfun$orderedJoin$5(KeyedRVD.scala:86)
at is.hail.sparkextras.ContextRDD.$anonfun$czipPartitions$2(ContextRDD.scala:316)
at is.hail.sparkextras.ContextRDD.$anonfun$cmapPartitions$3(ContextRDD.scala:218)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at is.hail.utils.richUtils.RichContextRDD$$anon$1.hasNext(RichContextRDD.scala:71)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2291)
at is.hail.rvd.RVD.combine(RVD.scala:725)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:913)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:56)
at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
at is.hail.expr.ir.InterpretNonCompilable$.rewrite$1(InterpretNonCompilable.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.rewrite$1(InterpretNonCompilable.scala:39)
at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:15)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:15)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:12)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:14)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:12)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:29)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
at is.hail.backend.spark.SparkBackend.$anonfun$execute$1(SparkBackend.scala:365)
at is.hail.expr.ir.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:627)
at is.hail.expr.ir.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:627)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:46)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:362)
at is.hail.backend.spark.SparkBackend.$anonfun$executeJSON$1(SparkBackend.scala:406)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:404)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
java.io.FileNotFoundException: No such file or directory: s3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/parts/part-00311-7-311-0-ae371ed3-1c91-eca9-9251-acc1b0de3620
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2269)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:702)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
at is.hail.io.fs.HadoopFS.openNoCompression(HadoopFS.scala:83)
at is.hail.io.fs.FS.open(FS.scala:139)
at is.hail.io.fs.FS.open$(FS.scala:138)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.io.fs.FS.open(FS.scala:151)
at is.hail.io.fs.FS.open$(FS.scala:150)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.io.fs.FS.open(FS.scala:148)
at is.hail.io.fs.FS.open$(FS.scala:147)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:70)
at is.hail.HailContext$.$anonfun$readRowsSplit$5(HailContext.scala:383)
at is.hail.sparkextras.IndexReadRDD.compute(IndexReadRDD.scala:25)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at is.hail.sparkextras.ContextRDD.iterator(ContextRDD.scala:390)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.$anonfun$parentIterator$1(RepartitionedOrderedRDD2.scala:66)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.dropLeft(RepartitionedOrderedRDD2.scala:76)
at is.hail.sparkextras.RepartitionedOrderedRDD2$$anon$1.<init>(RepartitionedOrderedRDD2.scala:73)
at is.hail.sparkextras.RepartitionedOrderedRDD2.$anonfun$compute$1(RepartitionedOrderedRDD2.scala:62)
at is.hail.io.RichContextRDDLong$.$anonfun$boundary$4(RichContextRDDRegionValue.scala:188)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at is.hail.io.RichContextRDDLong$$anon$3.hasNext(RichContextRDDRegionValue.scala:197)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:1087)
at is.hail.utils.richUtils.RichIterator$$anon$1.isValid(RichIterator.scala:30)
at is.hail.utils.StagingIterator.isValid(FlipbookIterator.scala:48)
at is.hail.utils.FlipbookIterator$$anon$6.calculateValidity(FlipbookIterator.scala:221)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.refreshValidity(FlipbookIterator.scala:210)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.refreshValidity$(FlipbookIterator.scala:209)
at is.hail.utils.FlipbookIterator$$anon$6.refreshValidity(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator$ValidityCachingStateMachine.$init$(FlipbookIterator.scala:214)
at is.hail.utils.FlipbookIterator$$anon$6.<init>(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator.staircased(FlipbookIterator.scala:219)
at is.hail.utils.FlipbookIterator.cogroup(FlipbookIterator.scala:254)
at is.hail.utils.FlipbookIterator.innerJoin(FlipbookIterator.scala:360)
at is.hail.annotations.OrderedRVIterator.innerJoin(OrderedRVIterator.scala:116)
at is.hail.rvd.KeyedRVD.$anonfun$orderedJoin$1(KeyedRVD.scala:66)
at is.hail.rvd.KeyedRVD.$anonfun$orderedJoin$5(KeyedRVD.scala:86)
at is.hail.sparkextras.ContextRDD.$anonfun$czipPartitions$2(ContextRDD.scala:316)
at is.hail.sparkextras.ContextRDD.$anonfun$cmapPartitions$3(ContextRDD.scala:218)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at is.hail.utils.richUtils.RichContextRDD$$anon$1.hasNext(RichContextRDD.scala:71)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Hail version: 0.2.72-cfce5e858cab
Error summary: FileNotFoundException: No such file or directory: s3a://my_bucket/1000_Genomes/NYGC_30x/GRCh38/autosomes_unphased.mt/entries/rows/parts/part-00311-7-311-0-ae371ed3-1c91-eca9-9251-acc1b0de3620