Running `linear_mixed_regression_rows`, I get this error. Why?

java.io.FileNotFoundException: File file:/tmp/hail.RhRBka50Vhk6/XzmcqzSdZc/parts/part-4-8-1-0-108d5f77-eb1f-c48c-7881-da0d385cfc49 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.&lt;init&gt;(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
at is.hail.io.fs.HadoopFS.open(HadoopFS.scala:110)
at is.hail.io.fs.HadoopFS.unsafeReader(HadoopFS.scala:441)
at is.hail.HailContext$$anon$4.compute(HailContext.scala:745)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
at is.hail.linalg.BlockMatrixTransposeRDD.compute(BlockMatrix.scala:1562)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:449)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1375)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:452)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)

It runs for a few minutes, then fails with the error above.

My guess is that the Hail temporary directory (`/tmp/` by default) is not network-visible.

I thought so too at first, so I tried an NFS-shared directory with `hl.init(sc, tmp_dir='/nfs/tmp')` and got

java.io.FileNotFoundException: File file:/nfs/tmp/hail.*** does not exist

Ah, try `file:///nfs/tmp`?
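For reference, a minimal sketch of the suggestion above, passing an explicit `file://`-scheme URI to `hl.init`. The `/nfs/tmp` path and the assumption that it is mounted identically on the driver and every executor are from this thread, not verified:

```python
# Hedged sketch: point Hail's temporary directory at a path visible to
# the whole cluster before any jobs run. If /nfs/tmp is not mounted at
# the same path on every executor, workers will still fail with
# FileNotFoundException when reading the driver's intermediate parts.
import hail as hl

hl.init(sc, tmp_dir='file:///nfs/tmp')  # explicit file:// scheme
```

On a multi-node cluster, a scheme-qualified URI for a genuinely shared filesystem (NFS, HDFS, or object storage) is what matters; a bare local path resolves per-machine.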

That failed with the same error.

I don't want to waste your time; I just wondered if it was something obvious or something you had seen before.
I am using Spark 3.0-SNAPSHOT and Hail, both compiled with Java 13, running on the Kubernetes Spark operator (in beta), so it's not a surprise that something doesn't work. (By the way, `skat` ran successfully.)

Thanks

I’m amazed the system managed to get to this point! Please do keep us updated on how things work with that build.