Hail cluster mode output file error

Hello everyone.

I'm trying to set up Hail on a Spark cluster using two workstations (file system: HDFS).

Workstation 1: Spark (master node and worker node), HDFS (NameNode and DataNode)
Workstation 2: Spark (worker node), HDFS (DataNode)

Spark version: 3.1.2
Hadoop version: 3.3.0
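
Once both Spark workers are running (they are started in the second test below), here is a minimal sketch of one way to confirm that both workstations registered with the master; going through the private _jsc handle is just one option, not the only way:

-code-

import hail as hl
hl.init(master='spark://genome101:7077')
sc = hl.spark_context()
# Total cores across all registered workers.
print(sc.defaultParallelism)
# One entry per executor plus the driver; with two worker nodes
# there should be executors from both workstations.
print(sc._jsc.sc().getExecutorMemoryStatus())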

It worked well when it was just the master node.

First test: master node only.
-code-

~/hadoop-3.3.0/sbin/start-dfs.sh
jps
92803 SecondaryNameNode
92293 NameNode
96950 Jps
92486 DataNode
sudo ~/spark-3.1.2/sbin/start-master.sh
sudo ~/spark-3.1.2/sbin/start-worker.sh spark://genome101:7077
sudo python3
import hail as hl
hl.init(master='spark://genome101:7077')
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=500,
                              n_variants=500_000,
                              n_partitions=32)
mt = mt.annotate_cols(drinks_coffee=hl.rand_bool(0.33))
gwas = hl.linear_regression_rows(y=mt.drinks_coffee,
                                 x=mt.GT.n_alt_alleles(),
                                 covariates=[1.0])
gwas.export('hdfs://genome101:9000/user/test/test.tsv')
2021-12-29 13:40:35 Hail: INFO: merging 32 files totalling 42.8M...
2021-12-29 13:40:36 Hail: INFO: while writing:
hdfs://genome101:9000/user/test/test.tsv
merge time: 604.267ms
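
As a quick sanity check (a sketch of my own, continuing the same session), the merged file can be read back from HDFS:

-code-

# Read the exported TSV back to confirm the merge produced one readable file.
t = hl.import_table('hdfs://genome101:9000/user/test/test.tsv', impute=True)
t.show(5)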


But it throws an error once the second worker node is added.

Second test: master and worker nodes.
-code-

~/hadoop-3.3.0/sbin/start-dfs.sh ## master
jps ## master
92803 SecondaryNameNode
92293 NameNode
96950 Jps
92486 DataNode

jps ## worker
103746 Jps
103365 DataNode

On the master:

sudo ~/spark-3.1.2/sbin/start-master.sh
sudo ~/spark-3.1.2/sbin/start-worker.sh spark://genome101:7077

On the worker:

sudo ~/spark-3.1.2/sbin/start-worker.sh spark://genome101:7077

Back on the master:

sudo python3
import hail as hl
hl.init(master='spark://genome101:7077')
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=500,
                              n_variants=500_000,
                              n_partitions=32)
mt = mt.annotate_cols(drinks_coffee=hl.rand_bool(0.33))
gwas = hl.linear_regression_rows(y=mt.drinks_coffee,
                                 x=mt.GT.n_alt_alleles(),
                                 covariates=[1.0])
gwas.export('hdfs://genome101:9000/user/test/test.tsv')


======================error message======================
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-...>", line 2, in export
  File "/usr/local/lib/python3.8/dist-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hail/table.py", line 1045, in export
    Env.backend().execute(
  File "/usr/local/lib/python3.8/dist-packages/hail/backend/py4j_backend.py", line 110, in execute
    raise e
  File "/usr/local/lib/python3.8/dist-packages/hail/backend/py4j_backend.py", line 86, in execute
    result_tuple = self._jhc.backend().executeEncode(jir, stream_codec)
  File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/usr/local/lib/python3.8/dist-packages/hail/backend/py4j_backend.py", line 29, in deco
    raise FatalError('%s\n\nJava stack trace:\n%s\n'
hail.utils.java.FatalError: HailException: Expected 32 part files but found 16

Java stack trace:
is.hail.utils.HailException: Expected 32 part files but found 16
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:11)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:11)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.io.fs.FS.copyMerge(FS.scala:264)
at is.hail.io.fs.FS.copyMerge$(FS.scala:238)
at is.hail.io.fs.HadoopFS.copyMerge(HadoopFS.scala:70)
at is.hail.utils.richUtils.RichRDD$.writeTable$extension(RichRDD.scala:118)
at is.hail.expr.ir.TableValue.export(TableValue.scala:138)
at is.hail.expr.ir.TableTextWriter.apply(TableWriter.scala:355)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:852)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:417)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:414)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Hail version: 0.2.78-2f7f9e231aaa
Error summary: HailException: Expected 32 part files but found 16

Does anyone know how to fix this problem?

Thank you

I found the solution.

When calling hl.init(), set the tmp_dir option to a path on HDFS that every node can reach:

hl.init('spark://genome101:7077', tmp_dir='hdfs://genome101:9000/user/tmp/')

Then the error was solved!
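
For anyone who finds this later: as far as I can tell, export first writes one part file per partition into tmp_dir and then merges them into a single file (the copyMerge in the stack trace above). The default temporary directory is node-local, so with two nodes the driver only sees the part files written on its own machine (16 of 32 here) and the merge fails; a tmp_dir on HDFS is visible to every node. If a single merged file is not required, the merge can also be skipped entirely with export's parallel option, sketched here (the output path then becomes a directory of shards):

-code-

# Write one shard per partition plus a separate header file,
# skipping the copy-merge step that raised the error.
gwas.export('hdfs://genome101:9000/user/test/test.tsv',
            parallel='separate_header')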
