Table export reports an error

Hi,

I rewrote this Hail 0.1 code:

vds.sample_qc().export_samples('output.tsv', 'Sample = s, va.qc.*')

as this Hail 0.2 code:

mt = hl.sample_qc(mt, name='sample_qc')
table1 = mt.col.sample_qc
table1.export('gs://path/mt.tsv.bgz')

I'm using a GCP Dataproc cluster:

hailctl dataproc start hail-test \
  --master-machine-type n1-highmem-8 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-highmem-16 \
  --worker-boot-disk-size 500 \
  --region europe-west1 \
  --zone europe-west1-b \
  --max-idle 60m \
  --scopes cloud-platform

It reports an error at stage 1.

[Stage 1:> (0 + 8) / 26]ERROR

It reports something like: Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Any idea about this?

Best, Shuang

We need the full stack trace to debug this.

Hi Tim, thanks for answering. Do I need to provide the job ID, or should I copy/paste the whole log?

[quote="shuang, post:1, topic:1540"]

[Stage 1:> (0 + 8) / 26]ERROR
[/quote]

We need everything after this, I think.

[Stage 1:> (0 + 16) / 26]Traceback (most recent call last):
File "/tmp/96a6f45ea5ac46d2a853ca7b7afe01f0/hl_step2_export_qc_metrics_tsv.py", line 7, in <module>
table1.export('gs://shuang/hail/step2_plot/chr20_qc.tsv.bgz')
File "", line 2, in export
File "/opt/conda/default/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
return original_func(*args, **kwargs)
File "/opt/conda/default/lib/python3.6/site-packages/hail/expr/expressions/base_expression.py", line 944, in export
ds.export(output=path, delimiter=delimiter, header=header)
File "", line 2, in export
File "/opt/conda/default/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
return original_func(*args, **kwargs)
File "/opt/conda/default/lib/python3.6/site-packages/hail/table.py", line 1038, in export
ir.TableWrite(self._tir, ir.TableTextWriter(output, types_file, header, parallel, delimiter)))
File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/spark_backend.py", line 296, in execute
result = json.loads(self._jhc.backend().executeJSON(jir))
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/conda/default/lib/python3.6/site-packages/hail/backend/spark_backend.py", line 41, in deco
'Error summary: %s' % (deepest, full, hail.version, deepest)) from None
hail.utils.java.FatalError: SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 20 times, most recent failure: Lost task 20.19 in stage 1.0 (TID 1037, hail-test-w-1.c.sequencing-informatics-201511.internal, executor 37): ExecutorLostFailure (executor 37 exited caused by one of the running tasks) Reason: Container from a bad node: container_1595230761411_0001_01_000041 on host: hail-test-w-1.c.sequencing-informatics-201511.internal. Exit status: 134. Diagnostics: [2020-07-20 08:30:00.237]Exception from container-launch.
Container id: container_1595230761411_0001_01_000041
Exit code: 134

[2020-07-20 08:30:00.238]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

[2020-07-20 08:30:00.239]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

.
Driver stacktrace:

Java stack trace:
java.lang.RuntimeException: error while applying lowering ‘InterpretNonCompilable’
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:26)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:18)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:18)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:317)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:304)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:303)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:20)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:18)
at is.hail.utils.package$.using(package.scala:601)
at is.hail.annotations.Region$.scoped(Region.scala:18)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:18)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:229)
at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:303)
at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:323)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 20 times, most recent failure: Lost task 20.19 in stage 1.0 (TID 1037, hail-test-w-1.c.sequencing-informatics-201511.internal, executor 37): ExecutorLostFailure (executor 37 exited caused by one of the running tasks) Reason: Container from a bad node: container_1595230761411_0001_01_000041 on host: hail-test-w-1.c.sequencing-informatics-201511.internal. Exit status: 134. Diagnostics: [2020-07-20 08:30:00.237]Exception from container-launch.
Container id: container_1595230761411_0001_01_000041
Exit code: 134

[2020-07-20 08:30:00.238]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

[2020-07-20 08:30:00.239]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1892)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1880)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1879)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2113)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2062)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2051)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
at is.hail.rvd.RVD.combine(RVD.scala:723)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:842)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:39)
at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:50)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:69)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:69)
at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:45)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:20)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:18)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:18)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:317)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:304)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:303)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:20)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:18)
at is.hail.utils.package$.using(package.scala:601)
at is.hail.annotations.Region$.scoped(Region.scala:18)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:18)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:229)
at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:303)
at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:323)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

Hail version: 0.2.45-a45a43f21e83
Error summary: SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 20 times, most recent failure: Lost task 20.19 in stage 1.0 (TID 1037, hail-test-w-1.c.sequencing-informatics-201511.internal, executor 37): ExecutorLostFailure (executor 37 exited caused by one of the running tasks) Reason: Container from a bad node: container_1595230761411_0001_01_000041 on host: hail-test-w-1.c.sequencing-informatics-201511.internal. Exit status: 134. Diagnostics: [2020-07-20 08:30:00.237]Exception from container-launch.
Container id: container_1595230761411_0001_01_000041
Exit code: 134

[2020-07-20 08:30:00.238]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

[2020-07-20 08:30:00.239]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7576 Aborted /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx37237m '-Xss4M' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/tmp '-Dspark.driver.port=40743' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@hail-test-m.c.sequencing-informatics-201511.internal:40743 --executor-id 37 --hostname hail-test-w-1.c.sequencing-informatics-201511.internal --cores 8 --app-id application_1595230761411_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/app.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1595230761411_0001/container_1595230761411_0001_01_000041/hail-all-spark.jar > /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stdout 2> /var/log/hadoop-yarn/userlogs/application_1595230761411_0001/container_1595230761411_0001_01_000041/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

.
Driver stacktrace:

[Stage 1:> (0 + 8) / 26]
Job output is complete

I think the first thing to try is updating to the latest Hail – error code 134 is usually out-of-memory exceptions and we fixed a few memory leaks recently.
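For reference, one way to pick up the latest version is to upgrade the local Hail package and then recreate the cluster, since `hailctl` deploys the locally installed Hail version to clusters it creates (a sketch only; cluster name and region are taken from the command earlier in the thread, and the remaining start flags are elided):

```shell
# Upgrade the pip-installed Hail, then recreate the Dataproc cluster so
# the new version is deployed to it. Re-add the other start flags
# (machine types, disk sizes, etc.) from the original command as needed.
pip install -U hail
hailctl dataproc stop hail-test --region europe-west1
hailctl dataproc start hail-test --region europe-west1 --num-workers 2
```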

Hi @tpoterba, I upgraded to Hail version 0.2.49-11ae8408bad0,
but it still reports a very similar error message:

Exit status: 134. Diagnostics: [2020-07-20 12:58:30.235]Exception from container-launch.
Container id: container_1595248992147_0001_01_000037
Exit code: 134

Hi, Shuang! Have you resolved this by running Hail 0.2.52?

I noticed that this thread originally involved Hail 0.1. We no longer support Hail 0.1.
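For anyone migrating a similar 0.1 pipeline: a rough 0.2 equivalent of the original `export_samples` call is a table-based export (a sketch only; the input and output paths are placeholders, and `'sample_qc'` is just the field name chosen above):

```python
# Sketch of a 0.2-style sample QC export, assuming a Spark cluster with
# Hail installed. Paths below are hypothetical placeholders.
import hail as hl

hl.init()
mt = hl.read_matrix_table('gs://path/data.mt')
mt = hl.sample_qc(mt, name='sample_qc')

# Materialize the columns as a Table, keep the sample ID key plus the QC
# struct, and flatten so each QC metric becomes its own TSV column
# (mirroring 0.1's 'Sample = s, va.qc.*' export).
cols = mt.cols().select('sample_qc')
cols.flatten().export('gs://path/sample_qc.tsv.bgz')
```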