Is there a recommended Hail 0.2 commit version?


#1

Hi,

Is there a recommended Hail commit version? At the most basic, I’d like to load a VCF and run a logistic regression on dosage (or the DS format field). I’ve been able to compile and run Hail on Amazon EMR using commit 1a0759237, but that version behaves differently from the Databricks tutorial. For example, mt.rows().select().show(5) shows an empty table, and I don’t get any plots when trying
p = hl.plot.histogram(mt.DP, range=(0,30), bins=30, title='DP Histogram', legend='DP')
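
For readers following along, here is a minimal sketch of the workflow described above, assuming a reasonably recent Hail 0.2 build in a Jupyter notebook. The function names, the phenotype file layout (a tab-separated table keyed by a sample column `s` with a boolean `is_case` column), and the paths are hypothetical; the regression function has had other names in older development commits, so treat the calls as illustrative rather than guaranteed for any particular commit. One possible reason no plot appears is that `hl.plot.histogram` only returns a Bokeh figure, which the tutorial then passes to Bokeh's `show`.

```python
def dosage_logistic_regression(vcf_path, pheno_path):
    """Sketch: Wald logistic regression of a case/control phenotype on DS."""
    import hail as hl  # imported lazily so this module loads without Hail installed

    mt = hl.import_vcf(vcf_path)  # FORMAT fields such as DS become entry fields
    # Hypothetical phenotype table: tab-separated, keyed by the sample ID
    # column 's', with a boolean 'is_case' column.
    pheno = hl.import_table(pheno_path, impute=True, key='s')
    mt = mt.annotate_cols(pheno=pheno[mt.s])
    return hl.logistic_regression_rows(
        test='wald',
        y=mt.pheno.is_case,
        x=mt.DS,
        covariates=[1.0],  # intercept term
    )


def show_dp_histogram(mt):
    """Sketch: build the DP histogram and explicitly display it with Bokeh."""
    import hail as hl
    from bokeh.io import output_notebook, show

    output_notebook()  # render Bokeh output inline in a Jupyter notebook
    p = hl.plot.histogram(mt.DP, range=(0, 30), bins=30,
                          title='DP Histogram', legend='DP')
    show(p)  # histogram() only returns the figure; show() draws it
```

The regression call returns a table of per-variant results, which can be inspected with `.show(5)` in the same way as the row table above.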

I’ve tried compiling that version (f2b0dca9f506), but then it fails when trying to run mt.rows().select().show(5), as I recall, with a message like: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times.

I tried compiling with Spark 2.3.0 but that doesn’t seem supported. I also tried a recent version, but that had a compile error.

Thanks.


#2

what’s the full error message?

Spark 2.3.0 should work, yes, but you’ll need to pass a few flags into the compilation:


#3

After compiling devel-f2b0dca9f506 for Spark 2.3.0 using those settings, I ran the mt.rows().select().show(5) command again, but it still failed with a long error message (below). Any suggestions?

---------------------------------------------------------------------------
FatalError Traceback (most recent call last)
in ()
----> 1 mt.rows().select().show(5)

~/hail-python.zip/hail/typecheck/check.py in wrapper(*args, **kwargs)
545 def wrapper(*args, **kwargs):
546 args_, kwargs_ = check_all(f, args, kwargs, checkers, is_method=is_method)
--> 547 return f(*args_, **kwargs_)
548
549 update_wrapper(wrapper, f)

~/hail-python.zip/hail/table.py in show(self, n, width, truncate, types)
1215 Print an extra header line with the type of each field.
1216 """
-> 1217 print(self._show(n,width, truncate, types))
1218
1219 def _show(self, n=10, width=90, truncate=None, types=True):

~/hail-python.zip/hail/table.py in _show(self, n, width, truncate, types)
1218
1219 def _show(self, n=10, width=90, truncate=None, types=True):
-> 1220 return self._jt.showString(n, joption(truncate), types, width)
1221
1222 def index(self, *exprs):

/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:

~/hail-python.zip/hail/utils/java.py in deco(*args, **kwargs)
198 raise FatalError('%s\n\nJava stack trace:\n%s\n'
199 'Hail version: %s\n'
--> 200 'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
201 except pyspark.sql.utils.CapturedException as e:
202 raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, ip-172-31-22-223.us-west-2.compute.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1536616256209_0001_01_000005 on host: ip-172-31-22-223.us-west-2.compute.internal. Exit status: 139. Diagnostics: Exception from container-launch.
Container id: container_1536616256209_0001_01_000005
Exit code: 139
Exception message: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

Stack trace: ExitCodeException exitCode=139: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 139

Driver stacktrace:

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, ip-172-31-22-223.us-west-2.compute.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1536616256209_0001_01_000005 on host: ip-172-31-22-223.us-west-2.compute.internal. Exit status: 139. Diagnostics: Exception from container-launch.
Container id: container_1536616256209_0001_01_000005
Exit code: 139
Exception message: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

Stack trace: ExitCodeException exitCode=139: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 139

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1750)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1738)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1737)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1737)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1971)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1920)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1909)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:143)
at is.hail.rvd.OrderedRVD$.getPartitionKeyInfo(OrderedRVD.scala:610)
at is.hail.rvd.OrderedRVD$.makeCoercer(OrderedRVD.scala:705)
at is.hail.io.vcf.MatrixVCFReader.coercer$lzycompute(LoadVCF.scala:975)
at is.hail.io.vcf.MatrixVCFReader.coercer(LoadVCF.scala:975)
at is.hail.io.vcf.MatrixVCFReader.apply(LoadVCF.scala:1007)
at is.hail.expr.ir.MatrixRead.execute(MatrixIR.scala:414)
at is.hail.expr.ir.MatrixRowsTable.execute(TableIR.scala:748)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:454)
at is.hail.table.Table.value$lzycompute(Table.scala:215)
at is.hail.table.Table.value(Table.scala:213)
at is.hail.table.Table.x$5$lzycompute(Table.scala:218)
at is.hail.table.Table.x$5(Table.scala:218)
at is.hail.table.Table.rvd$lzycompute(Table.scala:218)
at is.hail.table.Table.rvd(Table.scala:218)
at is.hail.table.Table.take(Table.scala:649)
at is.hail.table.Table.showString(Table.scala:685)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

Hail version: devel-f2b0dca9f506
Error summary: SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, ip-172-31-22-223.us-west-2.compute.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1536616256209_0001_01_000005 on host: ip-172-31-22-223.us-west-2.compute.internal. Exit status: 139. Diagnostics: Exception from container-launch.
Container id: container_1536616256209_0001_01_000005
Exit code: 139
Exception message: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

Stack trace: ExitCodeException exitCode=139: /bin/bash: line 1: 9276 Segmentation fault LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5120m ‘-verbose:gc’ ‘-XX:+PrintGCDetails’ ‘-XX:+PrintGCDateStamps’ ‘-XX:+UseConcMarkSweepGC’ ‘-XX:CMSInitiatingOccupancyFraction=70’ ‘-XX:MaxHeapFreeRatio=70’ ‘-XX:+CMSClassUnloadingEnabled’ ‘-XX:OnOutOfMemoryError=kill -9 %p’ -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/tmp ‘-Dspark.history.ui.port=18080’ ‘-Dspark.driver.port=38171’ -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-31-16-168.us-west-2.compute.internal:38171 --executor-id 4 --hostname ip-172-31-22-223.us-west-2.compute.internal --cores 4 --app-id application_1536616256209_0001 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/app.jar --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1536616256209_0001/container_1536616256209_0001_01_000005/hail-all-spark.jar > /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stdout 2> /var/log/hadoop-yarn/containers/application_1536616256209_0001/container_1536616256209_0001_01_000005/stderr

at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 139

Driver stacktrace:


#4

could you try updating to the latest version, recompiling, and running again? I’m sorry! We have fixed a few segfaults in the last month so I want to be sure we’re not spending time double-debugging
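
As a side note on reading the trace above: the `Exit status: 139` is consistent with the "Segmentation fault" message, because by shell convention a status above 128 means the process was terminated by signal number `status - 128`, and 139 - 128 = 11, which is SIGSEGV on Linux. A small sketch of that arithmetic (the helper name is made up for illustration):

```python
import signal


def describe_exit_status(status):
    """Return the signal name if a shell-style exit status encodes a fatal signal."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # above 128 but not a known signal number
    return None  # ordinary exit code, not a signal


print(describe_exit_status(139))
```

This only decodes the status; it says nothing about which native library crashed, so the executor's stderr log named in the trace is still the place to look for details.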


#5

This also seems bad. Once you update to current version, let’s fix all the problems that remain.


#6

Hi Tim,

I just noticed that a new hail folder was added to the repo, so I need to update my scripts to reflect that change. Also, I tried to compile Hail using Spark 2.3.0, but it no longer works:

Successfully started process 'command 'make''
tar -xzf libsimdpp-2.1.tar.gz
g++ -o build/NativeBoot.o -march=sandybridge -O3 -std=c++11 -Ilibsimdpp-2.1 -Wall -Werror -fPIC -ggdb -fno-strict-aliasing -I../resources/include -I/etc/alternatives/jre/include -I/etc/alternatives/jre/include/linux -c NativeBoot.cpp
NativeBoot.cpp:1:0: error: bad value (sandybridge) for -march= switch
 #include <jni.h>
 ^
make: *** [build/NativeBoot.o] Error 1
:nativeLib FAILED
:nativeLib (Thread[main,5,main]) completed. Took 0.09 secs.

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':nativeLib'.
> Process 'command 'make'' finished with non-zero exit value 2

I don’t know whether it’s failing to find the Java libraries. I double-checked my $JAVA_HOME and it is set in my .bashrc. Any thoughts?
Thank you,
Carlos


#7

ah, did we not update all our dev docs? That was a breaking change and we should have said that


#8

we’re moving toward a monorepo for our (currently separate) other projects too, so we needed to add a directory


#9

I get a similar issue when using AWS instance store instances. I don’t get the issue with EBS instances. I haven’t spent time finding out why (it may be the file system). I hope this helps.


#10

This is a problem compiling the C libraries. Richard can you chime in here?


#11

There is a related pull request in the GitHub repository: https://github.com/hail-is/hail/pull/4317


#12

Which EC2 instance types have you used successfully? I’ve been trying m3.xlarge so far.


#13

I get the same issue with Spark 2.3.0 and 2.2.0.


#14

okay, looks like we need to wait for this PR to go in


#15

m3 instances use instance store volumes: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html

Any EBS-backed instance type should work fine. I use c4s all the time (e.g. c4.4xlarge). They work well both on demand and as spot instances.


#16

The commit that should fix this was merged this evening. The problem arises because g++-4.8.x doesn’t recognize “-march=sandybridge”; it wants “-march=corei7-avx” instead. In theory you should be able to work around it by setting “export CXXFLAGS=-march=native” before building.
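
To illustrate the version dependence described above, here is a rough sketch of how a build script could pick the flag from the compiler version. It assumes the newer CPU names such as `sandybridge` were introduced in GCC 4.9 and that the version string looks like the output of `g++ -dumpversion`; the function name is hypothetical and this is not how the Hail build itself chooses the flag.

```python
def march_flag(gcc_version):
    """Pick a Sandy Bridge -march value that the given GCC version should accept."""
    parts = gcc_version.strip().split(".")
    major = int(parts[0])
    # Newer GCC releases may print only the major version (e.g. "7").
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) < (4, 9):
        return "-march=corei7-avx"  # older spelling, per the post above
    return "-march=sandybridge"


print(march_flag("4.8.5"))
```

Setting `CXXFLAGS=-march=native`, as suggested above, sidesteps the naming question entirely, at the cost of producing a binary tuned to the machine it was built on.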


#17

Works like a charm! Thank you!


#18

Great! I can run the GWAS tutorial (I tried up to the linear regression part) using the new version, c6941a8e04e7, on Spark 2.2.0 on Amazon EMR emr-5.10.0 with c4.xlarge instances.