Traceback (most recent call last):
File "", line 1, in
File "", line 2, in export_variants
File "/home/ec2-user/BuildAgent/work/c38e75e72b769a7c/python/hail/java.py", line 112, in handle_py4j
hail.java.FatalError: ClassNotFoundException: is.hail.asm4s.AsmFunction2
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9, joel-cluster-5-w-0.c.joel-billing-1.internal): java.lang.NoClassDefFoundError: is/hail/asm4s/AsmFunction2
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:254)
at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:250)
at is.hail.asm4s.package$.loadClass(package.scala:261)
at is.hail.asm4s.FunctionBuilder$$anon$2.apply(FunctionBuilder.scala:218)
at is.hail.expr.CM$$anonfun$runWithDelayedValues$1.apply(CM.scala:82)
at is.hail.expr.CM$$anonfun$runWithDelayedValues$1.apply(CM.scala:80)
at is.hail.expr.Parser$$anonfun$is$hail$expr$Parser$$evalNoTypeCheck$1.apply(Parser.scala:53)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$2$$anonfun$apply$2.apply$mcV$sp(Parser.scala:228)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$3$$anonfun$apply$15.apply(Parser.scala:239)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$3$$anonfun$apply$15.apply(Parser.scala:239)
I believe I am passing JARs and setting PYTHONPATH correctly. Other commands I’ve tried so far are working. Hail version is 53e9d33.
I think the Hail jar isn’t visible to the Spark workers. If that’s true, then you’ll be able to successfully run methods that (a) only run locally (no Spark) or (b) run a Spark job but only use Spark classes. Any method that uses Hail classes in Spark jobs will fail.
Dataproc installs any additional jars when you call out to gcloud dataproc jobs submit. Are you using Dataproc or GCE? You may need to download the jar to each executor explicitly.
Thanks Tim. I am setting the spark.jars conf entry to include the Hail JAR. Do I need to do more than that? My understanding was that this took care of distributing the JARs to executors. (This is on Dataproc.)
I’m pretty sure you’re hitting an infuriating, poorly documented “feature” of --jars / spark.jars that I mention in another discuss post. Short answer: set these two configuration parameters, spark.driver.extraClassPath and spark.executor.extraClassPath, appropriately for your worker nodes.
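A minimal sketch of what that can look like when the SparkContext is built from Python, assuming the jar sits at /path/to/hail-all-spark.jar on the driver machine (the path is a placeholder; if you launch through spark-submit or gcloud dataproc jobs submit, pass the same two properties on the command line instead):

```python
# Sketch only: /path/to/hail-all-spark.jar is a placeholder for wherever the jar
# actually lives on the driver machine.
from pyspark import SparkConf, SparkContext
from hail import HailContext

conf = (SparkConf()
        # ship the jar with the job; Spark copies it into each executor's working directory
        .set('spark.jars', '/path/to/hail-all-spark.jar')
        # the driver does not get that copy, so it needs an absolute path
        .set('spark.driver.extraClassPath', '/path/to/hail-all-spark.jar')
        # executors see the shipped copy in their working directory, so a relative path works
        .set('spark.executor.extraClassPath', './hail-all-spark.jar'))

hc = HailContext(sc=SparkContext(conf=conf))
```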
I sympathize that this is all either undocumented or scattered across hard-to-access places. This SO post collects some of the tribal knowledge about getting JARs to the right place.
I’m not sure why the third command seems to trigger a Spark execution when the first two commands do not.
I would expect all three commands not to trigger execution, delaying the error until something like export_variants. I suspect that, for some reason, the filter_genotypes call is triggering an execution, which surfaces the error. When testing, I generally follow every Hail command that produces a new VDS with .summarize(), which always triggers an execution. If my theory is correct, adding summarize would move the error to the filter_variants_expr line.
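To illustrate the pattern (the filter expressions here are placeholders, not the ones from your script):

```python
# Placeholder expressions: the point is the .summarize() call after each step,
# which forces a Spark job and surfaces classpath errors at the step that
# introduced them rather than at export time.
vds = vds.filter_variants_expr('v.altAllele.isSNP()')
vds.summarize()
vds = vds.filter_genotypes('g.gq >= 20')
vds.summarize()
```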
I think the issue is that your spark.driver.extraClassPath is now relative, but the files are only copied to the working directory of each executor, not to the working directory of the driver. I would try setting spark.driver.extraClassPath back to the absolute path.
Often, when giving examples, people start with the jars in the working directory of the driver, so both the spark.jars and the spark.driver.extraClassPath are relative paths. In your case, the jars on the driver are not in your working directory, so you want an absolute path for spark.driver.extraClassPath.
Deploying the latest pre-built jar for Spark 2.0.2 on Databricks causes the same error.
.summarize() works fine; however, vds.query_variants('variants.take(5)') and vds.query_samples('samples.take(5)') throw an exception.
Java stack trace:
java.lang.NoClassDefFoundError: is/hail/asm4s/AsmFunction2
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:254)
at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:250)
at is.hail.asm4s.package$.loadClass(package.scala:261)
at is.hail.asm4s.FunctionBuilder$$anon$2.apply(FunctionBuilder.scala:218)
…
When a library is attached to a cluster, spark.executor.extraClassPath is set, but there is no way in Databricks to verify that the jar actually exists at that path, so create an init script to manually copy the jars to all executors.
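For reference, here is roughly what that can look like from a notebook cell, using the legacy cluster-named init script location. Every path and the cluster name below are assumptions to adapt; in particular, jars attached through the UI land under dbfs:/FileStore/jars/ with a hashed filename.

```python
# Sketch only: adjust the cluster name, the DBFS source path of the uploaded
# Hail jar (it will have a hashed prefix), and the local destination directory.
dbutils.fs.put(
    "dbfs:/databricks/init/my-hail-cluster/copy-hail-jar.sh",  # runs on every node of that cluster at startup
    """#!/bin/bash
mkdir -p /databricks/hail
cp /dbfs/FileStore/jars/*hail*.jar /databricks/hail/hail-all-spark.jar
""",
    True,
)
# Then point spark.executor.extraClassPath at /databricks/hail/hail-all-spark.jar
# in the cluster's Spark config and restart the cluster so the script runs.
```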
Hi, I am getting the same error. It happens when the following function is run:
vds = vds.repartition(1000, shuffle=True)
I verified that the jar is present at that path and that it is supplied as an absolute path. There is nothing else in spark.driver.extraClassPath, just the path to hail-all-spark.jar. I am using Spark 2.2.1 and the old Hail 0.1, and the cloud service is AWS EMR.
Could anyone help with that? I see there is some kind of manual copying that @snudurupati proposed, but I am not sure what it is, where to put it, or how the paths should be modified, so I have not yet tried it.