ClassNotFoundException: is.hail.asm4s.AsmFunction2

Hello, your friendly FireCloud integrationist again.

I’m going through the Getting Started guide to check that I’m not missing anything in my distribution, and I’m seeing an error on this step:

vds.export_variants('gs://joel-jupyter-test-1/variantqc.tsv', 'Variant = v, va.qc.*')

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "", line 2, in export_variants
File "/home/ec2-user/BuildAgent/work/c38e75e72b769a7c/python/hail/java.py", line 112, in handle_py4j
hail.java.FatalError: ClassNotFoundException: is.hail.asm4s.AsmFunction2

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9, joel-cluster-5-w-0.c.joel-billing-1.internal): java.lang.NoClassDefFoundError: is/hail/asm4s/AsmFunction2
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:254)
at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:250)
at is.hail.asm4s.package$.loadClass(package.scala:261)
at is.hail.asm4s.FunctionBuilder$$anon$2.apply(FunctionBuilder.scala:218)
at is.hail.expr.CM$$anonfun$runWithDelayedValues$1.apply(CM.scala:82)
at is.hail.expr.CM$$anonfun$runWithDelayedValues$1.apply(CM.scala:80)
at is.hail.expr.Parser$$anonfun$is$hail$expr$Parser$$evalNoTypeCheck$1.apply(Parser.scala:53)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$2$$anonfun$apply$2.apply$mcV$sp(Parser.scala:228)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$3$$anonfun$apply$15.apply(Parser.scala:239)
at is.hail.expr.Parser$$anonfun$parseNamedExprs$3$$anonfun$apply$15.apply(Parser.scala:239)

I believe I am passing JARs and setting PYTHONPATH correctly. Other commands I’ve tried so far are working. Hail version is 53e9d33.

I think the Hail jar isn’t visible to the Spark workers. If that’s true, then you’ll be able to successfully run methods that (a) only run locally (no Spark) or (b) run a Spark job that uses only Spark classes. Any method that uses Hail classes in Spark jobs will fail.

Dataproc installs additional jars when you submit through gcloud dataproc jobs submit. Are you using Dataproc or GCE? You may need to download the jar to each executor explicitly.
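
If it helps, here’s one way to see what actually landed on the workers (a minimal sketch, assuming a live pyspark session with a SparkContext named sc; the partition count of 8 is arbitrary):

def ls_cwd(_):
    # Runs on an executor: report its working directory and contents.
    # Jars shipped via spark.jars / --jars should show up here.
    import os
    return (os.getcwd(), tuple(sorted(os.listdir('.'))))

print(sc.parallelize(range(8), 8).map(ls_cwd).distinct().collect())

If the Hail jar doesn’t appear in those listings, it never made it to the executors.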

Thanks Tim. I am setting the spark.jars conf entry to include the Hail JAR. Do I need to do more than that? My understanding was that this took care of distributing the JARs to executors. (This is on Dataproc.)

spark.jars /spark/spark-2.0.2/gcs-connector.jar,/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.submit.pyFiles /hail/pyhail-hail-is-master-53e9d33.zip
spark.sql.files.maxPartitionBytes=100000000000
spark.sql.files.openCostInBytes=100000000000

filter_variants_expr is failing in the same way, but other functions like import_vcf, read, and write are working.

Oh, I think that should work. I’m not entirely sure what’s going on, then; we’ll need to dig a bit deeper.

I’m pretty sure you’re hitting an infuriating, poorly documented “feature” of --jars / spark.jars that I mention in another post on this forum. Short answer: set these two configuration parameters appropriately for your worker nodes:

spark.driver.extraClassPath = ./hail.jar
spark.executor.extraClassPath = ./hail.jar

Thanks Dan. Unfortunately that did not resolve my problem.

spark.jars /spark/spark-2.0.2/gcs-connector.jar,/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.submit.pyFiles /hail/pyhail-hail-is-master-53e9d33.zip
spark.driver.extraClassPath /spark/spark-2.0.2/gcs-connector.jar,/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.executor.extraClassPath /spark/spark-2.0.2/gcs-connector.jar,/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.sql.files.maxPartitionBytes=100000000000
spark.sql.files.openCostInBytes=100000000000

A couple of things: extraClassPath is a JVM classpath, so the separator is a colon, not a comma, and since spark.jars puts the jars in each executor’s working directory, the executor entries should be relative paths:

    spark.executor.extraClassPath ./gcs-connector.jar:./hail-hail-is-master-all-spark2.0.2-53e9d33.jar

I sympathize that this is all either undocumented or scattered in hard-to-access places. This SO post collects some of the tribal knowledge about getting JARs to the right place.
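
One more diagnostic that’s occasionally useful: from the pyspark shell you can ask the driver JVM to load a Hail class directly (a sketch, assuming a live SparkContext sc; note this exercises only the driver classpath, not the executors):

# Raises a py4j error wrapping ClassNotFoundException if the Hail jar
# isn't on the driver classpath.
sc._jvm.java.lang.Class.forName('is.hail.asm4s.AsmFunction2')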


Perfect! That solved my problem. Thanks Dan!

Here’s my new spark-defaults.conf:

# Uses the YARN configuration in $HADOOP_CONF_DIR
spark.master yarn
spark.submit.deployMode client

# Distributes from the master node to the working directories of executors
spark.jars /spark/spark-2.0.2/gcs-connector.jar,/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.submit.pyFiles /hail/pyhail-hail-is-master-53e9d33.zip

# Adds JARs to classpaths
spark.driver.extraClassPath ./gcs-connector.jar:./hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.executor.extraClassPath ./gcs-connector.jar:./hail-hail-is-master-all-spark2.0.2-53e9d33.jar

# Hail needs at least 50GB
spark.sql.files.maxPartitionBytes 100000000000
spark.sql.files.openCostInBytes 100000000000

…but now it’s failing again with the same error.

vds2 = vds.filter_variants_expr('v.altAllele().isSNP() && va.qc.gqMean >= 20')
vds3 = vds2.filter_samples_expr('sa.qc.callRate >= 0.97 && sa.qc.dpMean >= 15')

These run, but filter_genotypes fails.

vds4 = vds3.filter_genotypes('let ab = g.ad[1] / g.ad.sum() in '
                             '((g.isHomRef() && ab <= 0.1) || '
                             ' (g.isHet() && ab >= 0.25 && ab <= 0.75) || '
                             ' (g.isHomVar() && ab >= 0.9))')

I’m not sure why the third command seems to trigger a Spark execution when the first two commands do not.

I would expect all three commands not to trigger execution, delaying the error until something like export_variants. I suspect that, for some reason, filter_genotypes is triggering an execution, which surfaces the error. When testing, I generally follow every Hail command that produces a new VDS with .summarize(), which always triggers an execution. If my theory is correct, adding summarize would move the error to the filter_variants_expr line.
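
For example, the habit looks like this (a sketch using the first command above):

vds2 = vds.filter_variants_expr('v.altAllele().isSNP() && va.qc.gqMean >= 20')
vds2.summarize()  # forces a Spark execution, so a classpath error surfaces on this line rather than at export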

I think the issue is that your spark.driver.extraClassPath is now relative, but the files are only copied to the working directory of each executor, not to the working directory of the driver. I would try setting spark.driver.extraClassPath back to the absolute path.

Often, when giving examples, people start with the jars in the working directory of the driver, so both spark.jars and spark.driver.extraClassPath are relative paths. In your case, the jars on the driver are not in your working directory, so you want an absolute path for spark.driver.extraClassPath.
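
Concretely, with your paths that would look like this (a sketch; everything else in your spark-defaults.conf unchanged):

spark.driver.extraClassPath /spark/spark-2.0.2/gcs-connector.jar:/hail/hail-hail-is-master-all-spark2.0.2-53e9d33.jar
spark.executor.extraClassPath ./gcs-connector.jar:./hail-hail-is-master-all-spark2.0.2-53e9d33.jar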

Saving the day once again, Dan! That worked. Thanks for the summarize() tip as well.

Deploying the latest pre-built jar for Spark 2.0.2 on Databricks causes the same error.
.summarize() works fine; however, vds.query_variants('variants.take(5)') and vds.query_samples('samples.take(5)') throw the exception.
Java stack trace:
java.lang.NoClassDefFoundError: is/hail/asm4s/AsmFunction2
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at is.hail.asm4s.package$HailClassLoader$.liftedTree1$1(package.scala:254)
at is.hail.asm4s.package$HailClassLoader$.loadOrDefineClass(package.scala:250)
at is.hail.asm4s.package$.loadClass(package.scala:261)
at is.hail.asm4s.FunctionBuilder$$anon$2.apply(FunctionBuilder.scala:218)

I followed this Databricks forum post as well:
https://forums.databricks.com/questions/11391/can-i-run-hail-on-databricks.html

So I created an init script, but that didn’t help either.

dbutils.fs.put("/databricks/init/install_hail.sh", """#!/bin/bash
# DBFS is fuse-mounted at /dbfs on cluster nodes; plain cp can't read dbfs:/ URIs
cp /dbfs/FileStore/jars/cafb4367_0ff4_4871_b28a_2928d77b0cfc-hail_all_spark2_0_2-f864f.jar /mnt/driver-daemon/jars/
cp /dbfs/FileStore/jars/cafb4367_0ff4_4871_b28a_2928d77b0cfc-hail_all_spark2_0_2-f864f.jar /mnt/jars/driver-daemon
""", True)

What are your spark.driver.extraClassPath and spark.executor.extraClassPath set to?

spark.executor.extraClassPath - /databricks/jars/<YOUR_JAR_NAME>-hail_all_spark2_0_2-f864f.jar

When a library is attached to a cluster, executor.extraClassPath is set, but there is no way in Databricks to verify that the jar actually exists at that path, so create an init script that manually copies the jars to all executors.

dbutils.fs.put("/databricks/init/install_hail.sh", """#!/bin/bash
# Copy the Hail jar out of DBFS (fuse-mounted at /dbfs) onto each node
cp /dbfs/FileStore/jars/<YOUR_JAR_NAME>-hail_all_spark2_0_2-f864f.jar /mnt/driver-daemon/jars/
cp /dbfs/FileStore/jars/<YOUR_JAR_NAME>-hail_all_spark2_0_2-f864f.jar /mnt/jars/driver-daemon
""", True)

This seems to do the trick, thanks for the pointer @tpoterba!

Hi, I am getting the same error. It happens when the following function is run:

vds = vds.repartition(1000, shuffle=True)

I verified that the jar is present in the path and that it is supplied as an absolute path. There is nothing else in spark.driver.extraClassPath, just the path to hail-all-spark.jar. I am using Spark 2.2.1 and the old Hail 0.1, and the cloud service is AWS EMR.

Could anyone help with that? I see there is some kind of manual copying that @snudurupati proposed, but I am not sure what it is, where to put it, or how the paths should be modified, so I have not yet tried that.

The development team is no longer supporting 0.1, so I’d recommend switching to 0.2 when it’s possible for you.

This error means you probably need something like the following:

spark.jars=HAIL_JAR_PATH
spark.driver.extraClassPath=HAIL_JAR_PATH
spark.executor.extraClassPath=./hail-all-spark.jar
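
If you’re configuring at submit time on EMR rather than in spark-defaults.conf, the equivalent looks roughly like this (a sketch; the /home/hadoop path and your_script.py are placeholders, not paths from your setup):

spark-submit \
  --jars /home/hadoop/hail-all-spark.jar \
  --conf spark.driver.extraClassPath=/home/hadoop/hail-all-spark.jar \
  --conf spark.executor.extraClassPath=./hail-all-spark.jar \
  your_script.py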

Awesome! That fixed everything. Thanks a lot!