is.hail.kryo.HailKryoRegistrator ClassNotFoundException

Sorry for the long delay on my reply, @atebbe

Let’s recall your Spark classpath settings:

spark.driver.extraClassPath …:./hail-all-spark.jar
spark.executor.extraClassPath …:./hail-all-spark.jar

These assert that the jar is located in the working directory of the process, on both the driver and the executors: the leading ./ makes the path relative to wherever each JVM was started. If you ssh to one of your executors and find the Spark job working directory (try looking in /var/run/spark/work), I suspect you will not find hail-all-spark.jar in that directory. While you’re at it, can you open a terminal in your Jupyter notebook and verify that hail-all-spark.jar is indeed in the working directory of your driver?
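If sshing to every executor is a pain, you can also check from the notebook itself with a tiny job that lists each executor’s working directory. A rough sketch, assuming sc is the live SparkContext in your notebook session:

import os
import socket

def ls_cwd(_):
    # Report this executor's hostname, its working directory, and
    # whether hail-all-spark.jar is actually sitting in it.
    return [(socket.gethostname(), os.getcwd(), "hail-all-spark.jar" in os.listdir("."))]

n = sc.defaultParallelism
print(sc.parallelize(range(n), n).mapPartitions(ls_cwd).collect())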

This Stack Overflow post suggests that addFile is inappropriate for “runtime dependencies”: addFile makes a file available to the executors, but it does not put that file on the JVM classpath.

So. Assuming the jar is indeed missing from the working directory of your executors, we need to figure out how to get it there.

First, try sc._jsc.addJar instead of sc.addFile: addJar ships the jar to the executors and adds it to the classpath used for tasks, which addFile does not.
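Something like this, with the path adjusted to wherever the jar actually lives on your driver (the path below is just a placeholder):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # or use the sc your notebook already created
# addJar on the underlying JavaSparkContext distributes the jar and adds it
# to the classloader used for tasks on every executor.
sc._jsc.addJar("/path/to/hail-all-spark.jar")  # placeholder path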

If that fails, Apache Toree suggests using the %AddJar magic invocation to add a jar.
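Going from the Toree docs, that invocation looks something like the line below; the file: URL is a placeholder for wherever your jar lives, and I haven’t tested this against your setup:

%AddJar file:///path/to/hail-all-spark.jar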