Sorry for the long delay on my reply, @atebbe
Let’s recall your Spark classpath settings:
```
spark.driver.extraClassPath …:./hail-all-spark.jar
spark.executor.extraClassPath …:./hail-all-spark.jar
```
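As a quick sanity check, you can print what the running context actually sees. This is just a sketch, assuming `sc` is the SparkContext behind your notebook session:

```python
# Sketch: print the effective classpath settings of the live SparkContext.
# Assumes `sc` is the SparkContext your Hail/Jupyter session is using.
for key in ("spark.driver.extraClassPath", "spark.executor.extraClassPath"):
    print(key, "=", sc.getConf().get(key, "<unset>"))
```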
These assert that, on both the driver and the executors, the jar is located in the working directory of the respective process. If you ssh to one of your executors and find the Spark job working directory (try looking in `/var/run/spark/work`), I suspect you will not find `hail-all-spark.jar` in that directory. While you’re at it, can you open a terminal in your Jupyter notebook and verify that `hail-all-spark.jar` is indeed in the working directory of your driver (the notebook kernel is your driver process)?
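If ssh-ing around is a pain, here is a rough sketch you could run from a notebook cell instead; it assumes `sc` is your SparkContext and that the jar is named exactly `hail-all-spark.jar`:

```python
import os

# Driver side: the notebook kernel's working directory.
print("driver:", os.getcwd(), os.path.exists("hail-all-spark.jar"))

def check(_):
    # Executor side: the working directory of the task's executor process.
    return (os.getcwd(), os.path.exists("hail-all-spark.jar"))

# Run the same check inside a handful of tasks and collect the distinct answers.
print("executors:", sc.parallelize(range(8)).map(check).distinct().collect())
```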
This StackOverflow post suggests that `addFile` is inappropriate for “runtime dependencies”.
So. Assuming the jar is indeed missing from the working directory of your executors, we need to figure out how to get it there.
First, try `sc._jsc.addJar` instead of `sc.addFile`.
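Something like this, where the path is a placeholder for wherever the jar actually lives on your driver machine; `addJar` registers the jar as a task dependency, whereas `addFile` only ships a plain file:

```python
# Sketch: register the jar with the underlying JavaSparkContext so it is
# shipped to executors as a task dependency. The path below is a placeholder.
sc._jsc.addJar("/path/to/hail-all-spark.jar")
```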
If that fails, Apache Toree suggests using the `%AddJar` magic invocation to add a jar.
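For reference, a Toree cell would look roughly like the line below; note that Toree is a Scala kernel and `%AddJar` expects a URL, so double-check the Toree docs before leaning on this from a Python notebook:

```
%AddJar file:///path/to/hail-all-spark.jar
```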