Hi there,
Yeah, this is pretty confusing. spark-shell --help says:
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
But, apparently, this does not actually put the local JARs on the driver and executor class paths. You have to add the JARs to the class paths explicitly using these properties:
spark-shell --jars './hail.jar' \
--conf='spark.sql.files.openCostInBytes=53687091200' \
--conf='spark.sql.files.maxPartitionBytes=53687091200' \
--conf='spark.driver.extraClassPath=./hail.jar' \
--conf='spark.executor.extraClassPath=./hail.jar'
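As a sanity check, you can confirm from inside the running shell that the properties were picked up. This is only a sketch, assuming Spark 2.x; sc.getConf.get throws a NoSuchElementException if the key was never set:

// Paste into the spark-shell launched above.
println(sc.getConf.get("spark.driver.extraClassPath"))   // expect ./hail.jar
println(sc.getConf.get("spark.executor.extraClassPath")) // expect ./hail.jar
println(sc.listJars().mkString("\n"))                    // hail.jar should appear here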
NB: the JARs are copied into each executor's working directory, but not into the driver's. So spark.driver.extraClassPath is usually the same path you passed to --jars, whereas spark.executor.extraClassPath must be a path relative to the executor's working directory (in the example above, ./hail.jar happens to satisfy both). You can see the copying behaviour with the sketch below.
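This is a rough check rather than anything official: it assumes you are on a real cluster (in local mode the executors share the driver's working directory, so the distinction disappears).

// Paste into spark-shell. Each executor lists the files in its working
// directory; hail.jar should show up there, but not necessarily in the
// driver's working directory.
val executorFiles = sc.parallelize(1 to 100, 10)
  .mapPartitions { _ =>
    val files = Option(new java.io.File(".").listFiles).getOrElse(Array.empty[java.io.File])
    Iterator(files.map(_.getName).toSeq)
  }
  .collect().flatten.distinct
println(s"Executor working dir files: ${executorFiles.mkString(", ")}")
println(s"Driver working dir: ${new java.io.File(".").getAbsolutePath}")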
When the class paths are not set up this way, the problem sometimes manifests as:
ClassNotFoundException: is.hail.utils.SerializableHadoopConfiguration
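If you hit that exception and want to narrow down which side is missing the class, here is a quick check you can paste into spark-shell. It is only a rough sketch: serializers can use a different classloader than the one this probes, so a passing check does not completely rule the problem out.

// Class name taken from the error above.
val className = "is.hail.utils.SerializableHadoopConfiguration"

// Driver side: throws ClassNotFoundException if the class is not visible here.
Class.forName(className)

// Executor side: each task tries to load the class with its own classloader.
sc.parallelize(1 to sc.defaultParallelism).foreach(_ => Class.forName(className))

println(s"$className loaded on both the driver and the executors")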