Hi there,
Yeah, this is pretty confusing. spark-shell --help
says:
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
But, apparently, --jars alone does not actually put those jars on the driver and executor class paths. You have to add them explicitly with the spark.driver.extraClassPath and spark.executor.extraClassPath properties:
spark-shell --jars './hail.jar' \
  --conf='spark.sql.files.openCostInBytes=53687091200' \
  --conf='spark.sql.files.maxPartitionBytes=53687091200' \
  --conf='spark.driver.extraClassPath=./hail.jar' \
  --conf='spark.executor.extraClassPath=./hail.jar'
NB: the JARs passed via --jars are copied into the working directory of each executor, but not into the working directory of the driver. As a result, spark.driver.extraClassPath is usually the same path you passed to --jars, whereas spark.executor.extraClassPath must be a path relative to the executor's working directory.
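The rule above can be sketched as follows. This is a hypothetical setup, not an official recipe: /opt/hail/hail.jar is an assumed location on the submitting machine, and the script only builds and prints the spark-shell invocation rather than running it.

```shell
# Hypothetical absolute path on the machine that launches spark-shell;
# substitute wherever your hail.jar actually lives.
JAR=/opt/hail/hail.jar

# Driver: same absolute path as --jars (the driver does NOT receive a copy).
# Executor: relative path, because --jars copies the file into each
# executor's working directory.
set -- --jars "$JAR" \
  --conf "spark.driver.extraClassPath=$JAR" \
  --conf "spark.executor.extraClassPath=./$(basename "$JAR")"

# Print the command instead of running it, so the expansion is visible.
echo spark-shell "$@"
```

Running the snippet prints the fully expanded command line, which makes it easy to check that the driver side got the absolute path and the executor side got the relative one before actually launching a cluster job.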
When the class paths are not set, the failure often manifests as:
ClassNotFoundException: is.hail.utils.SerializableHadoopConfiguration