Pip-installed Hail requires additional configuration options in Spark referring to the path to the Hail Python module directory HAIL_DIR

Hi

I built Hail 0.2.95, and the build was OK.

I run PySpark with a Jupyter notebook and initialize Hail:

import hail as hl
hl.init(sc=sc)

Then this message is printed:

pip-installed Hail requires additional configuration options in Spark referring
to the path to the Hail Python module directory HAIL_DIR,
e.g. /path/to/python/site-packages/hail:
spark.jars=HAIL_DIR/backend/hail-all-spark.jar
spark.driver.extraClassPath=HAIL_DIR/backend/hail-all-spark.jar
spark.executor.extraClassPath=./hail-all-spark.jar
Traceback (most recent call last):
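To find the HAIL_DIR that the message refers to, one way is to ask Python where the pip-installed module lives (the example output below is my environment's path; yours will differ):

python3 -c 'import hail, os; print(os.path.dirname(hail.__file__))'
# e.g. /home1/sshuser/.local/lib/python3.7/site-packages/hail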

This is the pyspark command that I run:

export SPARK_HOME=/home1/sshuser/spark-3.2.0-bin-without-hadoop
export SPARK_CONF_DIR=/home1/sshuser/spark-3.2.0-bin-without-hadoop/conf
export SPARK_SUBMIT_OPTS="-Dhdp.version=3.1.0.0-78"
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_DIST_CLASSPATH=$($HADOOP_COMMON_HOME/bin/hadoop classpath)
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --ip=0.0.0.0 --port 8888'

pyspark --conf spark.jars=/home1/sshuser/.local/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar \
  --conf spark.driver.extraClassPath=/home1/sshuser/.local/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar \
  --conf spark.executor.extraClassPath=./hail-all-spark.jar \
  --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro
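If you would rather not repeat those --conf flags on every launch, the same settings can live in $SPARK_CONF_DIR/spark-defaults.conf instead (a sketch, using the same paths as above):

spark.jars                     /home1/sshuser/.local/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.driver.extraClassPath    /home1/sshuser/.local/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.executor.extraClassPath  ./hail-all-spark.jar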

Thanks

On CentOS 7.8,

sudo yum install python3

installs Python 3.6.
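You can check which version the yum package provides:

python3 --version
# Python 3.6.x on CentOS 7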

Then I built Python 3.7 from source.

After that, something was wrong with the client for the Spark cluster.

So I rebuilt the Spark cluster and the client.

This time I skipped sudo yum install python3 and just installed Python 3.7 from source with make install.
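For reference, the from-source build looked roughly like this (3.7.12 is just an example release, and the yum line lists typical build prerequisites; adjust both to your system):

sudo yum install gcc make openssl-devel bzip2-devel libffi-devel zlib-devel
curl -O https://www.python.org/ftp/python/3.7.12/Python-3.7.12.tgz
tar xzf Python-3.7.12.tgz
cd Python-3.7.12
./configure --enable-optimizations
make -j"$(nproc)"
sudo make install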

Hail now initializes OK!

Thanks!
