Spark 2.4.4 gets stuck in initialization phase

Hi,

I am working on ‘RHEL Server release 7.7’. I installed the recommended ‘miniconda3’ and then created a Python 3.7 virtual environment:

conda create -n py37 python=3.7

Then I installed Hail using pip, as recommended:

pip install hail

and then PySpark and IPython:

conda install -c conda-forge pyspark
conda install -c anaconda ipython

I have also downloaded ‘Spark 2.4.4’ and started one master and one slave. Then I tried to run the basic script in various ways:

import hail as hl
mt = hl.balding_nichols_model(n_populations=3, n_samples=50, n_variants=100)
mt.count()

But it just gets stuck on the second line, during Hail and Spark initialization, and never goes any further. No log output, no error.
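For reference, the same script with Hail initialized explicitly, so that the master URL and the Hail log location are pinned down, would look roughly like this (master and log are standard hl.init arguments; the URL and paths below are just placeholders from my setup):

import hail as hl

# Initialize Hail explicitly against the standalone master and write the Hail
# log to an easy-to-find location (both values below are examples).
hl.init(master='spark://ai-grisnodedev1:7077', log='/home/user/hail.log')

mt = hl.balding_nichols_model(n_populations=3, n_samples=50, n_variants=100)
print(mt.count())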

Setting $SPARK_HOME (or leaving it unset) does not fix it. I also set the path to the Hail JAR directly when running it with spark-submit, but the result is the same:

spark-submit --master spark://ai-grisnodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf spark.executor.extraClassPath=./hail-all-spark.jar test_hail.py

Or

spark-submit --master spark://ai-grisnodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf spark.executor.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar test_hail.py

test_hail.py just contains the three lines of sample code above.

Can you run Spark pipelines that don’t involve Hail?

I launched spark-shell, loaded a file from Hadoop, and counted the number of lines in it; that works fine. How else could I test it?
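A Hail-free test directly from PySpark (going through the same Python environment that Hail uses) could be something along these lines; the master URL is just the one from my setup:

from pyspark.sql import SparkSession

# Spark-only smoke test through PySpark, no Hail involved.
spark = SparkSession.builder \
    .master('spark://ai-grisnodedev1:7077') \
    .appName('pyspark-smoke-test') \
    .getOrCreate()
print(spark.sparkContext.parallelize(range(1000)).count())
spark.stop()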

OK, we figured it out. The issue was that we did not have much space left on the partition where the Hadoop and Spark logs were written. After redirecting the logs to a different location on the node, it started to work. There was also a ‘java -cp’ process triggered by the ‘spark-submit’ that was filling up the Hadoop log with the same error over and over again:

Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
2019-11-19 16:22:20,204 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from 137.187.60.61:44398 Call#33705121 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hail.wDXS6L3AD3Ta. Name node is in safe mode.

It is still somewhat unclear what exactly the issue was, but that seems to be the most relevant detail.
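For anyone who hits the same symptoms: when the NameNode's partition runs low on space, HDFS goes into safe mode, and while it is in safe mode Hail cannot create its temporary directory (the /tmp/hail.* path in the log above), so the job just appears to hang. Besides freeing space and moving the logs, Hail's scratch directory can also be pointed at a location with more room; a minimal sketch, assuming hl.init accepts a tmp_dir argument (as in recent 0.2 releases) and that the path below exists on your cluster:

import hail as hl

# Put Hail's temporary files on a filesystem with enough free space
# (the path below is only an example).
hl.init(master='spark://ai-grisnodedev1:7077', tmp_dir='hdfs:///user/me/hail-tmp')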

huh, weird! Glad you’re unblocked.