Connect hail to master spark server on kubernetes

edg1983 · May 10, 2022, 3:28pm

Hi,

On our cluster I have a Spark server running on Kubernetes, to which I usually connect from PySpark by creating a SparkContext whose master points to the corresponding address (like spark://master_address:7077).

It is not clear to me how to install hail in our spark server and then connect to it from my python scripts and notebooks.

From the documentation, I think we first need to compile hail on spark server using install-on-cluster. Do we need to do any additional configuration after running make install-on-cluster?
Any recommendation on how to do this when the spark server is managed by kubernetes?

Once hail is installed in the spark server (let’s say the spark master address is 10.10.10.10:7077), how can I proceed to connect hail to this server?
Is it enough to create a PySpark SparkContext like this

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setAll(
    [('spark.driver.memory', '16g'),
     ('spark.driver.cores', '4'),
     ('spark.executor.memory', '8g'),
     ('spark.executor.instances', '10'),
     ('spark.executor.cores', '4'),
     ('spark.driver.extraClassPath', '$HAIL_HOME/hail-all-spark.jar'),
     ('spark.executor.extraClassPath', '$HAIL_HOME/hail-all-spark.jar'),
     ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
     ('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator')]
)

sc = SparkContext(master="spark://10.10.10.10:7077", appName="hail", conf=conf)

and then pass this to hail.init(sc)? Or do I have to do something else?

Thanks for your support!

danking · May 11, 2022, 3:30pm

I suspect the easiest thing to do is to run the install-on-cluster command in the Dockerfile that you use to generate the Docker image for your master pod.

To initialize Hail, I would just use hl.init(master='spark://...'). If you need to pass special Spark configuration, you can use spark_conf=.... You don’t need to set the class paths, the serializer, or the Kryo registration if you let hl.init create its own SparkContext.
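
As a rough sketch of what that could look like (not tested against a Kubernetes-managed cluster): the helper names build_spark_conf and init_hail are made up for illustration, the master address is the example one from the question, and the executor settings are just the values from the original post. Only executor-side properties are passed here; the class path, serializer, and Kryo settings are left out on purpose so that hl.init can set them itself.

```python
def build_spark_conf(executor_memory="8g", executor_cores=4, executor_instances=10):
    """Collect extra Spark properties as a dict of strings.

    Hypothetical helper: the defaults are the values from the question,
    not recommendations.
    """
    return {
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
        "spark.executor.instances": str(executor_instances),
    }


def init_hail(master_url, spark_conf):
    """Let Hail create its own SparkContext against the given master.

    Hypothetical helper: hail is imported here so that the conf helper above
    can be used and inspected without Hail installed.
    """
    import hail as hl

    # No extraClassPath, serializer, or Kryo registrator entries are needed
    # when hl.init creates the SparkContext.
    hl.init(master=master_url, app_name="hail", spark_conf=spark_conf)
    return hl
```

Calling init_hail("spark://10.10.10.10:7077", build_spark_conf()) and then running something small such as hl.utils.range_table(10).count() would be one way to check that the executors are actually reachable.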

edg1983 · May 13, 2022, 8:34am

Hi!
Thanks for the suggestion. I’ve asked our admin to install Hail into the Docker image they use for the Spark server.
On the client side, that is, on the machine where I want to launch the analysis, I currently have Hail installed in a conda env using pip. Is this OK for running a Hail instance connected to the Spark server using hl.init(master='spark://...') as you suggested?
Or do I also need to build Hail with install-on-cluster there?

Thanks!

danking · May 13, 2022, 1:41pm

I’ve never done what you’ve described, but, yes, I believe that should work just fine with a pip-installed Hail.
