Initialise Hail with existing Spark

Hi guys!

I am having problems initialising Hail in my testing environment, where there is an existing SparkContext.

I have tried to follow the documentation by providing the existing context (hl.init(sc=spark.sparkContext)) or the address of the master node (hl.init(master=spark.sparkContext.getConf().get("spark.master"))), as suggested in another thread. Both result in the same error:


TypeError: 'JavaPackage' object is not callable

It suggests an incompatibility between the Spark and Hail versions, but I don’t see where this might be coming from, as I don’t have problems if I initialise Hail in a fresh session. I’ve tested Hail 0.2.116-cd64e0876c94 and 0.2.113-cf32652c5077 with PySpark 3.3.0 and 3.3.2.
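
For completeness, this is roughly what I’m running in the notebook (a simplified sketch; spark is the SparkSession created by our fixture):

import hail as hl
from pyspark.sql import SparkSession

# SparkSession created earlier by our testing fixture (recreated here for the sketch)
spark = SparkSession.builder.master("local[*]").getOrCreate()

# Attempt 1: hand Hail the existing SparkContext
hl.init(sc=spark.sparkContext)

# Attempt 2 (tried separately): pass only the master address
hl.init(master=spark.sparkContext.getConf().get("spark.master"))

# Both calls raise: TypeError: 'JavaPackage' object is not callable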

Can you guys help me out?
Thanks!

Hi @ireneisdoomed!

Are the Hail JARs on the classpath of your Spark cluster? There are some details on using arbitrary Spark clusters in the docs.

For example, if you’re using spark-submit, you need to specify the JAR paths:

HAIL_HOME=$(pip3 show hail | grep Location | awk -F' ' '{print $2 "/hail"}')
spark-submit \
  --jars $HAIL_HOME/hail-all-spark.jar \
  --conf spark.driver.extraClassPath=$HAIL_HOME/hail-all-spark.jar \
  --conf spark.executor.extraClassPath=./hail-all-spark.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator \
  hail-script.py
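
If Spark is started from Python instead (for example in a notebook or a test fixture), the same JAR and settings have to be supplied when the SparkSession is built, before hl.init is called. A rough sketch of that, assuming a local master and the JAR path used in the spark-submit example above (adjust if your Hail version ships the JAR elsewhere):

import os

import hail as hl
from pyspark.sql import SparkSession

# Same JAR location the spark-submit example derives from the pip install;
# adjust if your Hail version ships it elsewhere (e.g. under hail/backend/).
hail_jar = os.path.join(os.path.dirname(hl.__file__), "hail-all-spark.jar")

spark = (
    SparkSession.builder
    .master("local[*]")  # assumption: a local test session
    .config("spark.jars", hail_jar)
    .config("spark.driver.extraClassPath", hail_jar)
    .config("spark.executor.extraClassPath", "./hail-all-spark.jar")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "is.hail.kryo.HailKryoRegistrator")
    .getOrCreate()
)

hl.init(sc=spark.sparkContext)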

Hi! They are. I think this case is simpler than working on a cluster: my testing environment is local, Hail is properly installed, and the JARs should be available.
My issue is that we have a testing fixture that initialises Spark before running a set of tests, and this fixture lives for the whole session. I am not able to test my Hail code because Hail’s session clashes with Spark’s.
I’m testing how to solve this in a simple Jupyter notebook.
Any ideas? I’m happy to contribute documentation if this gets solved :slight_smile:
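
For reference, the fixture is roughly this shape (a simplified sketch; names are illustrative):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # A single SparkSession shared by every test in the session (simplified)
    session = (
        SparkSession.builder
        .master("local[*]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()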

Hmm.

So, you’re on a single machine with Hail installed. You’re running something like:

import hail as hl

def test_hail():
    assert hl.utils.range_table(10).count() == 10

But in your tests, Spark is already initialized? I presume Spark wasn’t initialized by Hail though, right? Can you share the code that initializes Spark? And can you share the contents of spark-defaults.conf? If you pip-installed pyspark, go to the Location: specified in pip3 show pyspark, then cat pyspark/conf/spark-defaults.conf.