Connect hail to master spark server on kubernetes


On our cluster I have a Spark server running on Kubernetes, to which I usually connect in PySpark by setting up a SparkContext with master pointing to the corresponding address (like spark://master_address:7077).

It is not clear to me how to install Hail on our Spark server and then connect to it from my Python scripts and notebooks.

From the documentation, I understand we first need to compile Hail on the Spark server using install-on-cluster. Do we need to do any additional configuration after running make install-on-cluster?
Any recommendations on how to do this when the Spark server is managed by Kubernetes?

Once Hail is installed on the Spark server (let’s say the Spark master address is spark://master_address:7077, as above), how can I proceed to connect Hail to this server?
Is it enough to create a PySpark SparkContext like this:

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAll([
    ('spark.driver.memory', '16g'),
    ('spark.driver.cores', '4'),
    ('spark.executor.memory', '8g'),
    ('spark.executor.cores', '4'),
    ('spark.driver.extraClassPath', '$HAIL_HOME/hail-all-spark.jar'),
    ('spark.executor.extraClassPath', '$HAIL_HOME/hail-all-spark.jar'),
    ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
    ('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator'),
])

sc = SparkContext(master='spark://master_address:7077', appName='hail', conf=conf)
```

and then pass this to hail.init(sc)? Or do I have to do something else?

Thanks for the support!

I suspect the easiest thing to do is to run the install-on-cluster command in the Dockerfile that you use to generate the Docker image for your master pod.
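For illustration, a sketch of what that Dockerfile step might look like. The base image name and the Spark/Scala versions here are placeholders, not a tested recipe; check the Hail installation docs for the flags that match the versions your cluster actually runs:

```dockerfile
# Assumes a base image that already provides Spark, a JDK, and Python 3.
FROM your-spark-base-image

# Build tools needed to compile Hail from source.
RUN apt-get update && apt-get install -y git build-essential

# Compile and install Hail against the cluster's Spark version
# (substitute the Spark and Scala versions your cluster uses).
RUN git clone https://github.com/hail-is/hail.git /hail && \
    cd /hail/hail && \
    make install-on-cluster HAIL_COMPILE_NATIVES=1 SPARK_VERSION=3.1.1 SCALA_VERSION=2.12.13
```

Since the master pod's image then contains both Spark and the Hail jar, nothing extra should be needed on the server side.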

To initialize Hail, I would just use hl.init(master='spark://...'). If you need to pass special Spark configuration, you can use spark_conf=.... You don’t need to set the class paths, the serializer, or the Kryo registrator if you let hl.init create its own SparkContext.
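Concretely, something like the sketch below. The master URL is a placeholder for your cluster's address, and the spark_conf entries are just examples of settings you might pass (this is connection configuration, so it only runs against a live cluster):

```python
import hail as hl

# Hypothetical master address -- substitute your cluster's spark:// URL.
hl.init(
    master='spark://master_address:7077',
    spark_conf={
        'spark.driver.memory': '16g',
        'spark.executor.memory': '8g',
    },
)
```

hl.init then builds the SparkContext itself, with the Hail jar and Kryo settings wired in for you.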

Thanks for the suggestion. I’ve asked our admin to install Hail into the Docker image they use for the Spark server.
On the client side, i.e. on my machine where I want to launch the analysis, I currently have Hail installed in a conda env using pip. Is that OK for running a Hail instance connected to the Spark server via hl.init(master='spark://...') as you suggested?
Or do I also need to build Hail in install-on-cluster mode?


I’ve never done what you’ve described, but, yes, I believe that should work just fine with a pip-installed Hail.