Hello everyone,
I am trying to initialize Hail. Below is the code I executed in a Jupyter notebook. Can you help me with this error? I am trying to debug it myself as well.
import findspark
findspark.init()
import pyspark
import hail as hl
import os
from pathlib import Path
%env SPARK_HOME /opt/spark
%env HAIL_HOME /opt/hail/hail
hail_home = Path(os.getenv('HAIL_HOME'))
hail_jars = hail_home/'build'/'libs'/'hail-all-spark.jar'
conf = pyspark.SparkConf().setAll([
    ('spark.jars', str(hail_jars)),
    ('spark.driver.extraClassPath', str(hail_jars)),
    ('spark.executor.extraClassPath', './hail-all-spark.jar'),
    ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
    ('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator'),
    ('spark.driver.memory', '80g'),
    ('spark.executor.memory', '80g'),
    ('spark.local.dir', '/tmp,/data/volume03/spark')
])
sc = pyspark.SparkContext('local[*]', 'Hail', conf=conf)
hl.init(sc)  # the error occurs on this line
The error message is shown below:
Py4JError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply. Trace:
py4j.Py4JException: Method apply([class org.apache.spark.SparkContext, class java.lang.String, null, class java.lang.String, class java.lang.Boolean, class java.lang.Integer, class java.lang.String, class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
at py4j.Gateway.invoke(Gateway.java:276)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
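While debugging, I suspect the Py4J "Method ... does not exist" error might come from a mismatch between the hail-all-spark.jar I built and the pip-installed hail package, so I printed the package version to compare against the jar build (a minimal check; hl.version() is part of the Hail API):
print(hl.version())  # version of the pip-installed hail package; should match the jar build
print(hail_jars)     # path of the jar actually being loaded, for comparison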
Following the suggestion from this post, if I run the line below
sc = pyspark.SparkContext()
I get another error message, which is shown below:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=Hail, master=local[*]) created by __init__ at :1
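Since the notebook had already created a SparkContext, I tried stopping it before creating a new one (a sketch, assuming sc still points at the running context):
sc.stop()  # stop the existing SparkContext so a new one can be created
sc = pyspark.SparkContext('local[*]', 'Hail', conf=conf)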
I also referred to this SO post, but the call could not take multiple arguments, as shown below, and resulted in an error:
sc = pyspark.SparkContext('local[*]', 'Hail', conf=conf)
hl.init(sc)
TypeError: getOrCreate() got multiple values for argument 'conf'
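My guess is that the positional master and app-name arguments collide with values already carried by conf, so I also tried moving them onto the SparkConf and using getOrCreate instead (a sketch using the standard PySpark APIs SparkConf.setMaster, SparkConf.setAppName, and SparkContext.getOrCreate):
conf = conf.setMaster('local[*]').setAppName('Hail')  # move the positional arguments into the conf
sc = pyspark.SparkContext.getOrCreate(conf=conf)      # reuse the existing context or create a new one
hl.init(sc)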
Can you help me fix this error?
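For completeness, I am also considering skipping the manual SparkContext entirely and letting Hail build it, assuming my Hail version's hl.init accepts the spark_conf parameter (a sketch I have not yet confirmed on my setup; I left out the Kryo settings on the assumption that hl.init configures them itself):
hl.init(
    master='local[*]',
    app_name='Hail',
    spark_conf={
        'spark.jars': str(hail_jars),
        'spark.driver.extraClassPath': str(hail_jars),
        'spark.executor.extraClassPath': './hail-all-spark.jar',
        'spark.driver.memory': '80g',
        'spark.executor.memory': '80g',
        'spark.local.dir': '/tmp,/data/volume03/spark',
    },
)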