Config setup - Hail Installation

Hello Everyone,

I am trying to install Hail on Ubuntu, but it looks like the documentation has been updated.

In the current docs I no longer see the config statements below, which were present earlier:

import pyspark
import hail as hl

# hail_jars is the path to the Hail jar (hail-all-spark.jar) on the driver, set elsewhere.
conf = pyspark.SparkConf().setAll([
    ('spark.jars', str(hail_jars)),
    ('spark.driver.extraClassPath', str(hail_jars)),
    ('spark.executor.extraClassPath', './hail-all-spark.jar'),
    ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
    ('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator'),
    ('spark.driver.memory', '180g'),
    ('spark.executor.memory', '180g'),
    ('spark.local.dir', '/t1,/data/abcd/spark')
])
sc = pyspark.SparkContext('local[*]', 'Hail', conf=conf)
hl.init(sc)

Does that mean Hail installation has been simplified and we no longer have to do all of these config steps?

Under the Linux and Spark cluster sections of the recently updated docs I don't see any config statements like the ones above, so do we no longer need to do all of this?

If any of the above config statements are still required, could you please point me to the page in the updated Hail docs where I can find them?

Yes, that’s exactly what this means! You may still want to update memory settings / temp dirs specific to your cluster, but I think it’ll be easier to do that as arguments to the pyspark executable.
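
For reference, here is a minimal sketch of what that can look like with a pip-installed Hail 0.2, which bundles the jar and sets up the Spark classpath for you. The memory values and temp directory are just the ones from the question, not recommendations, and passing them through hl.init's spark_conf argument is one option alongside giving the equivalent flags (e.g. --driver-memory, --conf) to the pyspark executable.

import hail as hl

# Sketch only: with a pip-installed Hail, no manual SparkConf / jar wiring is needed.
# Cluster-specific settings can be supplied via spark_conf; values here are illustrative.
hl.init(
    master='local[*]',
    spark_conf={
        'spark.driver.memory': '180g',
        'spark.executor.memory': '180g',
        'spark.local.dir': '/t1,/data/abcd/spark',
    },
)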