Hail 0.2.34 initialisation error (works fine in 0.2.41)

This is similar to the earlier topic TypeError: 'JavaPackage' object is not callable, but the steps outlined there to resolve it haven't worked for me.

I have a Spark cluster with Hail 0.2.41 installed on each node via pip. Spark is 2.4.5 and Hadoop is 3.2.1. This works great. However, if I build the cluster with Hail 0.2.34 instead, with everything else exactly the same, I get the dreaded 'JavaPackage' object is not callable error.
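The initialisation itself is just the usual two lines (the same call that appears in the traceback below):

    import pyspark
    import hail

    # Attach Hail to an existing SparkContext; tmp_dir is local scratch space on the node.
    sc = pyspark.SparkContext()
    hail.init(sc=sc, tmp_dir="/home/ubuntu/data")

Running that produces this warning and traceback: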

/home/ubuntu/venv/lib/python3.6/site-packages/hail/context.py:71: UserWarning: pip-installed Hail requires additional configuration options in Spark referring
  to the path to the Hail Python module directory HAIL_DIR,
  e.g. /path/to/python/site-packages/hail:
    spark.jars=HAIL_DIR/hail-all-spark.jar
    spark.driver.extraClassPath=HAIL_DIR/hail-all-spark.jar
    spark.executor.extraClassPath=./hail-all-spark.jar
  'pip-installed Hail requires additional configuration options in Spark referring\n'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-cc0ec34ad727> in <module>
      4 
      5 sc = pyspark.SparkContext()
----> 6 hail.init(sc=sc, tmp_dir="/home/ubuntu/data")

<decorator-gen-1275> in init(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, _optimizer_iterations, _backend)

~/venv/lib/python3.6/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    583     def wrapper(__original_func, *args, **kwargs):
    584         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 585         return __original_func(*args_, **kwargs_)
    586 
    587     return wrapper

~/venv/lib/python3.6/site-packages/hail/context.py in init(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, _optimizer_iterations, _backend)
    288                 min_block_size, branching_factor, tmp_dir,
    289                 default_reference, idempotent, global_seed, spark_conf,
--> 290                 _optimizer_iterations,_backend)
    291 
    292 

<decorator-gen-1273> in __init__(self, sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, optimizer_iterations, _backend)

~/venv/lib/python3.6/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    583     def wrapper(__original_func, *args, **kwargs):
    584         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 585         return __original_func(*args_, **kwargs_)
    586 
    587     return wrapper

~/venv/lib/python3.6/site-packages/hail/context.py in __init__(self, sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, optimizer_iterations, _backend)
    119             self._jhc = self._hail.HailContext.apply(
    120                 jsc, app_name, joption(master), local, log, True, append,
--> 121                 min_block_size, branching_factor, tmp_dir, optimizer_iterations)
    122 
    123         self._jsc = self._jhc.sc()

TypeError: 'JavaPackage' object is not callable

My Spark config is set correctly, AFAIK, matching the warning message above:

spark.master                     spark://192.168.252.196:7077

spark.driver.memory              37414m
spark.executor.memory            37414m
spark.executor.instances         1

spark.driver.extraClassPath      /home/ubuntu/venv/lib/python3.6/site-packages/hail/backend/hail-all-spark.jar
spark.executor.extraClassPath    ./hail-all-spark.jar
spark.jars                       /home/ubuntu/venv/lib/python3.6/site-packages/hail/backend/hail-all-spark.jar,/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar,/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar

spark.history.fs.logDirectory    hdfs:///shared/spark-logs
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///shared/spark-logs

spark.ui.reverseProxy            true
spark.ui.reverseProxyUrl         http://192.168.252.196/spark

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator           is.hail.kryo.HailKryoRegistrator
spark.speculation                True
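
For what it's worth, here is a small sanity check (just a sketch, relying on py4j returning a JavaPackage placeholder for any name it can't resolve to a class) that should show whether the driver JVM can actually see the Hail classes from the JAR:

    import pyspark
    from py4j.java_gateway import JavaClass

    sc = pyspark.SparkContext()

    # "is" is a Python keyword, so reach the is.hail package via getattr.
    hail_context_ref = getattr(sc._jvm, "is").hail.HailContext

    # If hail-all-spark.jar is on the driver classpath, this resolves to a JavaClass;
    # if not, py4j hands back a JavaPackage placeholder, which is exactly what makes
    # the later HailContext.apply(...) call fail with "'JavaPackage' object is not callable".
    print(type(hail_context_ref))
    print(isinstance(hail_context_ref, JavaClass))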

Any ideas why this is not working for me with 0.2.34, but it works fine in 0.2.41?
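In case it's useful, the alternative I assume I could try is letting Hail create the SparkContext itself and passing the same settings through the spark_conf argument visible in the init signature above (a sketch only; I haven't confirmed this path on 0.2.34, and the jar path is just the one from my spark-defaults):

    import hail

    hail_jar = "/home/ubuntu/venv/lib/python3.6/site-packages/hail/backend/hail-all-spark.jar"

    # Let Hail build the SparkContext and hand it the classpath settings directly.
    hail.init(
        tmp_dir="/home/ubuntu/data",
        spark_conf={
            "spark.jars": hail_jar,
            "spark.driver.extraClassPath": hail_jar,
            "spark.executor.extraClassPath": "./hail-all-spark.jar",
        },
    )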

We have changed the details of how the Python and Java processes talk to each other over time, so we can't really support either of these old versions. Can you update to the latest?

Thanks; that's fair enough. Our researcher wants to stick with 0.2.34 for the sake of consistency. We have a working 0.2.41 cluster build, so that may be the best bet for him for now, though it sounds like even that isn't great from a support point of view. That's totally reasonable; we'll see what we can do.

Please do let the researcher know that we've gone to a lot of effort to ensure that the radical changes on the backend don't break the front-end interfaces. Their pipeline written against 0.2.34 should work on 0.2.60 just fine (only more quickly!).
