Firecloud notebooks not supporting Hail 0.2


#1

Dear Hail team,
I’m trying to launch Hail from the beta notebooks installed in Firecloud, but there seems to be a compatibility issue between the Spark versions that are installed. Here is the error message (I should note that I ran sys.path.append('/usr/lib/spark/python/lib/py4j-0.10.4-src.zip') beforehand to prevent a py4j error):

Py4JJavaErrorTraceback (most recent call last)

in ()
1 from hail import *
----> 2 hc = HailContext()

in __init__(self, sc, app_name, master, local, log, quiet, append, parquet_compression, min_block_size, branching_factor, tmp_dir)

/etc/hail/hail-0.1-5c275cc216e1.zip/hail/typecheck/check.pyc in _typecheck(f, *args, **kwargs)
243 def _typecheck(f, *args, **kwargs):
244 check_all(f, args, kwargs, checkers, is_method=True)
--> 245 return f(*args, **kwargs)
246
247 return decorator(_typecheck)

/etc/hail/hail-0.1-5c275cc216e1.zip/hail/context.pyc in __init__(self, sc, app_name, master, local, log, quiet, append, parquet_compression, min_block_size, branching_factor, tmp_dir)
86 self._jhc = self._hail.HailContext.apply(
87 jsc, app_name, joption(master), local, log, True, append,
---> 88 parquet_compression, min_block_size, branching_factor, tmp_dir)
89
90 self._jsc = self._jhc.sc()

/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1134
1135 for temp_arg in temp_args:

/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(

Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: java.lang.IllegalArgumentException: requirement failed: This Hail JAR was compiled for Spark 2.0.2,
but the version of Spark available at runtime is 2.2.1.
at scala.Predef$.require(Predef.scala:224)
at is.hail.HailContext$.configureAndCreateSparkContext(HailContext.scala:40)
at is.hail.HailContext$.apply(HailContext.scala:166)
at is.hail.HailContext.apply(HailContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)


#2

Did you install Hail yourself on these notebooks or was it preinstalled on Firecloud? It looks like the core problem here is that you’re trying to use Hail 0.1 (deprecated and old!) with a Spark 2.2 cluster, which 0.1 doesn’t support.
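
To make the mismatch concrete, here is a minimal sketch that pulls the two Spark versions out of the error text above and compares them. The helper names (`parse_spark_mismatch`, `same_minor_release`) are hypothetical, and comparing on major.minor is only an assumption for illustration; the actual check in Hail's HailContext.scala may be stricter or looser.

```python
import re

# Error text copied from the traceback in this thread.
ERROR_TEXT = (
    "requirement failed: This Hail JAR was compiled for Spark 2.0.2,\n"
    "but the version of Spark available at runtime is 2.2.1."
)

# A dotted version number such as 2.0.2 (a trailing "." or "," is not captured).
_VERSION = r"(\d+(?:\.\d+)*)"
_PATTERN = re.compile(
    r"compiled for Spark " + _VERSION + r",?\s+"
    r"but the version of Spark available at runtime is " + _VERSION
)


def parse_spark_mismatch(message):
    """Return (compiled_for, runtime) version strings, or None if not found."""
    match = _PATTERN.search(message)
    return match.groups() if match else None


def same_minor_release(a, b):
    """Compare only the major.minor parts (an assumed compatibility rule)."""
    return a.split(".")[:2] == b.split(".")[:2]


versions = parse_spark_mismatch(ERROR_TEXT)
if versions is not None:
    compiled_for, runtime = versions
    print("JAR built for Spark", compiled_for, "but the cluster runs", runtime)
    print("same major.minor:", same_minor_release(compiled_for, runtime))
```

In a PySpark notebook the runtime version is usually available as `sc.version`, assuming a SparkContext named `sc` already exists, so you can check it before constructing a HailContext.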


#3

Dear Tim,
thank you for the quick reply - I just started the Firecloud notebook “out of the box” as implemented in Firecloud.

There are no options for me to choose a specific Spark version, and I already managed to work around the py4j error (wrong version) with sys.path.append. For the Spark issue, however, I would need your help, as I don’t know how to change this within the notebook.
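
For reference, the py4j workaround can be written without hard-coding the py4j version. This is a minimal sketch, assuming Spark is installed under /usr/lib/spark as in the path from the first post; `add_py4j_to_path` is a hypothetical helper, not part of Hail or Spark. It only addresses the py4j import error and does nothing about the Spark version mismatch.

```python
import glob
import os
import sys


def add_py4j_to_path(spark_home="/usr/lib/spark"):
    """Append the py4j source zip bundled with Spark to sys.path.

    Returns the path that was used, or None if no py4j zip was found.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    # Normally there is exactly one match; if there are several,
    # take the last one in lexicographic order.
    matches = sorted(glob.glob(pattern))
    if not matches:
        return None
    py4j_zip = matches[-1]
    if py4j_zip not in sys.path:
        sys.path.append(py4j_zip)
    return py4j_zip


found = add_py4j_to_path()
print("py4j zip added to sys.path:", found)
```

Run this before `from hail import *`; if it prints `None`, the Spark location assumed above is not correct for your environment.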


#4

Ah, I see. In that case, I think that Firecloud hasn’t upgraded.

There’s a new beta UI called Terra, and it looks like that may support 0.2 (see the November release notes): https://software.broadinstitute.org/firecloud/blog


#5

Thank you!
I don’t think that Terra is available yet. Could you point me to the people managing Firecloud, or can you think of a quick fix for this?


#6

Looks like this is the best place to ask:

If you’re running on Google, take a look also at https://github.com/Nealelab/cloudtools - this is how most Hail users at the Broad are running Hail on Google.


#7

Thank you!


#8

Just to add that a Jupyter notebook running Hail on Google Cloud is actually quite easy to set up with cloudtools after downloading the google-cloud-sdk:

./google-cloud-sdk/install.sh
./google-cloud-sdk/bin/gcloud init
pip install cloudtools --upgrade
cluster start datascience -p 6
cluster connect datascience notebook