Good day!
I am using Spark 2.1.1 on HDInsight in Azure. I built Hail for Spark 2.1.1 as described here: https://hail.is/docs/stable/getting_started.html . However, when I run the following in IPython:
from hail import *
hc = HailContext()
I get this error:
Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: java.lang.IllegalArgumentException: requirement failed: This Hail JAR was compiled for Spark 2.1.1,
but the version of Spark available at runtime is 2.1.1.2.6.2.25-1.
at scala.Predef$.require(Predef.scala:224)
at is.hail.HailContext$.configureAndCreateSparkContext(HailContext.scala:40)
at is.hail.HailContext$.apply(HailContext.scala:166)
at is.hail.HailContext.apply(HailContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
It looks like HDInsight reports its Spark version as the real Spark version (2.1.1) with the Hadoop distribution build number (2.6.2.25-1) appended, which creates a mismatch between the runtime version string and the version the Hail JAR was compiled for.
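To illustrate (this is just a sketch of the comparison in plain Python; the real check lives in Scala, in HailContext.configureAndCreateSparkContext, and the exact comparison logic there is an assumption on my part):

```python
# Hypothetical sketch of the version check: a strict string comparison
# between the Spark version the Hail JAR was built for and the version
# string the runtime reports.
jar_spark_version = "2.1.1"                   # what the JAR was compiled for
runtime_spark_version = "2.1.1.2.6.2.25-1"    # what HDInsight reports at runtime

# Strict equality fails even though the base Spark versions match:
print(jar_spark_version == runtime_spark_version)  # False

# Comparing only the first three components would pass:
base = ".".join(runtime_spark_version.split(".")[:3])
print(base == jar_spark_version)  # True
```

So the failure seems to come purely from the distribution suffix, not from an actual Spark incompatibility.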
How can I fix this, or is there a workaround?
Regards,
-Yuriy