Hail installation in Cloudera

We are trying to install Hail on our Cloudera/Hadoop system, on one of its gateway nodes. We ran into several issues during installation:

  1. When we tried to check out the source code with git, we got the following error message:

[local]# git checkout 0.1
fatal: Not a git repository (or any parent up to mount point /usr)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[local]# cd hail
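
The message above is what git prints when a command is run outside a working tree, and the log shows `git checkout 0.1` being issued before `cd hail`. As a rough illustration only, the following Python sketch mimics the upward search git performs for a `.git` entry. `find_git_root` is a hypothetical helper, not part of git or Hail, and it ignores the filesystem-boundary rule mentioned in the error.

```python
import os


def find_git_root(start):
    """Walk upward from `start` and return the first directory that
    contains a `.git` entry, or None if the filesystem root is reached.

    Hypothetical helper for illustration; real git also stops at
    filesystem boundaries unless GIT_DISCOVERY_ACROSS_FILESYSTEM is set.
    """
    current = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

If this returns None for the current directory, `git checkout` would be expected to fail in the same way; running it from inside the cloned `hail` directory should avoid that.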

  2. We tried to compile the code following the Cloudera instructions on your website and ran into several issues during compilation. Using different options found in "unofficial" sources, we got the build to pass, and we built several versions with different compile options. However, every version fails in a similar way when we run the tests described on the Hail website. Here is one of the compile options we used, followed by the error messages we got:

A. Compile:
[hail]# ./gradlew -Dspark.version=2.1.0.cloudera1 clean shadowJar archiveZip

B. Test & Error message

a. Environment setting:

export JAVA_HOME=/usr/java/jdk1.8.0_144
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2

spark_core_jars=( "${SPARK_HOME}/jars/spark-core*.jar" )
export HAIL_HOME=/usr/local/hail
export PYTHONPATH="$PYTHONPATH:$HAIL_HOME/build/distributions/hail-python.zip:$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-*-src.zip"

#export SPARK_CLASSPATH=$HAIL_HOME/build/libs/hail-all-spark.jar
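
One detail in the settings above: a wildcard inside double quotes is not expanded by the shell, so the `py4j-*-src.zip` entry reaches Python as a literal string. Whether that matters here is unclear, since the `pyspark2` launcher may add py4j to the path itself. A minimal sketch of resolving the wildcard explicitly follows; `build_pythonpath` is a hypothetical helper, and the directory layout is assumed from the exports above.

```python
import glob
import os


def build_pythonpath(hail_home, spark_home, existing=""):
    """Return a PYTHONPATH string with the py4j wildcard expanded.

    Hypothetical helper; the layout (build/distributions/hail-python.zip,
    python/lib/py4j-*-src.zip) mirrors the exports in the post.
    """
    # Keep whatever was already on the path, dropping empty entries.
    entries = [e for e in existing.split(os.pathsep) if e]
    entries.append(
        os.path.join(hail_home, "build", "distributions", "hail-python.zip")
    )
    entries.append(os.path.join(spark_home, "python"))
    # glob.glob expands the wildcard; sorted() keeps the order deterministic.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    entries.extend(sorted(glob.glob(pattern)))
    return os.pathsep.join(entries)
```

If no py4j archive matches, the function simply adds nothing for it, which makes a missing archive easy to spot in the resulting string.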

b. Pyspark :

pyspark2 --jars build/libs/hail-all-spark.jar \
  --py-files build/distributions/hail-python.zip \
  --conf spark.sql.files.openCostInBytes=1099511627776 \
  --conf spark.sql.files.maxPartitionBytes=1099511627776 \
  --conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator \
  --conf spark.hadoop.parquet.block.size=1099511627776 \
  --conf spark.driver.extraClassPath=hail/build/libs/hail-all-spark.jar \
  --conf "spark.executor.extraClassPath=./hail-all-spark.jar"
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0.cloudera1
      /_/

Using Python version 2.7.12 (default, Jul 2 2016 17:42:40)
SparkSession available as 'spark'.

>>> from hail import *
>>> hc = HailContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2, in __init__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/history.py", line 29, in record_init
  File "", line 2, in __init__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 226, in _typecheck
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 84, in __init__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling z:is.hail.HailContext.apply. Trace:
py4j.Py4JException: Method apply([class org.apache.spark.SparkContext, class java.lang.String, class scala.None$, class java.lang.String, class java.lang.String, class java.lang.Boolean, class java.lang.Boolean, class java.lang.Integer, class java.lang.Integer, class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

>>> hc = HailContext()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2, in __init__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/history.py", line 29, in record_init
  File "", line 2, in __init__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 226, in _typecheck
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 84, in __init__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling z:is.hail.HailContext.apply. Trace:
py4j.Py4JException: Method apply([null, class java.lang.String, class scala.None$, class java.lang.String, class java.lang.String, class java.lang.Boolean, class java.lang.Boolean, class java.lang.Integer, class java.lang.Integer, class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
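
A py4j `Method ... does not exist` error means the JVM class that was loaded has no method with the signature the Python side called. One possible cause, offered only as a guess, is that the JAR on the driver classpath is not the one built alongside `hail-python.zip`. In the command above, `--jars` uses `build/libs/hail-all-spark.jar` while `spark.driver.extraClassPath` uses `hail/build/libs/hail-all-spark.jar`, and from a single working directory these resolve to different locations. The sketch below checks that; both function names are hypothetical helpers, not part of Spark or Hail.

```python
import os


def check_jar_paths(paths, cwd):
    """Resolve each path against `cwd` and return a list of
    (resolved_path, is_existing_file) tuples in input order.

    Hypothetical helper for illustration.
    """
    report = []
    for p in paths:
        resolved = os.path.normpath(os.path.join(cwd, p))
        report.append((resolved, os.path.isfile(resolved)))
    return report


def all_same_existing_file(paths, cwd):
    """True only if every path resolves to one and the same existing file."""
    report = check_jar_paths(paths, cwd)
    resolved = {r for r, _ in report}
    return len(resolved) == 1 and all(ok for _, ok in report)
```

Calling `check_jar_paths` with the two paths from the command and the directory `pyspark2` was launched from shows which entries actually exist. The `spark.executor.extraClassPath` value is relative to each executor's working directory, so it cannot be checked this way from the gateway node.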

For the first issue, we do not know what impact the failed source checkout (git checkout 0.1) has. For the second issue, we may be missing some information (e.g. the right options for Cloudera). Could you tell us the right way to install Hail? We sincerely appreciate your help. Please feel free to contact us if you have any questions.

Best regards,

–Fang

Moving here from email.

The next error was:


Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0.cloudera1
      /_/
 
Using Python version 2.7.12 (default, Jul  2 2016 17:42:40)
SparkSession available as 'spark'.
>>> from hail import *
>>> hc = HailContext(sc)
Running on Apache Spark version 2.1.0.cloudera1
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version devel-42376b8
>>> vds = hc.read('.../vds')
>>> print(vds.summarize().report())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-431>", line 2, in summarize
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/java.py", line 127, in handle_py4j
hail.java.FatalError: An error occurred while calling into JVM, probably due to invalid parameter types.
 
Java stack trace:
An error occurred while calling o70.isGenericGenotype. Trace:
py4j.Py4JException: Method isGenericGenotype([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)