Installing Hail on Spark

Hi,
I am installing Hail on Spark.
I am using my local machine for testing.

  1. Hail uses Python 3.6.
  2. When I use spark-2.0.2-bin-hadoop2.7, I cannot run pyspark because it requires Python 3.5.
  3. When I use spark-2.3.0-bin-hadoop2.7, I can run pyspark and import hail using Python 3.6, but hc = HailContext() fails.

I would appreciate your help installing Hail the right way on Spark.
Regards,
Octavio

Hi Octavio,

Hail 0.2 requires Python 3.6 and Spark 2.2.0. Here’s a link to the getting started page.

Hail 0.1 requires Python 2.7 and Spark 2.0.2, as described in the docs.
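
If you are not sure which versions your environment actually picks up, a quick sanity check from Python (a hypothetical snippet, not from the Hail docs) is:

# Check the interpreter and Spark versions against Hail's requirements.
import sys
print(sys.version)           # Hail 0.2 expects Python 3.6.x
import pyspark               # imports only if pyspark is on your PYTHONPATH
print(pyspark.__version__)   # Hail 0.2 expects 2.2.0; Hail 0.1 expects 2.0.2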

Best,
Jackie

Jackie,

I installed both versions, but I ran into problems with both.
import hail works fine; the problem comes when I call hl.init(). I get the error:
TypeError: 'JavaPackage' object is not callable
I checked the jar path and it is correct.
I would appreciate your advice.
Regards,
Octavio

We recommend you use 0.2. Can you open a new terminal window and try following the directions here for Hail 0.2 with Spark 2.2.0 again? If that doesn't work, can you run the following and report back with what it says?

$ java -version
$ echo $SPARK_HOME
$ echo $HAIL_HOME
$ echo $PATH
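
One more hypothetical check, which relies on py4j's behavior rather than anything Hail-specific: inside pyspark, a class that is really on the JVM classpath resolves to a JavaClass, while a missing one silently falls back to a JavaPackage, which is exactly what makes the call fail.

# Run inside pyspark, where `sc` already exists. 'is' is a Python keyword,
# hence the getattr. A JavaPackage result means the Hail jar never loaded.
pkg = getattr(sc._jvm, 'is').hail
print(type(pkg.HailContext))   # py4j.java_gateway.JavaClass when the jar is visible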

It did not work:

Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import hail as l
>>> l.init()
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-68>", line 2, in init
  File "/Users/juarezespinosoh/hail2/hail/python/hail/typecheck/check.py", line 490, in _typecheck
    return __orig_func__(*args_, **kwargs_)
  File "/Users/juarezespinosoh/hail2/hail/python/hail/context.py", line 160, in init
    default_reference, force_ir)
  File "<decorator-gen-66>", line 2, in __init__
  File "/Users/juarezespinosoh/hail2/hail/python/hail/typecheck/check.py", line 481, in _typecheck
    return __orig_func__(*args_, **kwargs_)
  File "/Users/juarezespinosoh/hail2/hail/python/hail/context.py", line 54, in __init__
    min_block_size, branching_factor, tmp_dir, force_ir)
TypeError: 'JavaPackage' object is not callable
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit

$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

$ echo $SPARK_HOME
/Users/juarezespinosoh/spark/spark-2.2.0-bin-hadoop2.7

$ echo $HAIL_HOME
/Users/juarezespinosoh/hail2/hail

$ echo $PATH
/usr/local/bin:/Volumes/Images 1/dcmtk-macOS-0d2826645/bin:/Users/juarezespinosoh/Downloads/ImageMagick-7.0.3/bin:/usr/local/cuda/bin:/Users/juarezespinosoh/spark/spark-2.2.0-bin-hadoop2.7/bin:/Users/juarezespinosoh/hail2/hail/bin:/Developer/NVIDIA/CUDA-8.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Developer/.:/Developer/.

When I try to compile the Hail source code for Spark with:
./gradlew -Dspark.version=2.2.0 shadowJar
it does not finish. The build output shows:
c++ -fvisibility=hidden -dynamiclib -O3 -march=native -g -std=c++11 -Ilibsimdpp-2.0-rc2 -Wall -Werror -fPIC -ggdb ibs.o davies.o -o lib/darwin/libhail.dylib
:compileScala
/Users/juarezespinosoh/hail3/hail/src/main/scala/is/hail/expr/ir/Compile.scala:129: inferred existential type (String, is.hail.expr.types.TAggregable, scala.reflect.ClassTag[_$1]) forSome { type _$1 }, which cannot be expressed by wildcards, should be enabled
by making the implicit value scala.language.existentials visible.
This can be achieved by adding the import clause 'import scala.language.existentials'
or by setting the compiler option -language:existentials.
See the Scaladoc for value scala.language.existentials for a discussion
why the feature should be explicitly enabled.
val env = ((aggName, aggType, TypeToIRIntermediateClassTag(aggType)) +: args).zipWithIndex
^
/Users/juarezespinosoh/hail3/hail/src/main/scala/is/hail/io/LoadMatrix.scala:297: inferred existential type (Array[String], Array[_$1]) forSome { type _$1 }, which cannot be expressed by wildcards, should be enabled
by making the implicit value scala.language.existentials visible.
val (rowFieldNames, colIDs) = splitHeader(header1, nAnnotations, nCols)
^
/Users/juarezespinosoh/hail3/hail/src/main/scala/is/hail/rvd/OrderedRVD.scala:230: inferred existential type org.apache.spark.broadcast.Broadcast[is.hail.utils.IntervalTree[_$1]] forSome { type _$1 }, which cannot be expressed by wildcards, should be enabled
by making the implicit value scala.language.existentials visible.
val intervalsBc = rdd.sparkContext.broadcast(intervals)
^
/Users/juarezespinosoh/hail3/hail/src/main/scala/is/hail/expr/AST.scala:612: match may not be exhaustive.
It would fail on the following inputs: Apply(_, _, _), ApplyMethod(_, _, _, _), ArrayConstructor(_, _), BaseStructConstructor(_), Const(_, _, _), If(_, _, _, _), Lambda(_, _, _), Let(_, _, _), ReferenceGenomeDependentFunction(_, _, _, _), Select(_, _, _)
val identifiers = args.tail.map {

Hi Octavio,
How are you starting Python? Are you using the “jhail” / “ihail” scripts in the distribution?

Hi,
I am doing:
pyspark
then I run:
import hail
Then I call init and it crashes.

Yeah, this is to be expected: Hail isn't connected to Spark unless special environment variables are set.

Please follow the instructions here in the section “Running Hail locally with a pre-compiled distribution”.

If you use the “ihail” or “jhail” scripts to start hail, the setup will be done for you.
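
For reference, here is a rough sketch of what that setup amounts to, assuming a plain python session and a jar built at build/libs/hail-all-spark.jar (the path is illustrative; adjust it to your own layout):

# Hypothetical manual setup; the ihail/jhail scripts do the equivalent.
# This must run BEFORE hl.init(), which is when the JVM is launched.
import os
jar = '/path/to/hail/build/libs/hail-all-spark.jar'   # adjust to your layout
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars {j} '
    '--conf spark.driver.extraClassPath={j} '
    '--conf spark.executor.extraClassPath={j} '
    'pyspark-shell'.format(j=jar))
import hail as hl
hl.init()   # with the jar visible to the JVM, the JavaPackage error goes away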

Hi,
Thanks, it is working. I am going through the tutorials now, but I could not download some data because of an SSL problem. Could you advise?

File "/Users/juarezespinosoh/hail2/hail/python/hail/utils/tutorial.py", line 65, in get_1kg
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)>

It looks like you're running on a Mac, so try this answer:

For macOS: click Applications => Python, then double-click Install Certificates.command.
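
If that doesn't fix it, one hypothetical fallback (not from the Hail docs) is to point Python's ssl module at the certifi CA bundle:

# Assumes `pip install certifi`. Set the variable before the download runs.
import os, certifi
os.environ['SSL_CERT_FILE'] = certifi.where()
import hail as hl
hl.utils.get_1kg('data/')   # the tutorial download that was failing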

Thanks. Working fine now…

Great!

Last question:
I am trying to run:
spark-submit --jars build/libs/hail-all-spark.jar \
  --py-files build/distributions/hail-python.zip \
  hailscript.py

I am not sure where build/distributions/ is. How can I create it? I am using spark-2.2.0 and Hail 0.2.

Ah, those paths are written for the “compiling your own” strategy. Use the jar and Python directory you find in the distribution you downloaded.
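
Concretely, something along these lines, where both paths are illustrative and should be replaced with wherever hail-all-spark.jar and hail-python.zip sit in your unpacked download:

spark-submit \
  --jars /path/to/distribution/hail-all-spark.jar \
  --py-files /path/to/distribution/hail-python.zip \
  hailscript.py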

Hi,
My system was not compiling the code because I did not have cmake installed.
Now I have the zip.
Thanks again,
Octavio