Hail 0.2 on glue

I am facing the error below when trying to import Hail on Glue (spark3, python2, glue version 1.0). Any help is much appreciated:

21/06/29 21:47:11 ERROR ApplicationMaster: User application exited with status 1

import sys, os
from pyspark import SparkContext, SparkConf
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from hail import *
#import hail as hl

conf = SparkConf()
conf.set('spark.app.name', u'Running Hail on Glue')
conf.set('spark.sql.files.maxPartitionBytes', '1099511627776')
conf.set('spark.sql.files.openCostInBytes', '1099511627776')
conf.set('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator')
conf.set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')

sc = SparkContext(conf=conf)
sc._jsc.hadoopConfiguration().set("mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter")
sc.getConf().getAll()

glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
#job.init(args['JOB_NAME'], args)
hc = HailContext(sc)
#hl.init(sc)

print("Hello World!!!")

When I change the import statements as shown below, I see this error:

Py4JError: An error occurred while calling z:is.hail.HailContext.apply. Trace:

#from hail import *
import hail as hl

#hc = HailContext(sc)
hl.init(sc)

I also want to add that when using Python 3 I see this error:

ModuleNotFoundError: No module named 'SocketServer'

What version of Hail are you using? There’s no mention of “SocketServer” anywhere in the code base right now.

I am using hail 2.0

What’s the full version? pip show hail?
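If running pip inside the job is not convenient, a small check in the job script can show which Python is running and where a hail package would be loaded from. This is only a minimal sketch: `describe_module` is a hypothetical helper name, not part of Hail or Glue. For context, `SocketServer` is the Python 2 name of the standard library module that Python 3 calls `socketserver`, so that error suggests Python 2 era code being picked up under Python 3.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module can be found, without importing it.

    Hypothetical helper for illustration; not part of Hail or AWS Glue.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False, "origin": None}
    # origin is usually the path of the file or zip entry the module loads from
    return {"name": name, "found": True, "origin": spec.origin}


# Print the interpreter version and where "hail" would come from, if anywhere.
print("Python", sys.version.split()[0])
print(describe_module("hail"))
```

The `origin` value makes it easier to tell whether the job is picking up an old copy of the package rather than the one you intended.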

I am trying to rebuild (steps below) on an EC2 instance with Amazon Linux and am seeing this error:

FAILURE: Build failed with an exception.

* Where:
Build file '/home/ec2-user/hail/hail/build.gradle' line: 194

* What went wrong:
A problem occurred evaluating root project 'hail'.

assert(scalaMajorVersion == "2.11")
       |                 |
       '2.12'            false

sudo yum install -y g++ cmake git
sudo yum install -y lz4
sudo yum install -y lz4-devel
git clone https://github.com/hail-is/hail.git

cd hail/hail && git fetch && git checkout

sudo yum groupinstall 'Development Tools'
sudo yum install java-1.8.0
sudo alternatives --config java
sudo yum search java | grep openjdk
sudo yum install java-1.8.0-openjdk-headless.x86_64
sudo yum install java-1.8.0-openjdk-devel.x86_64
sudo update-alternatives --config java
sudo update-alternatives --config javac

make install HAIL_COMPILE_NATIVES=1 SPARK_VERSION=2.4.4

Any help, please? I am basically stuck getting Hail 0.2 working on Glue.

Try this:

make install HAIL_COMPILE_NATIVES=1 SPARK_VERSION=2.4.4 SCALA_VERSION=2.11.12
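The failed assertion in build.gradle compares the Scala major version against "2.11", and the build had resolved "2.12", which is why passing SCALA_VERSION explicitly is worth trying. The check can be sketched in Python as below. The function names are hypothetical, and the expected value "2.11" is taken only from the assertion text quoted above.

```python
def scala_major_version(scala_version):
    """Return the 'major.minor' part of a Scala version such as '2.11.12'."""
    parts = scala_version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a version like 2.11.12, got %r" % scala_version)
    return ".".join(parts[:2])


def matches_expected_scala(scala_version, expected_major="2.11"):
    """Mirror the build's assertion: the major version must equal expected_major."""
    return scala_major_version(scala_version) == expected_major


# The value passed on the make command line satisfies the check;
# a 2.12.x version would not.
print(matches_expected_scala("2.11.12"))
print(matches_expected_scala("2.12.13"))
```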

Thanks, I was able to build on EC2 (Amazon Linux). But when I try to use these files on AWS Glue I see this error:

import hail as hl
ModuleNotFoundError: No module named 'hail'

[ec2-user@ip-172-31-50-72 ~]$ pip show hail
Name: hail
Version: 0.2.74
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: /home/ec2-user/.local/lib/python3.7/site-packages
Requires: aiohttp, humanize, janus, pandas, asyncinit, decorator, aiohttp-session, pyspark, google-cloud-storage, tabulate, python-json-logger, scipy, bokeh, tqdm, dill, botocore, gcsfs, boto3, PyJWT, nest-asyncio, Deprecated, fsspec, requests, numpy, parsimonious, hurry.filesize

I got past the previous error but am currently stuck at the one below. Please help if you can:

  File "/tmp/test_08_02", line 8, in <module>
    import hail as hl
  File "/tmp/hail-python.zip/hail/__init__.py", line 44, in <module>
    from .table import Table, GroupedTable, asc, desc # noqa: E402
  File "/tmp/hail-python.zip/hail/table.py", line 7, in <module>
    from hail.expr.expressions import Expression, StructExpression,
  File "/tmp/hail-python.zip/hail/expr/__init__.py", line 1, in <module>
    from .types import dtype, HailType, hail_type, is_container, is_compound,
  File "/tmp/hail-python.zip/hail/expr/types.py", line 10, in <module>
    from hail import genetics
  File "/tmp/hail-python.zip/hail/genetics/__init__.py", line 1, in <module>
    from .call import Call
  File "/tmp/hail-python.zip/hail/genetics/call.py", line 2, in <module>
    from hail.utils import FatalError
  File "/tmp/hail-python.zip/hail/utils/__init__.py", line 8, in <module>
    from .tutorial import get_1kg, get_hgdp, get_movie_lens
  File "/tmp/hail-python.zip/hail/utils/tutorial.py", line 7, in <module>
    from hailtop.utils import sync_retry_transient_errors
  File "/tmp/hail-python.zip/hailtop/utils/__init__.py", line 2, in <module>
    from .utils import (
  File "/tmp/hail-python.zip/hailtop/utils/utils.py", line 19, in <module>
    import google.auth.exceptions
ModuleNotFoundError: No module named 'google'

pip install --upgrade --target=/home/ec2-user/hail/hail/python/ google

    import hail as hl
  File "/tmp/hail-python.zip/hail/__init__.py", line 44, in <module>
    from .table import Table, GroupedTable, asc, desc # noqa: E402
  File "/tmp/hail-python.zip/hail/table.py", line 7, in <module>
    from hail.expr.expressions import Expression, StructExpression,
  File "/tmp/hail-python.zip/hail/expr/__init__.py", line 1, in <module>
    from .types import dtype, HailType, hail_type, is_container, is_compound,
  File "/tmp/hail-python.zip/hail/expr/types.py", line 10, in <module>
    from hail import genetics
  File "/tmp/hail-python.zip/hail/genetics/__init__.py", line 1, in <module>
    from .call import Call
  File "/tmp/hail-python.zip/hail/genetics/call.py", line 2, in <module>
    from hail.utils import FatalError
  File "/tmp/hail-python.zip/hail/utils/__init__.py", line 8, in <module>
    from .tutorial import get_1kg, get_hgdp, get_movie_lens
  File "/tmp/hail-python.zip/hail/utils/tutorial.py", line 7, in <module>
    from hailtop.utils import sync_retry_transient_errors
  File "/tmp/hail-python.zip/hailtop/utils/__init__.py", line 2, in <module>
    from .utils import (
  File "/tmp/hail-python.zip/hailtop/utils/utils.py", line 19, in <module>
    import google.auth.exceptions
ModuleNotFoundError: No module named 'google'

pip install --upgrade google-auth google-auth-httplib2 google-api-python-client

    import hail as hl
  File "/tmp/hail-python.zip/hail/__init__.py", line 44, in <module>
    from .table import Table, GroupedTable, asc, desc # noqa: E402
  File "/tmp/hail-python.zip/hail/table.py", line 7, in <module>
    from hail.expr.expressions import Expression, StructExpression,
  File "/tmp/hail-python.zip/hail/expr/__init__.py", line 1, in <module>
    from .types import dtype, HailType, hail_type, is_container, is_compound,
  File "/tmp/hail-python.zip/hail/expr/types.py", line 10, in <module>
    from hail import genetics
  File "/tmp/hail-python.zip/hail/genetics/__init__.py", line 1, in <module>
    from .call import Call
  File "/tmp/hail-python.zip/hail/genetics/call.py", line 2, in <module>
    from hail.utils import FatalError
  File "/tmp/hail-python.zip/hail/utils/__init__.py", line 8, in <module>
    from .tutorial import get_1kg, get_hgdp, get_movie_lens
  File "/tmp/hail-python.zip/hail/utils/tutorial.py", line 7, in <module>
    from hailtop.utils import sync_retry_transient_errors
  File "/tmp/hail-python.zip/hailtop/utils/__init__.py", line 2, in <module>
    from .utils import (
  File "/tmp/hail-python.zip/hailtop/utils/utils.py", line 19, in <module>
    import google.auth.exceptions
ModuleNotFoundError: No module named 'google'
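Since the traceback shows hail being loaded from /tmp/hail-python.zip on the Glue side, installing packages with pip on the EC2 host may not change what the job itself can import. A quick way to see what is missing in the environment the job actually runs in is a check like the one below, placed at the top of the job script. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the module list is only an example drawn from the imports in the traceback, not a full list of Hail's dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of dotted module names that cannot be found.

    Hypothetical helper for illustration. For a dotted name such as
    'google.auth', find_spec imports the parent package first, so a
    missing parent raises ModuleNotFoundError, which is handled here.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Example list taken from the imports that appear in the traceback.
print(missing_modules(["hail", "hailtop", "google.auth"]))
```

Any name that is printed would need to be made available to the job itself, not only to the machine where Hail was built.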