Hi,
I apologize if this is a dumb question, but I’m a Hail newbie. I am simply trying to run Hail on Amazon EMR. I have set up a small EMR cluster using emr-5.13.0, which comes with Spark 2.3.0. Following some scripts built by Amazon, I logged into the master node and ran the following:
HAIL_VERSION="0.1"
SPARK_VERSION="2.3.0"
/usr/bin/sudo pip install decorator
sudo yum update -y
sudo yum install g++ cmake git -y
git clone https://github.com/hail-is/hail.git
cd hail/
git checkout $HAIL_VERSION
./gradlew -Dspark.version=$SPARK_VERSION shadowJar archiveZip
cp $PWD/build/distributions/hail-python.zip $HOME
cp $PWD/build/libs/hail-all-spark.jar $HOME
echo "" >> $HOME/.bashrc
echo "export PYTHONPATH=${PYTHONPATH}:$HOME/hail-python.zip" >> $HOME/.bashrc
Afterwards, I tried to start pyspark using the following command:
pyspark --jars hail-all-spark.jar --py-files hail-python.zip
When I try to import Hail, I get the following error:
sc.addFile('/home/hadoop/hail-all-spark.jar')
sc.addPyFile('/home/hadoop/hail-python.zip')
from hail import *
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/__init__.py", line 1, in <module>
#
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/expr.py", line 3, in <module>
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/representation/__init__.py", line 1, in <module>
#
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/representation/variant.py", line 2, in <module>
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/typecheck/__init__.py", line 1, in <module>
#
File "/mnt/tmp/spark-7970940d-c351-48d6-be82-1c2cb4647b24/userFiles-7fa2e70a-b1b5-48a3-bcf8-fba119200a8a/hail-python.zip/hail/typecheck/check.py", line 1, in <module>
ImportError: cannot import name getargspec
hc = HailContext(sc)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'HailContext' is not defined
Could someone explain why this is happening?
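In case it helps narrow this down, here is a small check that can be run in the same pyspark session. It is only a sketch and rests on an assumption I have not confirmed in the Hail source: that line 1 of hail/typecheck/check.py imports getargspec from the decorator package I installed with pip. The helper name describe_attribute is made up for illustration. It reports where a module was loaded from, its version if it declares one, and whether it exposes a given name.

```python
import importlib


def describe_attribute(module_name, attribute_name):
    """Report where a module was loaded from and whether it exposes a name.

    Hypothetical helper for debugging; not part of Hail or Spark.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not importable at all in this interpreter.
        return {
            "module": module_name,
            "found": False,
            "path": None,
            "version": None,
            "has_attribute": False,
        }
    return {
        "module": module_name,
        "found": True,
        # Path shows which installation was actually picked up.
        "path": getattr(module, "__file__", None),
        # Not every module defines __version__, so default to None.
        "version": getattr(module, "__version__", None),
        "has_attribute": hasattr(module, attribute_name),
    }


# Assumption: the failing import is "getargspec" from the "decorator" package.
print(describe_attribute("decorator", "getargspec"))
```

If has_attribute comes back False, the path and version fields should at least show which copy of decorator the Spark Python interpreter is picking up, which may differ from the one pip installed.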