Error on reading a VCF file: hc.import_vcf

Trying to read a VCF file produces the following error:


NameError Traceback (most recent call last)
in ()

1
----> 2 vds = hc.import_vcf('x.clean.tidy.vcf')
3 vds.count()
4 hc._jhc.report()

NameError: name 'hc' is not defined

Are you running a terminal ipython or a GUI Jupyter notebook? I believe from our last thread that you're using a Jupyter notebook.

Did you open one of the Hail tutorial notebooks? If so, did you run the lines that import the hail package and start the Hail context? These lines look like:

from hail import *
hc = HailContext()

Jupyter notebooks must be run sequentially; you can't start in the middle, because later cells often depend on variables defined in earlier cells.

No, I'm doing this in the macOS Terminal.
I typed ihail followed by
from hail import *
hc = HailContext

ImportError Traceback (most recent call last)
in ()
----> 1 from hail import *
2 hc = HailContext

/Users/AleRodriguez/hail/python/hail/__init__.py in ()
----> 1 import hail.expr
2 from hail.representation import *
3 from hail.context import HailContext
4 from hail.dataset import VariantDataset
5 from hail.expr import *

/Users/AleRodriguez/hail/python/hail/expr.py in ()
1 import abc
----> 2 from hail.java import scala_object, Env, jset
3 from hail.representation import Variant, AltAllele, Genotype, Locus, Interval, Struct, Call
4
5

/Users/AleRodriguez/hail/python/hail/java.py in ()
----> 1 import SocketServer
2 import socket
3 import sys
4 from threading import Thread
5

ImportError: No module named 'SocketServer'
Prior to running ihail I set my default Python version to Python 2.7.10:

1. virtualenv -p /usr/bin/python2.7 --distribute temp-python
2. source temp-python/bin/activate

python -V
Python 2.7.10

ihail starts an ipython session with Hail, not a python session with Hail. If you start ipython, you will see that you are using the Python 3 version of ipython.
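Incidentally, the No module named 'SocketServer' error is itself a Python 3 symptom: the Python 2 module SocketServer was renamed socketserver in Python 3. A quick sketch (standard library only, works under either version) for checking which interpreter a session is really running:

```python
import sys

# The interpreter path shows which installation (system, anaconda, or a
# virtualenv) this session is actually using.
print(sys.executable)
print("Python %d.%d" % sys.version_info[:2])

# SocketServer (Python 2) became socketserver (Python 3); importing the
# Python 2 name under Python 3 raises exactly the ImportError seen above.
if sys.version_info[0] >= 3:
    import socketserver as SocketServer
else:
    import SocketServer

print(SocketServer.__name__)
```

If the first line prints a path under anaconda3 rather than under the virtualenv, the session is running the wrong interpreter.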

I suspect that your temp-python virtual environment does not have ipython installed. For example, I started a 2.7 virtual environment and used which (see man which) to check which versions of python I'm using:

# virtualenv -p /usr/bin/python2.7 venv2.7
# source venv2.7/bin/activate
(venv2.7) # which ipython
/Users/dking/anaconda2/bin/ipython
(venv2.7) # which python
/Users/dking/projects/hail/venv2.7/bin/python

As you can see, I am using the python version from the virtual environment but the ipython version from my anaconda python install. You likely received a warning message about this when you started ihail, something like:

/Users/dking/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:724:
UserWarning: Attempting to work in a virtualenv. If you encounter problems, please
install IPython inside the virtualenv.

To fix this situation you simply need to install ipython inside your virtualenv:

# source venv2.7/bin/activate
# pip install ipython
# rehash
# which ipython
/Users/dking/projects/hail/venv2.7/bin/ipython
# ihail

Also note that I issued the rehash command after installing ipython; this tells your shell that you've installed a new program with the same name as an old one. (rehash is a zsh/csh builtin; in bash, use hash -r instead.)

(venv2.7)-MacBook-Pro:~$ which ipython
/Users/AleRodriguez/venv2.7/bin/ipython

(venv2.7)-MacBook-Pro:~$ which python
/Users/AleRodriguez/venv2.7/bin/python

ipython -V
5.5.0

python -V
Python 2.7.10

(venv2.7)-MacBook-Pro:~$ ihail
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: from hail import *
   ...: hc = HailContext
   ...:

ImportError Traceback (most recent call last)
in ()
----> 1 from hail import *
2 hc = HailContext

/Users/AleRodriguez/hail/python/hail/__init__.py in ()
----> 1 import hail.expr
2 from hail.representation import *
3 from hail.context import HailContext
4 from hail.dataset import VariantDataset
5 from hail.expr import *

/Users/AleRodriguez/hail/python/hail/expr.py in ()
1 import abc
----> 2 from hail.java import scala_object, Env, jset
3 from hail.representation import Variant, AltAllele, Genotype, Locus, Interval, Struct, Call
4
5

/Users/AleRodriguez/hail/python/hail/java.py in ()
4 from threading import Thread
5
----> 6 import py4j
7 from decorator import decorator
8

ImportError: No module named py4j

At this point I would be willing to remove Python 3; I'm using the anaconda3 distribution and Python 3 is the default. I need to make this program work today, please help!

This is the same error as the last issue in this thread: Hail-overview: error in running the first two lines in the tutorial

I think somehow your configuration got messed up between then and now. Can you still run the tutorial?

Could you also paste the output of

echo $SPARK_HOME
source /Users/AleRodriguez/hail/bin/setup_env
echo $PYTHONPATH

That will help us identify what is wrong.
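The same information can also be dumped from inside the interpreter. This is only a diagnostic sketch using the standard library, mirroring the shell commands above (the environment variable names come from this thread):

```python
import os
import sys

# Print the Spark-related environment that setup_env is supposed to
# configure; a missing SPARK_HOME or a stale PYTHONPATH usually explains
# "No module named py4j/pyspark" errors.
print("SPARK_HOME = " + os.environ.get("SPARK_HOME", "<not set>"))
print("PYTHONPATH = " + os.environ.get("PYTHONPATH", "<not set>"))

# sys.path is what the interpreter actually searches; flag entries that
# no longer exist on disk (e.g. paths left over from an older Spark).
for entry in sys.path:
    if entry and not os.path.exists(entry):
        print("missing: " + entry)
```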

In short, I installed
pip install py4j
pip install pyspark
and finally got past importing the HailContext:

1 from hail import *
2 hc = HailContext

details below:

ImportError Traceback (most recent call last)
in ()
----> 1 from hail import *
2 hc = HailContext

/Users/AleRodriguez/hail/python/hail/__init__.py in ()
----> 1 import hail.expr
2 from hail.representation import *
3 from hail.context import HailContext
4 from hail.dataset import VariantDataset
5 from hail.expr import *

/Users/AleRodriguez/hail/python/hail/expr.py in ()
1 import abc
----> 2 from hail.java import scala_object, Env, jset
3 from hail.representation import Variant, AltAllele, Genotype, Locus, Interval, Struct, Call
4
5

/Users/AleRodriguez/hail/python/hail/java.py in ()
4 from threading import Thread
5
----> 6 import py4j
7 from decorator import decorator
8

ImportError: No module named py4j

In [2]: exit()
AleRodriguez@Alejandras-MacBook-Pro:~$ pip install py4j
Collecting py4j
Downloading py4j-0.10.6-py2.py3-none-any.whl (189kB)
100% |████████████████████████████████| 194kB 2.8MB/s
Installing collected packages: py4j
Successfully installed py4j-0.10.6
AleRodriguez@Alejandras-MacBook-Pro:~$ ihail
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 12:01:12)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: from hail import *
   ...: hc = HailContext
   ...:

ImportError Traceback (most recent call last)
in ()
----> 1 from hail import *
2 hc = HailContext

/Users/AleRodriguez/hail/python/hail/__init__.pyc in ()
1 import hail.expr
2 from hail.representation import *
----> 3 from hail.context import HailContext
4 from hail.dataset import VariantDataset
5 from hail.expr import *

/Users/AleRodriguez/hail/python/hail/context.py in ()
2
3 from hail.typecheck import *
----> 4 from pyspark import SparkContext
5 from pyspark.sql import SQLContext
6

ImportError: No module named pyspark

In [2]: exit()
AleRodriguez@Alejandras-MacBook-Pro:~$ pip install pyspark
Collecting pyspark
Collecting py4j==0.10.4 (from pyspark)
Using cached py4j-0.10.4-py2.py3-none-any.whl
Installing collected packages: py4j, pyspark
Found existing installation: py4j 0.10.6
Uninstalling py4j-0.10.6:
Successfully uninstalled py4j-0.10.6
Successfully installed py4j-0.10.4 pyspark-2.2.0
AleRodriguez@Alejandras-MacBook-Pro:~$ ihail
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 12:01:12)
Type “copyright”, “credits” or “license” for more information.

IPython 5.5.0 – An enhanced Interactive Python.
? -> Introduction and overview of IPython’s features.
%quickref -> Quick reference.
help -> Python’s own help system.
object? -> Details about ‘object’, use ‘object??’ for extra details.

In [1]: from hail import *
   ...: hc = HailContext
   ...:

In [2]:

Okay, this is a nightmare. After exiting Hail I went back in, and after trying

1 from hail import *
----> 2 hc = HailContext()

I got this error:

OSError Traceback (most recent call last)
in ()
1 from hail import *
----> 2 hc = HailContext()

in __init__(self, sc, app_name, master, local, log, quiet, append, parquet_compression, min_block_size, branching_factor, tmp_dir)

/Users/AleRodriguez/hail/python/hail/typecheck/check.pyc in _typecheck(f, *args, **kwargs)
243 def _typecheck(f, *args, **kwargs):
244 check_all(f, args, kwargs, checkers, is_method=True)
--> 245 return f(*args, **kwargs)
246
247 return decorator(_typecheck)

/Users/AleRodriguez/hail/python/hail/context.pyc in __init__(self, sc, app_name, master, local, log, quiet, append, parquet_compression, min_block_size, branching_factor, tmp_dir)
69 'or stop Hail context to change configuration.')
70
--> 71 SparkContext._ensure_initialized()
72
73 self._gateway = SparkContext._gateway

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway, conf)
281 with SparkContext._lock:
282 if not SparkContext._gateway:
--> 283 SparkContext._gateway = gateway or launch_gateway(conf)
284 SparkContext._jvm = SparkContext._gateway.jvm
285

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyspark/java_gateway.pyc in launch_gateway(conf)
75 def preexec_func():
76 signal.signal(signal.SIGINT, signal.SIG_IGN)
--> 77 proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
78 else:
79 # preexec_fn not supported on Windows

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
388 p2cread, p2cwrite,
389 c2pread, c2pwrite,
--> 390 errread, errwrite)
391 except Exception:
392 # Preserve original exception in case os.close raises.

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
1023 raise
1024 child_exception = pickle.loads(data)
-> 1025 raise child_exception
1026
1027

OSError: [Errno 2] No such file or directory

echo $SPARK_HOME
/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7

source /Users/AleRodriguez/hail/bin/setup_env
dirname: illegal option -- b
usage: dirname path

echo $PYTHONPATH
/Users/python:/Users/AleRodriguez/spark-2.0.2-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip:/Users/python:/Users/AleRodriguez/spark-2.0.2-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip:/Users/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python/lib/py4j--src.zip:/Users/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python/lib/py4j--src.zip:/Users/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python/lib/py4j--src.zip:/Users/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python:/Users/AleRodriguez/spark-2.2.0-bin-hadoop2.7/python/lib/py4j--src.zip:

In case other users come here with similar bugs: do not use pip to install py4j and pyspark

@alerodriguez, I recommend executing pip uninstall py4j pyspark. I think leaving them installed will probably not cause a problem; however, having a simple environment makes both of our lives easier when debugging Spark or Hail.
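To tell whether py4j or pyspark would resolve from pip's site-packages or from the Spark distribution on PYTHONPATH, you can inspect each module's __file__. A small sketch (either module may simply fail to import, which is also informative):

```python
# For each module, report where (or whether) it resolves from. A path
# under site-packages means a pip install; a path under
# $SPARK_HOME/python means the copy shipped with the Spark distribution.
for name in ("py4j", "pyspark"):
    try:
        mod = __import__(name)
        print(name + " -> " + mod.__file__)
    except ImportError:
        print(name + " -> not importable")
```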


Your SPARK_HOME variable must be set to the un-tarred Spark 2.0.2 directory. Recall from our previous conversation that you need to set it with this command:

export SPARK_HOME=/Users/AleRodriguez/spark-2.0.2-bin-hadoop2.7

Before you try running Hail again, make sure SPARK_HOME is set correctly by executing these commands:

ls $SPARK_HOME
cat $SPARK_HOME/RELEASE

The output should look similar to:

# ls $SPARK_HOME
LICENSE   R         RELEASE   conf      examples  licenses  sbin
NOTICE    README.md bin       data      jars      python    yarn
# cat $SPARK_HOME/RELEASE
Spark 2.0.2 built for Hadoop 2.7.3
Build flags: -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn -DzincPort=3036

In particular, if the second command's output doesn't start with "Spark 2.0.2", Hail will not work.
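Those two checks can be rolled into a few lines of Python. check_spark_home below is a hypothetical helper (not part of Hail or Spark), and the expected version prefix comes from the advice above:

```python
import os

def check_spark_home(path, expected_prefix="Spark 2.0.2"):
    # An un-tarred Spark directory contains a RELEASE file whose first
    # line names the Spark version it was built as.
    release = os.path.join(path, "RELEASE")
    if not os.path.isfile(release):
        return False
    with open(release) as f:
        return f.readline().startswith(expected_prefix)

spark_home = os.environ.get("SPARK_HOME", "")
if check_spark_home(spark_home):
    print(spark_home + " looks like a Spark 2.0.2 install")
else:
    print("SPARK_HOME is unset or does not point at Spark 2.0.2")
```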

I was finally able to read and filter a VCF file!!
Thanks for your help!
