Hi,
I am having problems importing the hail package into Python.
Weirdly, I can run the Jupyter Hail tutorials with no problem.
However, creating a new Jupyter notebook in that same tutorials folder and trying to import hail there produces the error
ImportError: No module named hail
Running interactive Python from elsewhere and attempting to import hail produces the same "No module named hail" error.
I have set SPARK_HOME and PATH variables as in the ‘Getting started’ instructions, and they now contain the locations of the relevant packages.
I generally use Anaconda environments for running Python. Is there an easy way to make Hail available within an Anaconda environment that I have created?
Since we haven’t registered Hail on PyPI, I think it’s hard to make a conda yaml file to set up everything automatically.
The problem is probably that the Hail python zip isn’t on your path – are you using the jhail script in the distribution? That sets it up. Otherwise, you’ll need to add the Hail python library to the PYTHONPATH environment variable.
Let me know if this is confusing (I’ve slightly confused myself).
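If editing PYTHONPATH is undesirable, the same entries can be added from inside a running interpreter by modifying sys.path. The sketch below is only an illustration, not part of Hail: the helper names are hypothetical, and the directory layout ($HAIL_HOME/python holding the hail package, $SPARK_HOME/python holding pyspark, and a py4j zip under $SPARK_HOME/python/lib) is an assumption based on the paths that appear later in this thread.

```python
import glob
import os
import sys


def hail_python_paths(hail_home, spark_home):
    """Return the entries that would need to be on sys.path for `import hail`.

    Assumed layout (hypothetical; check it against your own installation):
      $HAIL_HOME/python                      contains the `hail` package
      $SPARK_HOME/python                     contains `pyspark`
      $SPARK_HOME/python/lib/py4j-*-src.zip  the py4j bridge used by pyspark
    """
    paths = [
        os.path.join(hail_home, "python"),
        os.path.join(spark_home, "python"),
    ]
    # The py4j version differs between Spark releases, so match it with a
    # wildcard instead of hard-coding something like py4j-0.10.3-src.zip.
    paths += sorted(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))
    return paths


def add_to_sys_path(paths):
    """Prepend any paths not already on sys.path, keeping their relative order."""
    new = [p for p in paths if p not in sys.path]
    sys.path[:0] = new
    return new


# Usage (hypothetical session; requires HAIL_HOME and SPARK_HOME to be set):
#   add_to_sys_path(hail_python_paths(os.environ["HAIL_HOME"], os.environ["SPARK_HOME"]))
#   import hail
```

This only affects the current interpreter session, which makes it convenient for experimenting in a notebook without touching the shell environment.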
Yes, I was using the jhail command; I had forgotten that.
What exactly should I add to my PYTHONPATH variable? It is indeed empty right now, since setting it is discouraged when using Anaconda. However, I am not afraid to change it.
Hmm, interesting – I should really look into what Anaconda prefers, it’s a great Python installation and we should totally play nicely with it. I use Anaconda and PYTHONPATH on my local machine and everything seems to work (I have Hail and py4j in my PYTHONPATH and they persist between virtual envs).
I really like Anaconda. They say it's OK to use it with a populated PYTHONPATH variable, but that doing so sometimes causes problems, depending on what you're doing.
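One way to make extra paths visible to a single conda environment without setting PYTHONPATH at all is a .pth file in that environment's site-packages directory: at start-up, Python's site module appends each existing path listed in such a file to sys.path. This is a standard Python mechanism rather than anything Hail-specific, and the helper and the file name hail.pth below are hypothetical choices for illustration.

```python
import os
import sysconfig


def write_pth(entries, site_dir=None, name="hail.pth"):
    """Write a .pth file that puts extra directories or zip files on sys.path.

    At start-up, Python's site module reads every *.pth file in the
    interpreter's site-packages directory and appends each listed path that
    exists to sys.path. Because each conda environment has its own
    site-packages, the entries only affect the environment written to.
    """
    if site_dir is None:
        # site-packages of the interpreter running this code, i.e. the
        # currently activated environment.
        site_dir = sysconfig.get_paths()["purelib"]
    path = os.path.join(site_dir, name)
    with open(path, "w") as f:
        for entry in entries:
            f.write(entry + "\n")
    return path


# Usage (hypothetical; run with the python of the environment you want to change):
#   write_pth([os.path.join(os.environ["HAIL_HOME"], "python")])
```

The design trade-off is that the paths are written out once as absolute strings, so moving the Hail or Spark installation later means rewriting the file.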
So I have HAIL_HOME and SPARK_HOME variables set already, according to the getting started instructions.
Are you saying those might need to be altered, or do you think I can copy what you have except for the ‘python/lib/py4j-0.10.3-src.zip’ part? (Or maybe even that will be the same for me; I just have to look and see what corresponds on my system.)
I should have the ‘correct’ Spark; I hadn't used Spark before, so I just downloaded the version that should ‘match’ Hail.
I copied what you had exactly, and I got a new error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Applications/hail/python/hail/__init__.py", line 1, in <module>
    import hail.expr
  File "/Applications/hail/python/hail/expr.py", line 2, in <module>
    from hail.java import scala_object, Env, jset
  File "/Applications/hail/python/hail/java.py", line 7, in <module>
    from decorator import decorator
ImportError: No module named decorator
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 2, in __init__
  File "/Applications/hail/python/hail/typecheck/check.py", line 245, in _typecheck
    return f(*args, **kwargs)
  File "/Applications/hail/python/hail/context.py", line 88, in __init__
    parquet_compression, min_block_size, branching_factor, tmp_dir)
TypeError: 'JavaPackage' object is not callable
It just comes down to those pesky environment variables. Perhaps you could add a troubleshooting section that covers them? It would also help to note that people should not assume everything is set up correctly just because the tutorials run.
I’m looking forward to messing with some vds’s now. Thanks again.
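A troubleshooting section along those lines could start from a small check of the variables discussed in this thread. This is a hypothetical sketch, not a Hail tool: the function name is made up, and it assumes the layout seen in the tracebacks above, where the hail package lives in $HAIL_HOME/python and that directory therefore has to be on PYTHONPATH.

```python
import os


def check_hail_env(environ=None):
    """Return human-readable problems with Hail-related environment variables.

    Assumes (hypothetically) that the hail package lives in $HAIL_HOME/python,
    which therefore has to be on PYTHONPATH.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for name in ("HAIL_HOME", "SPARK_HOME"):
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value} is not a directory")

    # Stale PYTHONPATH entries are a common leftover from earlier installs.
    entries = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for entry in entries:
        if not os.path.exists(entry):
            problems.append(f"PYTHONPATH entry does not exist: {entry}")

    hail_home = environ.get("HAIL_HOME")
    if hail_home:
        wanted = os.path.normpath(os.path.join(hail_home, "python"))
        if wanted not in (os.path.normpath(p) for p in entries):
            problems.append(f"{wanted} is not on PYTHONPATH")
    return problems


# Usage: print each problem, or a short all-clear message.
#   for problem in check_hail_env() or ["no problems found"]:
#       print(problem)
```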
Just to add: I think decorator does come with Anaconda (I had an old version), but even so I needed to install it into the Anaconda environment I was working in before it was available.
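Each conda environment has its own site-packages directory, so a package installed in one environment (or in the base Anaconda install) is not automatically visible from another; installing it with conda install decorator inside the activated environment is the usual fix. A quick way to see which modules the running interpreter can actually find is sketched below. The helper names are hypothetical, and the default module list (hail, pyspark, py4j, decorator) is an assumption drawn from the tracebacks in this thread.

```python
import importlib.util
import sys


def missing_modules(names=("hail", "pyspark", "py4j", "decorator")):
    """Return the top-level modules the *running* interpreter cannot find.

    find_spec only locates a top-level module; it does not execute it, so a
    module that is present but fails on import is still reported as found.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_interpreter():
    """Identify which Python (and hence which conda environment) is running."""
    return f"executable={sys.executable} prefix={sys.prefix}"


# Usage (run inside the environment you are debugging):
#   print(describe_interpreter())
#   print("missing:", missing_modules())
```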
The files in the scripts folder are not meant to be used with the source distribution of Hail (from git/GitHub). They are used by the packaging mechanism to produce the pre-packaged Hail distributions.
If you downloaded the pre-built Hail distribution from the Getting Started page and you encountered these problems, then we should understand why and fix the pre-built distribution.
I strongly recommend using the pre-built distribution mentioned on the getting started page.