Hail fails after installation on a single computer

Hi Hail team,

I’ve used Hail on a Spark cluster before without issues, but now we’d like to install it on a single computer/node (RHEL 9) for some tasks on small datasets.

I followed the instructions here: Hail | Install Hail on GNU/Linux

But I’m getting the error detailed below. Any help/ideas?

Thanks

Enrique

>>> import hail as hl
>>> hl.init()
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/opt/anaconda3/envs/hail/lib/python3.9/site-packages/pyspark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.3
SparkUI available at http://xxxx:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.126-ee77707f4fab
LOGGING: writing to /home/eam/hail-20231120-1825-0.2.126-ee77707f4fab.log
>>> hl.utils.get_movie_lens('data/')
2023-11-20 18:27:48.120 Hail: INFO: downloading MovieLens-100k data ...
  Source: https://files.grouplens.org/datasets/movielens/ml-100k.zip
2023-11-20 18:27:50.320 Hail: INFO: importing users table and writing to data/users.ht ...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/utils/tutorial.py", line 236, in get_movie_lens
    hl.import_table(user_cluster_readable, key=['f0'], no_header=True, impute=True, delimiter='|'),
  File "<decorator-gen-1458>", line 2, in import_table
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/typecheck/check.py", line 587, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/methods/impex.py", line 1718, in import_table
    first_rows = first_row_ht.annotate(
  File "<decorator-gen-1230>", line 2, in collect
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/typecheck/check.py", line 587, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/table.py", line 2168, in collect
    return Env.backend().execute(e._ir, timed=_timed)
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/backend/backend.py", line 178, in execute
    result, timings = self._rpc(ActionTag.EXECUTE, payload)
  File "/opt/anaconda3/envs/hail/lib/python3.9/site-packages/hail/backend/py4j_backend.py", line 212, in _rpc
    error_json = orjson.loads(resp.content)
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)

Hi @enriquea, sorry you hit this error! Could you send the Hail log file (the path is shown right underneath the ASCII Hail logo) so we can diagnose the issue?

Thanks,
Daniel

Hi @danielgoldstein,

Sure! I’ve attached the log file.
I was expecting it to be more verbose, though.

Thanks

Enrique
hail-20231120-1825-0.2.126-ee77707f4fab.log (542 Bytes)

Hi @danielgoldstein,

Any idea about how to solve this issue?

Thanks in advance!

Enrique

Hi there,

I just wanted to report that the issue seems to be specific to version 0.2.126 (the latest).

Switching to an earlier version (0.2.120) works for me!
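
For anyone who wants to pin the working version (assuming a pip-based install):

pip install hail==0.2.120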

In [2]: hl.init()
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/home/eam/.local/lib/python3.9/site-packages/pyspark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.3
SparkUI available at http://xxxx:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.120-f00f916faf78
LOGGING: writing to /home/eam/hail-20231128-1646-0.2.120-f00f916faf78.log

In [3]: hl.utils.get_1kg('data/')
2023-11-28 16:47:45.108 Hail: INFO: downloading 1KG VCF ...
  Source: https://storage.googleapis.com/hail-tutorial/1kg.vcf.bgz
2023-11-28 16:47:46.564 Hail: INFO: importing VCF and writing to matrix table...
2023-11-28 16:47:47.678 Hail: INFO: scanning VCF for sortedness...
2023-11-28 16:47:51.595 Hail: INFO: Coerced sorted VCF - no additional import work to do
2023-11-28 16:47:54.224 Hail: INFO: wrote matrix table with 10879 rows and 284 columns in 16 partitions to data/1kg.mt
2023-11-28 16:47:54.381 Hail: INFO: downloading 1KG annotations ...
  Source: https://storage.googleapis.com/hail-tutorial/1kg_annotations.txt
2023-11-28 16:47:54.836 Hail: INFO: downloading Ensembl gene annotations ...
  Source: https://storage.googleapis.com/hail-tutorial/ensembl_gene_annotations.txt

2023-11-28 16:47:55.770 Hail: INFO: Done!

Hi @enriquea, my apologies for the delay. I’m glad you were able to get some version of Hail working. I wasn’t able to reproduce your error on my machine, so if you could spare some time to help us diagnose the issue so that you can use the latest version, that would be greatly appreciated! I’ve created an issue to track this bug. A couple of questions:

  1. Is 0.2.120 the latest working version on your machine? Are you able to run 0.2.124 successfully?
  2. Your pipeline broke on the MovieLens import. Does 0.2.126 break on other kinds of pipelines too? Instead of the MovieLens import, could you try running hl.utils.range_table(10).collect() (spelled out below)? That’s one of the simplest programs you can run in Hail, and it would help us work out whether your installation is fundamentally broken or whether something subtler is going on.
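
Spelled out, that test is just the following (nothing here beyond the calls above):

>>> import hail as hl
>>> hl.init()
>>> # Simplest possible pipeline: build a 10-row table and collect it locally.
>>> # No downloads or file parsing are involved, so it isolates the backend.
>>> hl.utils.range_table(10).collect()

If that works on 0.2.126, the installation itself is probably fine and something subtler is going on.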

Thanks,
Daniel

We hit a similar issue on our cluster with hail/0.2.126 too. I’m not sure whether it’s a bug, but the error is caused by this line, where Hail tries to talk to the Spark server on localhost:

If your computer is behind a Squid proxy, as ours is, this workaround should keep localhost traffic out of the proxy:
export NO_PROXY=localhost
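
If you’re setting it from Python instead, here is a minimal sketch; the assumption is that the variable only needs to be in the environment before hl.init() opens the connection to the local backend:

>>> import os
>>> # Assumption: excluding localhost keeps the HTTP requests Hail makes to
>>> # its local Spark backend from being routed through the Squid proxy.
>>> os.environ["NO_PROXY"] = "localhost,127.0.0.1"
>>> import hail as hl
>>> hl.init()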