While running a GWAS / logistic regression I get: "Answer from Java side is empty" and "undefined symbol: cblas_dgemv"

I’m getting this error when I try to run a simple GWAS:

/apps/well/java/jdk1.8.0_latest/bin/java: symbol lookup error: /tmp/jniloader1259648657693918558netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/methods/statgen.py", line 815, in logistic_regression_rows
    return result.persist()
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/typecheck/check.py", line 614, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/table.py", line 1870, in persist
    return Env.backend().persist_table(self, storage_level)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/backend/spark_backend.py", line 285, in persist_table
    return Table._from_java(self._jbackend.pyPersistTable(storage_level, self._to_java_table_ir(t._tir)))
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 16, in deco
    return f(*args, **kwargs)
  File "/well/lindgren/UKBIOBANK/nbaya/conda/envs/hail/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1.pyPersistTable

Here’s what I’m running:

import hail as hl
hl.init(log='hail.log')
mt = hl.balding_nichols_model(1,100,2)
mt = mt.annotate_cols(y = hl.rand_bool(0.5))
result_ht = hl.logistic_regression_rows(
	test='wald',
	y=mt.y, 
	x=mt.GT.n_alt_alleles(), 
	covariates=[1]
)

I’ve tried using an updated version of GCC and reinstalling Hail, but that doesn’t seem to change anything. I’m using the Oxford research cluster so I haven’t tried installing anything that isn’t available to load as a module.

Hey @nbaya!

I’m sorry you’re running into this issue. The key error is actually on the first line:

/apps/well/java/jdk1.8.0_latest/bin/java: symbol lookup error: /tmp/jniloader1259648657693918558netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv

Your system lacks a BLAS implementation with C bindings (the “c” in “cblas”). Are you able to load the OpenBLAS libraries specifically? They should be compatible. The Intel MKL should also be compatible (and faster).
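One way to check a candidate BLAS library for the missing symbol is to load it from Python and ask for `cblas_dgemv` directly. This is a minimal sketch using only the standard-library `ctypes` module; `exports_symbol` is a hypothetical helper name, and the path in the usage comment is a placeholder rather than a real location on any cluster.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if symbol can be resolved through the given shared library."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # The library is missing or could not be loaded at all.
        return False
    # ctypes looks the name up with dlsym and raises AttributeError when it
    # is absent, which hasattr turns into False. Note that dlsym also
    # searches the library's own dependencies.
    return hasattr(lib, symbol)


# Example (placeholder path; substitute your cluster's library):
# exports_symbol("/path/to/openblas/lib/libopenblas.so", "cblas_dgemv")
```

A result of False for the library that Java ends up loading would be consistent with the "undefined symbol" message above.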

Hey @danking, it’s been a while :grin:

Our research computing team has been troubleshooting and it doesn’t seem like we can load the OpenBLAS library as a module and have Hail acknowledge it. Here’s the latest email from the computing manager:

When I looked at the file it was complaining about:

[crm194@rescomp1 pip]$ ldd /tmp/jniloader432969083728429575netlib-native_system-linux-x86_64.so
ldd: warning: you do not have execution permission for `/tmp/jniloader432969083728429575netlib-native_system-linux-x86_64.so'
linux-vdso.so.1 => (0x00007fff4e5bc000)
libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007f0d3b40f000)
libblas.so.3 => /lib64/libblas.so.3 (0x00007f0d3b1b6000)
liblapack.so.3 => /lib64/liblapack.so.3 (0x00007f0d3aa59000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0d3a68b000)
libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f0d3a44f000)
libm.so.6 => /lib64/libm.so.6 (0x00007f0d3a14d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0d39f37000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0d3bac1000)

you can see that it’s linking to the system BLAS and LAPACK libraries.

It’s those libraries that are lacking the symbols it is looking for, not the ones that we’re loading with the module, and which do have those symbols.

Somehow, Hail and/or the processes it is calling needs to be configured to respect the library paths of our modules, and not the system paths.

The brute force way of doing that would be to run it in a different environment, where we have more direct control over what is installed, such as a container or a VM.

Do you have any suggestions?
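For anyone repeating this diagnosis, the ldd output quoted above can be checked mechanically. The sketch below parses ldd text into a mapping from requested library name to resolved path; `parse_ldd` is a hypothetical helper, and it assumes the usual `name => path (address)` line layout seen in the output above.

```python
def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no arrow
        name, _, rest = line.partition("=>")
        rest = rest.strip()
        if not rest or rest.startswith("(") or rest.startswith("not found"):
            resolved[name.strip()] = None  # virtual (vdso) or unresolved
        else:
            resolved[name.strip()] = rest.split()[0]
    return resolved


# Two lines copied from the ldd output above.
sample = """\
linux-vdso.so.1 => (0x00007fff4e5bc000)
libblas.so.3 => /lib64/libblas.so.3 (0x00007f0d3b1b6000)
"""
blas = {k: v for k, v in parse_ldd(sample).items() if k.startswith("libblas")}
print(blas)
# prints {'libblas.so.3': '/lib64/libblas.so.3'}
```

A BLAS entry that resolves under /lib64 rather than under the module's directory indicates that the system library is being picked up.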

Oy vey, binary library resolution is hell. IMHO, if their module does not teach ldd to correctly resolve libblas.so.3, then the issue is with their module.

Nonetheless, you should be able to work around this by explicitly specifying LD_LIBRARY_PATH. You can set it with:

export LD_LIBRARY_PATH=/path/to/openblas/lib:$LD_LIBRARY_PATH

You’ll need to figure out where openblas is located. FWIW, on the Broad cluster, it is located at:

/broad/software/free/Linux/redhat_7_x86_64/pkgs/openblas_0.2.20/lib

You’ll probably need to both use OpenBLAS (or the equivalent for your cluster) and set LD_LIBRARY_PATH.
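If the path is set from inside Python rather than the shell, one detail is worth knowing: the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory, and the export above leaves a trailing colon (an empty entry) when the variable was previously unset. Here is a small sketch that avoids that. `prepend_library_dir` is a hypothetical helper, and it is an assumption (not verified against Hail) that the change only helps when made before `hl.init()` launches the JVM, since the loader reads the variable when a process starts.

```python
import os


def prepend_library_dir(directory, env=None):
    """Put directory first in LD_LIBRARY_PATH, dropping empty and duplicate entries."""
    env = os.environ if env is None else env
    current = env.get("LD_LIBRARY_PATH", "")
    kept = [p for p in current.split(":") if p and p != directory]
    env["LD_LIBRARY_PATH"] = ":".join([directory] + kept)
    return env["LD_LIBRARY_PATH"]


# Demonstrated on a plain dict so the real environment is left untouched.
demo = {"LD_LIBRARY_PATH": "/usr/local/lib:"}
print(prepend_library_dir("/path/to/openblas/lib", demo))
# prints /path/to/openblas/lib:/usr/local/lib
```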

Hello

Here’s what the module provides:

/apps/eb/skylake/modules/all/OpenBLAS/0.3.1-GCC-7.3.0-2.30:

module-whatis    Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
module-whatis    Homepage: http://xianyi.github.com/OpenBLAS/
conflict         OpenBLAS
prepend-path     CPATH /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/include
prepend-path     LD_LIBRARY_PATH /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib
prepend-path     LIBRARY_PATH /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib
prepend-path     PATH /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/bin
prepend-path     PKG_CONFIG_PATH /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/pkgconfig
setenv           EBROOTOPENBLAS /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30
setenv           EBVERSIONOPENBLAS 0.3.1
setenv           EBDEVELOPENBLAS /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/easybuild/OpenBLAS-0.3.1-GCC-7.3.0-2.30-easybuild-devel

and here are the actual libraries in the module:

 ll /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib
total 39845
drwxr-xr-x 3 software software     4096 Jun 14  2019 cmake
lrwxrwxrwx 1 software software       30 Jun 14  2019 libopenblas.a -> libopenblas_skylakexp-r0.3.1.a
lrwxrwxrwx 1 software software       31 Jun 14  2019 libopenblas.so -> libopenblas_skylakexp-r0.3.1.so
lrwxrwxrwx 1 software software       31 Jun 14  2019 libopenblas.so.0 -> libopenblas_skylakexp-r0.3.1.so
-rw-r--r-- 1 software software 27203344 Jun 14  2019 libopenblas_skylakexp-r0.3.1.a
-rwxr-xr-x 1 software software 13567208 Jun 14  2019 libopenblas_skylakexp-r0.3.1.so
drwxr-xr-x 2 software software     4096 Jun 14  2019 pkgconfig

Here’s an example of another module that resolves our OpenBLAS module successfully:

ldd /apps/eb/skylake/software/R/3.6.0-foss-2018b/lib64/R/bin/exec/R | grep blas
        libopenblas.so.0 => /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so.0 (0x00007f0aeb9fc000)

Best Wishes,
Adam

Ahah, thanks for the detailed information, Adam! The issue seems to be that netlib-java looks for a library named libblas, while your module only provides libopenblas.

@nbaya, the quick fix would be to set:

export LD_PRELOAD=/apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so

A more durable fix would be to create a symlink named libblas.so that points at OpenBLAS. You can do this yourself, @nbaya, by putting it in a directory you own and adding that directory to LD_LIBRARY_PATH:

mkdir -p ~/lib
ln -s /apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so \
      ~/lib/libblas.so
export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
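The same links can be created from Python. One caveat, stated as an assumption rather than something tested on this cluster: the ldd output earlier in the thread shows the JNI library requesting `libblas.so.3`, and the dynamic loader matches on that exact file name, so a link named `libblas.so.3` may be needed alongside `libblas.so`. `link_library_as` is a hypothetical helper.

```python
import os


def link_library_as(target, directory, names):
    """Create symlinks in directory, one per name, all pointing at target."""
    directory = os.path.expanduser(directory)
    os.makedirs(directory, exist_ok=True)
    created = []
    for name in names:
        link = os.path.join(directory, name)
        if os.path.islink(link):
            os.remove(link)  # replace a stale link from an earlier attempt
        # A regular file with this name is left alone: os.symlink raises
        # FileExistsError instead of overwriting it.
        os.symlink(target, link)
        created.append(link)
    return created


# Example with the OpenBLAS path from this thread (commented out so that
# nothing is created by accident):
# link_library_as(
#     "/apps/eb/skylake/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas.so",
#     "~/lib",
#     ["libblas.so", "libblas.so.3"],
# )
```

The directory holding the links still has to be on LD_LIBRARY_PATH, as in the export above, before the JVM starts.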

The quick fix works, thanks @danking!

The durable fix doesn’t seem to work. Do I need to reinstall Hail once I’ve loaded the appropriate modules and appended to LD_LIBRARY_PATH?

Ah, my bad @nbaya, I made the same mistake in my durable fix that we originally encountered. The symlink itself needs to be at ~/lib/libblas.so, not ~/lib/libopenblas.so. I’ve fixed my post above.