BLAS/LAPACK Linking Issues

Hi!

I have been running into issues with BLAS/LAPACK linking when running Hail functions that use linear algebra operations, such as linear regression or PCA. The error message looks like this:

symbol lookup error: /tmp/jniloader421303948191888475netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv
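For reference, whether a given BLAS build actually exports the missing CBLAS symbol can be checked with nm (a quick sketch; the library paths below are examples and will differ per system):

nm -D /usr/lib64/libopenblas.so | grep cblas_dgemv   # OpenBLAS bundles the CBLAS interface, so this should print the symbol
nm -D /usr/lib64/libblas.so | grep cblas_dgemv       # the reference BLAS usually does not, which produces exactly this lookup error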

I have tried the fixes proposed in two earlier threads on this forum, but neither has resolved the issue. In particular, if I try the “quick fix” of setting LD_PRELOAD to the OpenBLAS path, I get an error from NumPy when importing Hail, before initialization even happens:

RuntimeError: The current Numpy installation ('/nfs/sw/hail/hail-0.2/python/lib/python3.9/site-packages/numpy/__init__.py') fails to pass simple sanity checks. This can be caused for example by incorrect BLAS library being linked in, or by mixing package managers (pip, conda, apt, ...). Search closed numpy issues for similar problems.
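As an aside, NumPy can report which BLAS/LAPACK it was built against, which may help spot the mismatch (a quick check, run without LD_PRELOAD set):

python -c "import numpy; numpy.show_config()"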

Is there another way to resolve this issue?

@sk4 ,

What environment are you in? Do you have root privileges on these machines?

Are you certain you have OpenBLAS installed? You need a copy of OpenBLAS named libblas.so, and you need to convince netlib to load that copy rather than something else. Did you already try both of these approaches?

mkdir -p ~/lib
ln -s /path/to/your/libopenblas.so ~/lib/libblas.so
export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH

and starting Spark with this?

--conf spark.executor.extraClassPath="/path/to/libblas.so:/path/to/liblapack.so"
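For concreteness, a full invocation might look roughly like this (a sketch: the script name and library paths are placeholders, and setting spark.driver.extraClassPath as well may be needed if the driver runs linear algebra too):

spark-submit \
  --conf spark.executor.extraClassPath="/path/to/libblas.so:/path/to/liblapack.so" \
  --conf spark.driver.extraClassPath="/path/to/libblas.so:/path/to/liblapack.so" \
  your_script.py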

The hl.init approach only works if the Spark cluster isn’t already running.
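For reference, that approach looks roughly like this (spark_conf is hl.init’s dict of Spark properties; the paths are placeholders):

import hail as hl

# Only applicable when Hail itself creates the Spark session,
# i.e. when no cluster is already running
hl.init(spark_conf={
    'spark.executor.extraClassPath': '/path/to/libblas.so:/path/to/liblapack.so',
    'spark.driver.extraClassPath': '/path/to/libblas.so:/path/to/liblapack.so',
})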

Hi @danking,

Well, it looks like the Spark --conf flag approach works now when using spark-submit. I tried it earlier along with the symlink approach, but perhaps I didn’t specify the OpenBLAS and LAPACK paths correctly. Thanks for your quick response!


Hi @danking,

It appears that the BLAS linking issue isn’t fully solved for me. While setting the --conf flag to point to the OpenBLAS/LAPACK paths when running spark-submit works for hl.hwe_normalized_pca, the same setting doesn’t work for hl.logistic_regression_rows. I’ve tried different settings based on the fixes you suggested, but none of them have worked so far.

Option 1: set the --conf path to directly point to OpenBLAS/LAPACK, i.e.

--conf spark.executor.extraClassPath=/usr/lib64/libopenblas.so:/usr/lib64/liblapack.so

Option 2: create symlinks to OpenBLAS/LAPACK and add to LD_LIBRARY_PATH, i.e.

ln -s /usr/lib64/libopenblas.so ~/lib/libblas.so
ln -s /usr/lib64/liblapack.so ~/lib/liblapack.so
export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH

Option 3: set the --conf path to point to the symlink for OpenBLAS/LAPACK, i.e.

--conf spark.executor.extraClassPath=~/lib/libblas.so:~/lib/liblapack.so

Option 4: apply Options 2 and 3 together

This is the example I’m trying to run, initially posted in this forum:

import hail as hl

hl.init(log='hail.log')

# Simulate genotypes: 1 population, 100 samples, 2 variants
mt = hl.balding_nichols_model(1, 100, 2)
# Random binary phenotype
mt = mt.annotate_cols(y=hl.rand_bool(0.5))
result_ht = hl.logistic_regression_rows(
    test='wald',
    y=mt.y,
    x=mt.GT.n_alt_alleles(),
    covariates=[1],
)

I don’t have root privileges on the server I’m using, and Hail is loaded through a modulefile (though I load the Spark/Java dependencies separately). In general, I start the Spark server before initializing Hail with spark-submit, so hl.init(spark_conf=...) wouldn’t work, as you mentioned. Is there any other fix that could work for this issue?

Hey @sk4 !

Can you try not using ~ in the --conf? It’s possible that the tilde isn’t expanded. Likewise for the LD_LIBRARY_PATH.
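For example (a sketch: $HOME, unlike ~, is expanded by the shell even inside double quotes; your_script.py is a placeholder):

export LD_LIBRARY_PATH="$HOME/lib:$LD_LIBRARY_PATH"
spark-submit --conf spark.executor.extraClassPath="$HOME/lib/libblas.so:$HOME/lib/liblapack.so" your_script.py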


What is the error you’re getting now? I need the Hail log file and the Python stack trace to diagnose.

Just to be clear: you’re using an on-premises Spark cluster with multiple machines, not just a single computer, right?

Option 1 won’t work: the linear algebra libraries on which Hail depends expect the shared objects to be named libblas.so and liblapack.so.

Hi @danking,

I tried removing the ~ and specifying absolute paths to the symlinks for Options 2–4, but it didn’t fix the issue. It’s the same symbol lookup error I initially posted:

symbol lookup error: /tmp/jniloader11471909008711532803netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv
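If it helps, I can also send the output of ldd on the extracted native stub, which should show which libblas.so it resolves to (the jniloader file name carries a random suffix, hence the wildcard):

ldd /tmp/jniloader*netlib-native_system-linux-x86_64.so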

I’m using a multi-node Slurm cluster and running Spark on top of it. From what I can see in the logs, Python errors out without printing a full stack trace. Is there an email address I can send the log files to?