I have been running into issues with BLAS/LAPACK linking when running Hail functions that use linear algebra operations, like linear regression or PCA. The error message looks like this:
symbol lookup error: /tmp/jniloader421303948191888475netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv
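In case it helps with diagnosis, this is how I checked whether the BLAS library the dynamic loader resolves actually exports the missing CBLAS symbol (the library path below is illustrative, not necessarily where yours lives):

```shell
# Locate the libblas the dynamic loader resolves, then check whether it
# exports cblas_dgemv (the path below is an example).
ldconfig -p | grep libblas
nm -D /usr/lib/x86_64-linux-gnu/libblas.so.3 | grep cblas_dgemv \
    || echo "no cblas_dgemv: likely the reference BLAS, not OpenBLAS"
```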
I have tried the fixes proposed in this forum:
and this one:
But neither has resolved the issue. In particular, if I try the “quick fix” of setting LD_PRELOAD to the OpenBLAS path, I get an error from NumPy when importing Hail, before initialization:
RuntimeError: The current Numpy installation ('/nfs/sw/hail/hail-0.2/python/lib/python3.9/site-packages/numpy/__init__.py') fails to pass simple sanity checks. This can be caused for example by incorrect BLAS library being linked in, or by mixing package managers (pip, conda, apt, ...). Search closed numpy issues for similar problems.
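For reference, the quick fix I attempted looks roughly like this (the OpenBLAS path is a placeholder for wherever it is installed on your system):

```shell
# Preload OpenBLAS before Python starts (path is a placeholder).
export LD_PRELOAD=/opt/OpenBLAS/lib/libopenblas.so
python -c "import hail as hl"   # NumPy's sanity check fails here
```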
What environment are you in? Do you have root privileges on these machines?
Are you certain you have OpenBLAS installed? You need a copy of OpenBLAS with the name libblas.so, and you need to convince netlib to load that rather than something else. Did you already try both of these approaches?
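If it helps, the symlink approach is roughly the following (all paths are placeholders; point them at your actual OpenBLAS build and a directory you control):

```shell
# Make the generic BLAS/LAPACK sonames resolve to OpenBLAS
# (paths are placeholders for your installation).
mkdir -p ~/blas
ln -sf /opt/OpenBLAS/lib/libopenblas.so ~/blas/libblas.so.3
ln -sf /opt/OpenBLAS/lib/libopenblas.so ~/blas/liblapack.so.3
export LD_LIBRARY_PATH="$HOME/blas:$LD_LIBRARY_PATH"
```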
Well, it looks like the Spark --conf flag approach works now when using spark-submit. I tried it earlier along with the symlink approach, but perhaps I didn’t specify the OpenBLAS and LAPACK paths correctly. Thanks for your quick response!
It appears that the BLAS linking issue isn’t fully solved for me. While setting the --conf flag to point to the OpenBLAS/LAPACK paths when running spark-submit works for hl.hwe_normalized_pca, the same setting doesn’t work for hl.logistic_regression_rows. I’ve tried different settings based on the fixes you suggested, but none of them have worked so far.
Option 1: set the --conf path to directly point to OpenBLAS/LAPACK, i.e.
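A sketch of what I mean, assuming netlib-java's NativeSystemBLAS.natives / NativeSystemLAPACK.natives system properties (the .so paths and script name below are placeholders):

```shell
# Option 1 sketch: tell netlib-java exactly which shared objects to load,
# on both the driver and the executors (paths/script are placeholders).
spark-submit \
  --conf spark.driver.extraJavaOptions="-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/opt/OpenBLAS/lib/libopenblas.so -Dcom.github.fommil.netlib.NativeSystemLAPACK.natives=/opt/OpenBLAS/lib/liblapack.so" \
  --conf spark.executor.extraJavaOptions="-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/opt/OpenBLAS/lib/libopenblas.so -Dcom.github.fommil.netlib.NativeSystemLAPACK.natives=/opt/OpenBLAS/lib/liblapack.so" \
  my_hail_script.py
```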
I don’t have root privileges on the server I’m using, and Hail is loaded through a modulefile (though I load the Spark/Java dependencies separately). In general, I start the Spark server first and then launch Hail with spark-submit, so hl.init(spark_conf=...) wouldn’t work, as you mentioned. Is there any other fix that could work for this issue?
I tried removing the ~ and specifying the absolute path to the symlink for Options 2-4, but it didn’t fix the issue. I get the same symbol lookup error I initially posted:
symbol lookup error: /tmp/jniloader11471909008711532803netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemv
I’m using a multi-node Slurm cluster and running Spark on top of it. From what I see in the logs, Python errors out without producing a full stack trace. Is there an email address I can send the log files to?