Fixing logreg, lmmreg error when using many sample covariates on Dataproc

jbloom · March 10, 2017, 2:16am

We were surprised to find that using more than 9 sample covariates in logistic or linear mixed regression (logreg, lmmreg) on Google Dataproc throws an error. We've asked Google support about a better fix. In the meantime, they suggested a workaround, which we have verified: increase the Java stack size by setting the properties spark.driver.extraJavaOptions=-Xss4M and spark.executor.extraJavaOptions=-Xss4M in the cluster creation command, e.g.:

--properties="spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M"

Alternatively, these properties can be passed in the Spark submit command. See this post for more information on using Hail on Google Cloud.
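Both forms of the setting can be generated from one stack-size value, which also makes it easy to try a larger value (e.g. -Xss8M) if 4M turns out not to be enough. Below is a minimal Python sketch. The helper names `dataproc_properties` and `spark_submit_args` are hypothetical and not part of Hail, gcloud, or Spark. The sketch relies only on Dataproc's `spark:` property prefix (which writes the property into spark-defaults.conf) and on spark-submit's standard `--conf key=value` option.

```python
"""Build the JVM stack-size settings in two forms (hypothetical helpers)."""

# The two JVMs that need the larger stack: the driver and every executor.
ROLES = ("driver", "executor")


def dataproc_properties(stack_size: str = "4M") -> str:
    """Value for `gcloud dataproc clusters create --properties=...`.

    The `spark:` prefix tells Dataproc to put each property in spark-defaults.conf.
    """
    return ",".join(
        f"spark:spark.{role}.extraJavaOptions=-Xss{stack_size}" for role in ROLES
    )


def spark_submit_args(stack_size: str = "4M") -> list[str]:
    """Equivalent `--conf` arguments for a spark-submit command line."""
    args: list[str] = []
    for role in ROLES:
        args += ["--conf", f"spark.{role}.extraJavaOptions=-Xss{stack_size}"]
    return args


if __name__ == "__main__":
    # Prints the same --properties flag used in the cluster creation example.
    print(f'--properties="{dataproc_properties()}"')
    # Prints: --conf spark.driver.extraJavaOptions=-Xss4M --conf spark.executor.extraJavaOptions=-Xss4M
    print(" ".join(spark_submit_args()))
```

Each extraJavaOptions property holds a single string, so if your cluster already sets other JVM options there, append -Xss4M to the existing value rather than replacing it.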

For those interested, the underlying issue relates to the LAPACK linear-solve routine that Breeze calls through its native bindings, as mentioned in this Stack Overflow post.
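To illustrate where a linear solve enters a regression fit, here is a minimal NumPy sketch of Newton-Raphson logistic regression, a standard fitting method. It is not Hail's implementation and does not reproduce the JVM stack issue. It only shows that each iteration solves a k×k system, where k grows with the number of covariates. The function name `logistic_newton` is hypothetical. `numpy.linalg.solve` delegates to the LAPACK `_gesv` routine.

```python
import numpy as np


def logistic_newton(X: np.ndarray, y: np.ndarray, max_iter: int = 25, tol: float = 1e-8) -> np.ndarray:
    """Fit logistic regression by Newton-Raphson (illustrative sketch, not Hail's code).

    X is n x k (intercept column plus covariates), y holds 0/1 outcomes.
    Each iteration solves a k x k linear system, so k grows with the covariate count.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        score = X.T @ (y - mu)                         # gradient of the log-likelihood
        weights = mu * (1.0 - mu)                      # per-sample Bernoulli variances
        fisher = X.T @ (X * weights[:, None])          # k x k Fisher information matrix
        delta = np.linalg.solve(fisher, score)         # LAPACK _gesv linear solve
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:                # stop once the update is negligible
            break
    return beta
```

With 10 covariates plus an intercept, each Newton step solves an 11×11 system.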

shuang · July 31, 2020, 6:27am

Hi, thanks for the setup guide.

I am still using Hail 0.1 (version -0d9d9fa).
With this setting I successfully processed a 1 TB dataset (importing a VCF into a VDS).

Now I need to process a 3 TB dataset. Do I need to increase the recommended -Xss4M?

johnc1231 · August 5, 2020, 1:55pm

We no longer support Hail 0.1. You'll have to try the current setting and increase it if you run into problems. We strongly recommend upgrading to Hail 0.2, which has better performance.
