Fixing logreg, lmmreg error when using many sample covariates on Dataproc



We were surprised to find that using more than 9 sample covariates in logistic or linear mixed regression on Google Dataproc throws an error. We’ve engaged Google support on a permanent fix. In the meantime, they suggested, and we verified, a workaround: add the properties spark.driver.extraJavaOptions=-Xss4M and spark.executor.extraJavaOptions=-Xss4M to the cluster creation command to increase the Java thread stack size, e.g.:

--properties="spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M"
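If you script cluster creation, the `--properties` value above can be generated rather than typed by hand. The Python sketch below is one way to do that. The "spark:" prefix is the Dataproc convention for writing a key into spark-defaults.conf. The helper name `dataproc_properties` and the cluster name `my-hail-cluster` are hypothetical, and the sketch only prints the command; it does not run gcloud.

```python
def dataproc_properties(stack_size: str = "4M") -> str:
    """Build the --properties value for `gcloud dataproc clusters create`.

    Hypothetical helper: raises the Java thread stack size (-Xss) on both
    the Spark driver and the executors.
    """
    opts = {
        "spark.driver.extraJavaOptions": f"-Xss{stack_size}",
        "spark.executor.extraJavaOptions": f"-Xss{stack_size}",
    }
    # The "spark:" prefix routes each key into spark-defaults.conf on the cluster.
    return ",".join(f"spark:{key}={value}" for key, value in opts.items())


if __name__ == "__main__":
    # Hypothetical cluster name; substitute your own name and other flags.
    cmd = [
        "gcloud", "dataproc", "clusters", "create", "my-hail-cluster",
        f"--properties={dataproc_properties()}",
    ]
    print(" ".join(cmd))
```

Note that setting extraJavaOptions replaces any value already configured. If you pass other JVM options this way, put them in the same space-separated value alongside -Xss4M.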

Alternatively, these properties can be passed in the Spark submit command. See this post for more information on using Hail on Google Cloud.
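When submitting with spark-submit, each property can be supplied through its standard `--conf key=value` option. The Python sketch below assembles such an argument list. The script name `my_hail_script.py` and the helper `spark_submit_args` are hypothetical. Any jars or Python files your Hail setup needs are omitted and would have to be added.

```python
import shlex
from typing import Dict, List, Optional

# JVM options from the workaround: a 4 MB thread stack for driver and executors.
STACK_OPTS: Dict[str, str] = {
    "spark.driver.extraJavaOptions": "-Xss4M",
    "spark.executor.extraJavaOptions": "-Xss4M",
}


def spark_submit_args(script: str, conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a spark-submit argument list with one --conf flag per property.

    Hypothetical helper; defaults to the stack-size options above.
    """
    conf = STACK_OPTS if conf is None else conf
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(script)  # the application must come after the options
    return args


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes arguments safely for a POSIX shell.
    print(shlex.join(spark_submit_args("my_hail_script.py")))
```

Passing the driver option on the command line, rather than setting it from inside the application, matters in client mode: the driver JVM has already started by the time application code runs, so its stack size can no longer change.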

For those interested, the underlying issue relates to the LAPACK linear solve routine called through Breeze natives, as mentioned in this StackOverflow post.

