Hail hl.linear_mixed_model out of memory

Hi Hail team

I am running a global linear mixed model on 480k individuals and 42k variants (a marker list). The matrix table is repartitioned into 100 partitions.

The code I am using is:
import hail as hl

# filtered_mt is the MatrixTable described above (480k samples, 42k variants)
filtered_mt = filtered_mt.repartition(100)
model, _ = hl.linear_mixed_model(
    y=filtered_mt.pheno.BMI,
    x=[1],
    z_t=filtered_mt.GT.n_alt_alleles(),
    p_path='output/p.bm')

I have also tried other partition counts, e.g. 10 and 500, and various Spark settings, but I always get the same out-of-memory error.
For example:
Setting 1:
--conf spark.executor.cores=5
--conf spark.executor.memoryOverhead=6g
--conf spark.executor.memory=19g
--conf spark.executor.instances=17

Setting 2:
--conf spark.executor.cores=4
--conf spark.executor.memoryOverhead=20g
--conf spark.executor.memory=40g
--conf spark.executor.instances=6

I have pasted the error message below.

Many thanks.

Best wishes,
Qin

Traceback (most recent call last):
  File "", line 5, in
  File "", line 2, in linear_mixed_model
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/methods/statgen.py", line 1070, in linear_mixed_model
    model, p = LinearMixedModel.from_random_effects(y_nd, x_nd, z_bm, p_path, overwrite)
  File "", line 2, in from_random_effects
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/stats/linear_mixed_model.py", line 1081, in from_random_effects
    u, s0, _ = z.svd(complexity_bound=complexity_bound)
  File "", line 2, in svd
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/linalg/blockmatrix.py", line 2403, in svd
    return self._svd_gramian(compute_uv)
  File "", line 2, in _svd_gramian
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/linalg/blockmatrix.py", line 2415, in _svd_gramian
    .sparsify_triangle(lower=True, blocks_only=True)
  File "", line 2, in to_numpy
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/linalg/blockmatrix.py", line 1205, in to_numpy
    self.tofile(uri)
  File "", line 2, in tofile
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/linalg/blockmatrix.py", line 1177, in tofile
    Env.backend().execute(BlockMatrixWrite(self._bmir, writer))
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 98, in execute
    raise e
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 74, in execute
    result = json.loads(self._jhc.backend().executeJSON(jir))
  File "/opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4757.13086441/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/local/apps/anaconda-python36/lib/python3.6/site-packages/hail/backend/py4j_backend.py", line 32, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
hail.utils.java.FatalError: OutOfMemoryError: Java heap space

Java stack trace:
java.lang.OutOfMemoryError: Java heap space
at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:141)
at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:139)
at breeze.linalg.DenseMatrix$mcD$sp.toArray$mcD$sp(DenseMatrix.scala:129)
at is.hail.utils.richUtils.RichDenseMatrixDouble$.toCompactData$extension(RichDenseMatrixDouble.scala:90)
at is.hail.utils.richUtils.RichDenseMatrixDouble$.exportToDoubles(RichDenseMatrixDouble.scala:53)
1(InterpretNonCompilable.scala:16)
at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:14)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:12)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:360)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:344)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:341)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:25)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:23)
at is.hail.utils.package$.using(package.scala:618)

Hail version: 0.2.62-84fa81b9ea3d
Error summary: OutOfMemoryError: Java heap space
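
For scale, here is a rough estimate of the dense matrix involved. The Python frames show `_svd_gramian` calling `to_numpy`, and the Java frames run through `SparkBackend` and `exportToDoubles`, which suggests the failing allocation happens in the driver JVM and not in an executor task. If that reading is right, it may explain why changing the `spark.executor.*` settings made no difference. The sketch below assumes the Gramian is formed over the smaller of the two dimensions and stored as float64; the names `dense_square_gib` and `gram_dim` are illustrative only and are not part of Hail.

```python
# Back-of-the-envelope size of a dense Gramian for the dimensions in the post.
N_SAMPLES = 480_000   # individuals, as stated in the post
N_VARIANTS = 42_000   # markers, as stated in the post
BYTES_PER_DOUBLE = 8  # assumption: entries are float64


def dense_square_gib(dim: int) -> float:
    """GiB needed to hold one dense dim x dim float64 matrix."""
    return dim * dim * BYTES_PER_DOUBLE / 2**30


# Assumption: the Gramian is taken over the smaller dimension (variants here).
gram_dim = min(N_SAMPLES, N_VARIANTS)
gram_gib = dense_square_gib(gram_dim)

print(f"{gram_dim} x {gram_dim} Gramian: about {gram_gib:.1f} GiB per copy")
```

Under these assumptions a single copy is already in the low tens of GiB, before any temporary copies made while converting it to an array, so the process holding it would need a heap well above that.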

Hi! I’m sorry you’re running into these issues. Unfortunately, LMM functionality in Hail is very poorly maintained, and you’ll almost certainly have a better experience using Hail for QC and then exporting to BGEN/VCF so that tools like BOLT-LMM or SAIGE can run the statistical analysis.
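
A minimal sketch of that export step, assuming the matrix table has a `GT` entry field and a `pheno` column struct as in the code above. The helper name `export_for_external_lmm` and the output paths are hypothetical, and the exact input format each downstream tool expects should be checked against that tool's documentation.

```python
def export_for_external_lmm(mt, out_prefix):
    """Write genotypes and phenotypes from a QC'd MatrixTable to disk.

    `out_prefix` is a hypothetical output location. `mt` is assumed to have
    a `GT` entry field and a `pheno` column struct, as in the original post.
    """
    import hail as hl  # imported lazily so the helper can be defined without Hail

    # Genotypes: a block-gzipped VCF.
    hl.export_vcf(mt, f"{out_prefix}.vcf.bgz")

    # Phenotypes: one row per sample; flatten() turns nested struct fields
    # such as pheno.BMI into top-level columns before writing the TSV.
    mt.cols().flatten().export(f"{out_prefix}.pheno.tsv")
```

Usage would look like `export_for_external_lmm(filtered_mt, 'output/bmi')`. `hl.export_bgen` is the BGEN counterpart of `hl.export_vcf`; it works from genotype probabilities and not from hard calls, so it is only an option if the dataset carries them.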