How to control RAM usage

mpinese · November 17, 2016, 11:24pm

Loving Hail so far, thanks!

Is there a way to control RAM usage? I’ve been testing Hail on a 56-core 512 GB RAM machine, and have noticed that Hail consistently uses only 95-105 GB of RAM, regardless of task or data. Is there a way to instruct Hail to use more RAM, and would there be any benefit to this?

jbloom · November 17, 2016, 11:52pm

You have about 9 GB per core (512 GB / 56 cores), far more than is typical. For example, standard Google Dataproc cores have 3.75 GB, and even the high-memory ones have only 6.5 GB. We’ve consciously written Hail to operate within these constraints, so the short answer is no: you can’t take advantage of more RAM, and the usage you’re seeing is about what we’d expect from 56 cores.

The longer answer is that there are a few situations where one implementation strategy is faster but more memory-intensive than another. For example, when computing a kinship matrix X * X^T from a matrix X of genotypes distributed by variant, one could use Spark’s RowMatrix.computeGramianMatrix or convert to a BlockMatrix and use BlockMatrix multiplication. The former is faster but requires every core to store two copies of an n x n matrix of doubles (one copy accumulates into the other), where n is the number of samples. That is about 2 * 8 * n^2 bytes per core, which grows quickly with n. For computing kinship in linear mixed models (currently a pull request), I’ve chosen a cut-off of n = 3000 for switching from the former method to the latter, but I also include “advanced” options to force a particular implementation. On your machine you may find that it’s possible, and worthwhile, to use computeGramianMatrix at larger sample sizes.
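To make the 2 * 8 * n^2 estimate concrete, here is a minimal Python sketch of the arithmetic. The helper names (gramian_bytes_per_core, choose_strategy, max_samples_for_budget) and the force parameter are hypothetical, not Hail or Spark APIs. The 3000 cut-off comes from the post; treating n = 3000 itself as the Gramian side, and treating 512 GB as 512 GiB, are assumptions. The estimate ignores JVM and Spark overhead, so the real usable n is lower.

```python
from typing import Optional

# Hypothetical helpers, NOT Hail or Spark APIs: they only reproduce the
# memory arithmetic for the two kinship strategies discussed above.

BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3


def gramian_bytes_per_core(n_samples: int) -> int:
    """Approximate per-core memory of the computeGramianMatrix strategy:
    two dense n x n matrices of doubles (one accumulates into the other)."""
    return 2 * BYTES_PER_DOUBLE * n_samples ** 2


def choose_strategy(n_samples: int, cutoff: int = 3000,
                    force: Optional[str] = None) -> str:
    """Return 'gramian' at or below the cut-off (assumed inclusive) and
    'block' above it, unless a strategy is forced ("advanced" override)."""
    if force is not None:
        if force not in ("gramian", "block"):
            raise ValueError(f"unknown strategy: {force!r}")
        return force
    return "gramian" if n_samples <= cutoff else "block"


def max_samples_for_budget(bytes_per_core: float) -> int:
    """Largest n whose two n x n double matrices fit in bytes_per_core.
    Ignores JVM and Spark overhead, so the practical limit is lower."""
    n = int((bytes_per_core / (2 * BYTES_PER_DOUBLE)) ** 0.5)
    # Correct for floating-point rounding at the boundary.
    while gramian_bytes_per_core(n + 1) <= bytes_per_core:
        n += 1
    while n > 0 and gramian_bytes_per_core(n) > bytes_per_core:
        n -= 1
    return n


if __name__ == "__main__":
    per_core = 512 * GIB / 56  # the 56-core, 512 GB machine from the question
    for n in (3000, 10_000, 20_000):
        gib = gramian_bytes_per_core(n) / GIB
        print(f"n={n}: {choose_strategy(n)}, ~{gib:.2f} GiB per core for the Gramian")
    print("largest n that fits:", max_samples_for_budget(per_core))
```

Under these assumptions the Gramian for 3000 samples needs roughly 0.13 GiB per core, while the ~9 GiB per-core budget of the 512 GB machine would in principle hold n up to about 24,770 before any overhead, which is why forcing computeGramianMatrix above the default cut-off can make sense there.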

Related topics

Topic	Category	Replies	Views	Activity
Running hail locally - number of cores	Hail Query & hailctl	3	828	March 28, 2023
What makes hail go fast locally?	Hail Query & hailctl	1	412	November 2, 2020
Hardware requirements	Hail Query & hailctl	5	426	October 25, 2020
Questions about optimizing Hail and Spark configs and estimating resources and runtimes	Hail Query & hailctl	3	1182	December 1, 2022
Java Heap Space out of memory	Hail Query & hailctl	5	3670	August 10, 2020