How do I increase the memory or RAM available to the JVM when I start Hail through Python?

I’m using Hail via IPython on my laptop, like this:

import hail as hl

mt = hl.read_matrix_table("/path/to/my.mt")
ibd = hl.identity_by_descent(mt)

When I try to run this, I get a long error message that starts with one of the following:

FatalError: OutOfMemoryError: Java heap space

or

OutOfMemoryError: GC overhead limit exceeded

or

RemoteDisconnected('Remote end closed connection without response')

How do I increase the memory available to the Java process?

You can set the memory using an environment variable:

PYSPARK_SUBMIT_ARGS="--driver-memory 8g --executor-memory 8g pyspark-shell" ipython

This will start an IPython session with 8 GB of driver and executor memory. If you want IPython to always start this way, you can add this line to your .bashrc (or the equivalent file for your shell):

export PYSPARK_SUBMIT_ARGS="--driver-memory 8g --executor-memory 8g pyspark-shell"
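
To confirm the setting took effect, you can read the configuration back from the running Spark context. A minimal sketch, assuming Hail 0.2's hl.spark_context() accessor (the '8g' value matches the example above):

import hail as hl

hl.init()
# Report the driver memory the Spark JVM actually started with
print(hl.spark_context().getConf().get('spark.driver.memory', 'not set'))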

I had the same problem and was able to solve it with the code below.

import hail as hl
# Ask Spark for a 100 GB driver JVM at startup
hl.init(spark_conf={'spark.driver.memory': '100g'})
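
Note that spark_conf only takes effect if hl.init() runs before any other Hail call, because the JVM's heap size is fixed when the process starts. A minimal sketch that also raises executor memory, mirroring the environment-variable answer above (the 8g sizes are examples; match them to your machine):

import hail as hl

# Spark memory settings are fixed at JVM startup, so configure them
# in hl.init() before reading any data.
hl.init(spark_conf={
    'spark.driver.memory': '8g',
    'spark.executor.memory': '8g',
})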