Our users run Hail on our HPC MPI queues.
We have several MPI queues: one with 64 cores per node (512 GB of RAM per node), and another with 28 cores per node (256 GB of RAM per node). Sometimes, when our users submit their jobs, they get “OutOfMemoryError: Java heap space” errors. I am trying to put together some guidance on how to set the following Spark parameters to avoid this memory problem:
Unfortunately there’s no one-size-fits-all setting for these parameters; tuning usually has to happen on a case-by-case basis. We’ve configured a set of defaults for Google Dataproc (link below) which you could use as a general pointer, by comparing the specifications of your HPC nodes against Google’s instance types.
Generally speaking, if your users are doing lots of joins or explodes, more worker memory helps. If their pipelines are very long, or there are lots of partitions or columns in their data, more executor memory may be beneficial.
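As a starting point for your two queues, here is a rough sketch of how you might size executor memory from a node's specs. The 10% overhead hold-back and the one-executor-per-node layout are illustrative assumptions on my part, not Hail or Spark defaults:

```python
# Rough sizing helper for Spark executor memory on fixed-size HPC nodes.
# The overhead fraction (memory reserved for the OS and off-heap use) is
# an illustrative assumption; tune it for your site.

def executor_memory_gb(node_ram_gb, executors_per_node, overhead_fraction=0.1):
    """Split a node's RAM across its executors, holding back some overhead."""
    usable = node_ram_gb * (1 - overhead_fraction)
    return int(usable // executors_per_node)

# 64-core / 512 GB nodes, one executor per node:
conf_512 = {
    "spark.executor.cores": "64",
    "spark.executor.memory": f"{executor_memory_gb(512, 1)}g",
}

# 28-core / 256 GB nodes, one executor per node:
conf_256 = {
    "spark.executor.cores": "28",
    "spark.executor.memory": f"{executor_memory_gb(256, 1)}g",
}
```

A dict like this can be passed to Hail at startup via `hl.init(spark_conf=...)` in recent versions, or set in `spark-defaults.conf` / on the `spark-submit` command line.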
After doing a little digging, min_block_size is an alias for the Spark configuration parameter spark.hadoop.mapreduce.input.fileinputformat.split.minsize, which sets the minimum split size Hadoop input formats use for reads. By default, Hail uses 1 MB blocks, so unless you’ve configured this to something very large, or users are processing thousands of joins in the same pipeline, I wouldn’t have thought it would affect memory consumption too much.
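To make the effect concrete, here is a back-of-the-envelope sketch of how the minimum split size caps the number of input partitions a file can produce. The 1 MB figure is the default mentioned above; the 10 GB file size and the 128 MB alternative are made-up illustrations, and real split planning also considers HDFS block size and the maximum split size:

```python
import math

MB = 1024 * 1024

def max_partitions(file_size_bytes, min_split_bytes):
    """Upper bound on input partitions when each split must be at least
    min_split_bytes. (A simplification: only the lower-bound effect of
    the minimum split size is modelled here.)"""
    return max(1, math.ceil(file_size_bytes / min_split_bytes))

ten_gb = 10 * 1024 * MB
print(max_partitions(ten_gb, 1 * MB))    # 1 MB minimum -> up to 10240 partitions
print(max_partitions(ten_gb, 128 * MB))  # 128 MB minimum -> at most 80 partitions
```

So a larger min_block_size mostly means fewer, larger partitions; each task then holds more data at once, which is why it only starts to matter for memory at extreme settings.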