Densify running out of memory

We collected everything using gcloud dataproc clusters diagnose, and it looked like results were being written to HDFS.

Oh, totally, it’s the aggregations per partition: https://github.com/hail-is/hail/blob/10e525a56787dd00b6aaed62d856c1eabd3c0f3a/hail/src/main/scala/is/hail/expr/ir/TableIR.scala#L1666-L1674

What was the cluster config? How many non-preemptible and preemptible workers?

It’s 300 workers, all with lots of disk, so we’re not running out of disk. I suggested fewer workers with more cores, and that failed too.

Posting an update here so we don’t forget: the resource we are running out of in the latest bug is definitely memory :grimacing:. I’ve traced the regression to https://github.com/hail-is/hail/pull/8794 and we are continuing to investigate.


This PR fixes the problem:
