Out of memory error for the de_novo results

Hi,

I ran Hail's de_novo() on our local server, which has 4 CPUs with 14 cores each and 768G of memory. My data is WGS data for 103 trios, with ~14 million variants remaining after filtering. After calling de_novo(), running de_novo_results.count() throws java.lang.OutOfMemoryError, and even .show(5) throws java.lang.OutOfMemoryError. I tried running it on only one chromosome, and it still throws java.lang.OutOfMemoryError.

de_novo_results = hl.de_novo(mt, pedigree, pop_frequency_prior=priors[mt.row_key].AF)
de_novo_results.show(5)
de_novo_results.count()
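
For reference, this is roughly how the surrounding setup looks; the file paths and the priors table below are placeholders I made up, since only the de_novo() call itself is shown above:

import hail as hl

hl.init()

# Filtered WGS MatrixTable (~14 million variants, 103 trios); path is a placeholder
mt = hl.read_matrix_table('trios_filtered.mt')

# Trio structure from a PLINK-format .fam file; path is a placeholder
pedigree = hl.Pedigree.read('trios.fam')

# Population allele-frequency priors keyed by locus/alleles (e.g. from gnomAD); path is a placeholder
priors = hl.read_table('af_priors.ht')

de_novo_results = hl.de_novo(mt, pedigree, pop_frequency_prior=priors[mt.row_key].AF)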

But when I checked top, memory usage was low and there was still plenty of free memory, as shown below:

KiB Mem : 79250451+total, 38696064+free, 12923628 used, 39262022+buff/cache

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
194642 suhuach+ 20 0 37.4g 1.8g 31412 S 5841 0.2 2726:58 java

Does anyone know how I can fix this error? I have been stuck on it for two days. Thanks!

Hello,

Before running Hail, try exporting this:

export PYSPARK_SUBMIT_ARGS='--driver-memory 250G --executor-memory 250G pyspark-shell'

Edit the memory values according to your system, but leave some RAM free for the OS.
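
If you launch Hail from a Python script rather than from a shell, you can set the same variable from Python before hl.init() starts the Spark JVM. This is just a sketch mirroring the export above; adjust the 250G values to your machine:

import os

# Must be set before Hail starts the Spark JVM (i.e. before hl.init())
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 250G --executor-memory 250G pyspark-shell'

import hail as hl
hl.init()

On recent Hail 0.2 releases you can, if I recall correctly, alternatively pass the Spark settings directly, e.g. hl.init(spark_conf={'spark.driver.memory': '250g'}).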

It works. Thanks!