I submitted a Hail batch job and it failed with a "No space left on device" error (traceback below); I had expected an out-of-memory error, if anything. I believe I am running on my team's Google Cloud account (nnf-karczewski), so I'm not sure how it could run out of disk space.
[Stage 2:======> (4895 + 4) / 42287]
Traceback (most recent call last):
  File "<string>", line 32, in <module>
  File "/Users/tk508/HAIL/Genetics-Gym/my_gg/jobs_submit/fix_esm1b.py", line 34, in update_rs_vsm
  File "<decorator-gen-1216>", line 2, in checkpoint
  File "/usr/local/lib/python3.9/dist-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.9/dist-packages/hail/table.py", line 1963, in checkpoint
    self.write(output=output, overwrite=overwrite, stage_locally=stage_locally, _codec_spec=_codec_spec)
  File "<decorator-gen-1218>", line 2, in write
  File "/usr/local/lib/python3.9/dist-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.9/dist-packages/hail/table.py", line 2005, in write
    Env.backend().execute(
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/spark_backend.py", line 217, in execute
    raise err
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/spark_backend.py", line 209, in execute
    return super().execute(ir, timed)
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/backend.py", line 181, in execute
    raise e.maybe_user_error(ir) from None
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/backend.py", line 179, in execute
    result, timings = self._rpc(ActionTag.EXECUTE, payload)
  File "/usr/local/lib/python3.9/dist-packages/hail/backend/py4j_backend.py", line 221, in _rpc
    raise fatal_error_from_java_error_triplet(
hail.utils.java.FatalError: FileNotFoundException: /tmp/blockmgr-2a3e6288-bd4c-48c6-a382-b83e408658b8/0b/temp_shuffle_d103eba0-3869-43b1-b03f-e94042b7aed6 (No space left on device)
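For what it's worth, the path in the error (`/tmp/blockmgr-.../temp_shuffle_...`) looks like Spark shuffle spill on a worker's local disk, so I suspect it is the workers' boot disks filling up during the checkpoint's shuffle rather than cloud storage or memory. If the cluster was started with `hailctl dataproc`, this is the kind of change I am considering (a sketch; the cluster name is a placeholder, and I am going from the `hailctl dataproc start` flags as I understand them):

```shell
# Sketch, assuming a hailctl-managed Dataproc cluster.
# --worker-boot-disk-size (GB) enlarges the local disk that Spark
# spills shuffle files to; "my-cluster" is a placeholder name.
hailctl dataproc start my-cluster \
    --worker-boot-disk-size 200 \
    --num-workers 4
```

Would increasing the worker disk like this be the right fix here, or should I be repartitioning the table before the checkpoint instead?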