Timeout while writing to CSV

I’m trying to write a CSV:

chip_vars_mt.cols().export('cols.csv', delimiter=',')

But I keep getting the following error after a while:

2020-03-30 17:57:17 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.
    To preserve matrix table column order, first unkey columns with 'key_cols_by()'
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/home/nicholas/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
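
For context, here is a minimal sketch of the export with the unkeying step the warning suggests; the read path and everything other than the export line itself are assumptions, since the rest of the pipeline isn’t shown:

```python
import hail as hl

hl.init(min_block_size=128)

# Hypothetical source path -- the actual pipeline producing chip_vars_mt isn't shown.
chip_vars_mt = hl.read_matrix_table('chip_vars.mt')

# cols() re-sorts the resulting table by the column key (hence the warning);
# unkeying first with key_cols_by() preserves the matrix table's column order.
cols_ht = chip_vars_mt.key_cols_by().cols()
cols_ht.export('cols.csv', delimiter=',')
```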

Can you share the log file? The true error should be in there.

Here you go:
hail-20200330-1917-0.2.34-914bd8a10ca2.log (2.4 MB)

I don’t see anything wrong immediately. The process takes about 35 min before it craps out.

I tried increasing the timeout to 10000s, but that didn’t seem to help.

The log appears to just stop about one minute after starting. @tpoterba, maybe you have a better sense?

I suspect he’ll want to see the full pipeline to understand what’s happening.

This could be an out-of-memory error. How are you running Hail? Are you running on a cluster or in local mode (laptop, server, etc.)?

Running in local mode. I’m using the following config:

[('spark.jars',
  'file:///home/nicholas/miniconda3/envs/hail/lib/python3.7/site-packages/hail/hail-all-spark.jar'),
 ('spark.hadoop.io.compression.codecs',
  'org.apache.hadoop.io.compress.DefaultCodec,is.hail.io.compress.BGzipCodec,is.hail.io.compress.BGzipCodecTbi,org.apache.hadoop.io.compress.GzipCodec'),
 ('spark.ui.showConsoleProgress', 'false'),
 ('spark.executor.id', 'driver'),
 ('spark.logConf', 'true'),
 ('spark.kryo.registrator', 'is.hail.kryo.HailKryoRegistrator'),
 ('spark.driver.host', 'sci-pvm-nicholas.calicolabs.local'),
 ('spark.hadoop.mapreduce.input.fileinputformat.split.minsize', '134217728'),
 ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
 ('spark.driver.extraClassPath',
  '/home/nicholas/miniconda3/envs/hail/lib/python3.7/site-packages/hail/hail-all-spark.jar'),
 ('spark.kryoserializer.buffer.max', '1g'),
 ('spark.driver.port', '35007'),
 ('spark.driver.maxResultSize', '0'),
 ('spark.executor.extraClassPath', './hail-all-spark.jar'),
 ('spark.master', 'local[*]'),
 ('spark.repl.local.jars',
  'file:///home/nicholas/miniconda3/envs/hail/lib/python3.7/site-packages/hail/hail-all-spark.jar'),
 ('spark.submit.deployMode', 'client'),
 ('spark.app.name', 'Hail'),
 ('spark.driver.memory', '32G'),
 ('spark.app.id', 'local-1585615971418'),
 ('spark.executor.heartbeatInterval', '10s'),
 ('spark.network.timeout', '10000s')]

I increased the network timeout and driver memory, and I initialize Hail with hl.init(min_block_size=128). I was running out of memory previously, so I made these updates per another thread on the forum. I suppose that could be happening again.
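
For reference, a minimal sketch of one way to apply those settings in local mode; the PYSPARK_SUBMIT_ARGS route is an assumption here, since driver memory and network timeout have to be in place before the local JVM starts:

```python
import os
import hail as hl

# Assumption: set Spark submit args before hl.init() so the local driver JVM
# starts with the desired memory and timeout (values mirror the config above).
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--driver-memory 32G '
    '--conf spark.network.timeout=10000s '
    'pyspark-shell'
)

hl.init(min_block_size=128)
```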

In the just-released 0.2.35 we fixed a memory leak present in 0.2.34. Could you try again on this version?
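
A quick sanity check after upgrading, assuming the new package is installed in the same environment:

```python
import hail as hl

# Confirm the upgraded package is the one actually being imported.
print(hl.version())  # expect 0.2.35 (or newer) after upgrading
```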

@tpoterba I fixed this yesterday by bumping my memory. I’ll see if things will run with lower memory and the new version.

We found a memory leak in 0.2.35 as well as 0.2.34… This should really be fixed in 0.2.36, which we’ll try to release today.