Import plink - Error summary: OutOfMemoryError: Java heap space

Hi,
I am trying to import binary PLINK files and keep getting the error message below.

Hail version: 0.2.34-914bd8a10ca2
Error summary: OutOfMemoryError: Java heap space

I changed the memory setting using the command below
PYSPARK_SUBMIT_ARGS="--driver-memory 60g --executor-memory 60g pyspark-shell"

But that did not make any difference.

Any advice on how I can sort this out, please?
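
One thing worth checking is whether the setting reaches the JVM at all. A plain shell assignment without `export` is not inherited by the Python process, and PySpark only reads `PYSPARK_SUBMIT_ARGS` when it launches the JVM. A minimal sketch, assuming the variable is set from Python before Hail is initialised (the helper name `set_spark_memory` is hypothetical, and 60g is simply the value from the post above):

```python
import os

def set_spark_memory(driver="60g", executor="60g"):
    """Set PYSPARK_SUBMIT_ARGS for a local PySpark session (hypothetical helper).

    Must run before the Spark JVM starts, i.e. before hl.init() or any Hail call.
    """
    args = f"--driver-memory {driver} --executor-memory {executor} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args

set_spark_memory()

# Only now import and initialise Hail, so the new JVM picks up the setting:
# import hail as hl
# hl.init()
```

If a Spark session is already running in the same Python process, the variable is ignored, so this needs a fresh interpreter or notebook kernel.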

How big are the PLINK files, and what’s the full pipeline? In particular, how many variants?

Hi Tim,
Thanks for your quick reply.
I filtered my file and at the moment there are 3.7M variants with 5200 samples.
The .bed file size is 4.6 GB.

2020-03-16 11:23:25 root: INFO: Timer: Time taken for InterpretNonCompilable – Verify : 0.015ms, total 104.492ms

2020-03-16 11:23:25 root: INFO: interpreting non compilable node: TableWrite

2020-03-16 11:23:25 UnifiedMemoryManager: INFO: Will not store broadcast_1 as the required space (1401711920 bytes) exceeds our memory limit (384093388 bytes)

2020-03-16 11:23:25 MemoryStore: WARN: Not enough space to cache broadcast_1 in memory! (computed 891.9 MB so far)

2020-03-16 11:23:25 MemoryStore: INFO: Memory use = 1752.1 KB (blocks) + 1024.0 KB (scratch space shared across 1 tasks(s)) = 2.7 MB. Storage limit = 366.3 MB.

2020-03-16 11:23:25 BlockManager: WARN: Persisting block broadcast_1 to disk instead.

2020-03-16 11:23:42 BlockManager: WARN: Block broadcast_1 could not be removed as it was not found on disk or in memory

2020-03-16 11:23:46 root: ERROR: OutOfMemoryError: Java heap space

From java.lang.RuntimeException: error while applying lowering 'InterpretNonCompilable'
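
The log itself hints that the 60g setting never took effect. Spark's unified memory manager sizes its pool as (heap minus 300 MB reserved) times `spark.memory.fraction`, which defaults to 0.6. A rough back-calculation from the reported limit of 384093388 bytes, assuming those defaults were in use:

```python
RESERVED_BYTES = 300 * 1024 * 1024   # Spark's fixed reserved system memory
MEMORY_FRACTION = 0.6                # default spark.memory.fraction (assumed unchanged)

def implied_heap_bytes(memory_limit_bytes):
    """Invert limit = (heap - reserved) * fraction to estimate the JVM heap."""
    return memory_limit_bytes / MEMORY_FRACTION + RESERVED_BYTES

limit = 384093388        # "exceeds our memory limit (384093388 bytes)" in the log
broadcast = 1401711920   # size of broadcast_1 from the same log line

heap_mib = implied_heap_bytes(limit) / 1024**2
print(f"implied JVM heap: {heap_mib:.1f} MiB")
print(f"broadcast_1 size: {broadcast / 1024**2:.1f} MiB")
```

Under those assumptions the driver heap comes out at under 1 GiB, which is what a default 1g driver would report and far below the requested 60g, while the broadcast alone is larger than that heap.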

OK, we really need to fix import_plink. Here’s a tracking issue:

For now, you’ll have a much better experience using PLINK to export a VCF and importing that.
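
A sketch of that workaround, assuming PLINK 1.9 is on the PATH and that the file prefixes, output paths, and reference genome below are placeholders to adapt. The helper names are hypothetical; `--recode vcf bgz` is PLINK 1.9's option for writing a block-gzipped VCF, and `hl.import_vcf` is Hail's VCF importer.

```python
import subprocess

def plink_to_vcf_command(bfile_prefix, out_prefix):
    """Build the PLINK 1.9 command that converts .bed/.bim/.fam to <out_prefix>.vcf.gz."""
    return ["plink", "--bfile", bfile_prefix,
            "--recode", "vcf", "bgz",
            "--out", out_prefix]

def convert_and_import(bfile_prefix, out_prefix, mt_path, reference_genome="GRCh37"):
    """Export to VCF with PLINK, import it into Hail, and write a MatrixTable."""
    import hail as hl  # imported lazily so building the command does not need Hail

    subprocess.run(plink_to_vcf_command(bfile_prefix, out_prefix), check=True)
    mt = hl.import_vcf(f"{out_prefix}.vcf.gz",
                       force_bgz=True,
                       reference_genome=reference_genome)
    mt.write(mt_path, overwrite=True)
    return hl.read_matrix_table(mt_path)

# Show the command that would be run for a placeholder prefix.
print(" ".join(plink_to_vcf_command("mydata", "mydata_export")))
```

Calling `convert_and_import("mydata", "mydata_export", "mydata.mt")` would run the whole round trip. Depending on how chromosomes are coded in the .bim file, contig names may need recoding before they match the chosen reference genome.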