I am trying to import binary plink files and keep getting the error message below.
Hail version: 0.2.34-914bd8a10ca2
Error summary: OutOfMemoryError: Java heap space
I changed the memory setting using the command below
PYSPARK_SUBMIT_ARGS="--driver-memory 60g --executor-memory 60g pyspark-shell"
But that did not make any difference.
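In case it matters, here is how I set it from Python. My understanding is that the variable has to be in the environment before Hail/Spark starts the JVM, so I set it before importing Hail (the Hail lines are shown as comments here):

```python
import os

# PYSPARK_SUBMIT_ARGS must be set *before* the JVM is launched
# (i.e. before `import hail` / `hl.init()`), otherwise the driver
# keeps its default heap size.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--driver-memory 60g --executor-memory 60g pyspark-shell"
)

# Then, in the same process:
# import hail as hl
# hl.init()
```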
Any advice on how I can sort this out, please?
How big are the PLINK files, and what's the full pipeline? In particular, how many variants?
Thanks for your quick reply
I filtered my file, and at the moment there are 3.7M variants with 5,200 samples.
The file (.bed) size is 4.6 GB.
2020-03-16 11:23:25 root: INFO: Timer: Time taken for InterpretNonCompilable – Verify : 0.015ms, total 104.492ms
2020-03-16 11:23:25 root: INFO: interpreting non compilable node: TableWrite
2020-03-16 11:23:25 UnifiedMemoryManager: INFO: Will not store broadcast_1 as the required space (1401711920 bytes) exceeds our memory limit (384093388 bytes)
2020-03-16 11:23:25 MemoryStore: WARN: Not enough space to cache broadcast_1 in memory! (computed 891.9 MB so far)
2020-03-16 11:23:25 MemoryStore: INFO: Memory use = 1752.1 KB (blocks) + 1024.0 KB (scratch space shared across 1 tasks(s)) = 2.7 MB. Storage limit = 366.3 MB.
2020-03-16 11:23:25 BlockManager: WARN: Persisting block broadcast_1 to disk instead.
2020-03-16 11:23:42 BlockManager: WARN: Block broadcast_1 could not be removed as it was not found on disk or in memory
2020-03-16 11:23:46 root: ERROR: OutOfMemoryError: Java heap space
From java.lang.RuntimeException: error while applying lowering 'InterpretNonCompilable'
OK, we really need to fix import_plink. Here’s a tracking issue:
For now, you’ll have a much better experience using PLINK to export a VCF and importing that.
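Roughly like this (a sketch, not exact commands for your setup: `mydata` is a placeholder for your fileset prefix, and you'll need `bgzip` installed for the block-gzip step):

```shell
# Export the binary fileset (mydata.bed/.bim/.fam) to VCF with PLINK,
# then block-gzip it so Hail can read it in parallel:
plink --bfile mydata --recode vcf --out mydata
bgzip mydata.vcf

# Then, in Python:
#   import hail as hl
#   mt = hl.import_vcf('mydata.vcf.gz', force_bgz=True)
#   mt.write('mydata.mt')
```

The `force_bgz=True` flag tells Hail to treat the `.gz` file as block-gzipped so it can be split across partitions.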