Export_vcf OutOfMemoryError: Java heap space despite --driver-memory 8g


#21

Table.select (and MT.select_rows / select_cols) keep key fields implicitly, so rows = ds.rows().select('rsid') preserves the locus/alleles key.

What I want to do here is run the expensive and temperamental shuffle-join once on a small dataset, save the result, and then use it with the bgen.
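The plan above can be modelled in plain Python to make the key semantics concrete. This is not Hail code and not how Hail implements joins. It is a minimal sketch that assumes a rows table keyed by (locus, alleles) with an rsid field and an exclusion list of rsids. The keyed lookup behaves like a left join: a row with no match gets a missing value, which is what hl.is_defined tests. The keys and field values below are hypothetical.

```python
from typing import Dict, Set, Tuple

# Stand-in for Hail's (locus, alleles) row key; plain strings, not hl.Locus.
Key = Tuple[str, Tuple[str, ...]]


def select_rsid(rows: Dict[Key, dict]) -> Dict[Key, dict]:
    """Model of ds.rows().select('rsid'): only rsid is kept, plus the key."""
    return {key: {"rsid": fields.get("rsid")} for key, fields in rows.items()}


def build_inclusion(rows: Dict[Key, dict], excluded_rsids: Set[str]) -> Dict[Key, bool]:
    """Model of included = ~hl.is_defined(exclusion_ht[rows.rsid]).

    A row whose rsid has no match in the exclusion list is included. A row
    with a missing rsid (None) also finds no match, so it is included too.
    """
    return {key: fields["rsid"] not in excluded_rsids for key, fields in rows.items()}


# Small example: computed once; the result would then be written and reused.
rows = {
    ("1:1000", ("A", "G")): {"rsid": "rs1", "info": 0.9},
    ("1:2000", ("C", "T")): {"rsid": "rs2", "info": 0.8},
    ("1:3000", ("G", "A")): {"rsid": None, "info": 0.7},
}
inclusion = build_inclusion(select_rsid(rows), excluded_rsids={"rs2"})
print(inclusion)
```

The point of the design is that the join by rsid, which needs a shuffle, only ever touches the small rows table. The bgen is later joined on its existing (locus, alleles) key.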


#22

OK, I see, but how do I then use it with the bgen?


#23

Then you can read it in and join on the key:

import hail as hl
mt = hl.import_bgen(...)
inclusion_ht = hl.read_table('inclusion.ht')
mt = mt.filter_rows(~inclusion_ht[mt.row_key].inclusion)
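What filter_rows does with that keyed lookup can be sketched the same way. This is a minimal plain-Python model, not Hail code. It assumes the saved table maps (locus, alleles) to a boolean flag. A bgen row whose key is absent from the saved table gets a missing flag, negating a missing value stays missing, and Hail's filter removes rows where the condition is missing. The ~ keeps rows whose flag is False, so whether it belongs there depends on what the saved flag means (the included field written later in this thread is True for variants to keep). The keys below are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stand-in for Hail's (locus, alleles) row key.
Key = Tuple[str, Tuple[str, ...]]


def lookup(table: Dict[Key, bool], key: Key) -> Optional[bool]:
    """Model of inclusion_ht[mt.row_key].flag: None when the key has no match."""
    return table.get(key)


def negate(value: Optional[bool]) -> Optional[bool]:
    """Model of Hail's ~ on a boolean expression: missing stays missing."""
    return None if value is None else not value


def filter_rows(row_keys: List[Key], condition: Callable[[Key], Optional[bool]]) -> List[Key]:
    """Model of mt.filter_rows(cond): keep rows where cond is True.

    Rows where the condition is False *or missing* are dropped.
    """
    return [key for key in row_keys if condition(key) is True]


saved = {("1:1000", ("A", "G")): True, ("1:2000", ("C", "T")): False}
bgen_keys = [("1:1000", ("A", "G")), ("1:2000", ("C", "T")), ("1:4000", ("T", "C"))]

kept_with_flag = filter_rows(bgen_keys, lambda k: lookup(saved, k))
kept_with_negation = filter_rows(bgen_keys, lambda k: negate(lookup(saved, k)))
print(kept_with_flag)      # rows whose flag is True
print(kept_with_negation)  # rows whose flag is False
```

In both cases the bgen row with no entry in the saved table is dropped, because its condition is missing rather than False.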

#26

OK… I wasn’t sure if using filter_rows was still the way to go. Thank you, I’ll try that and let you know how it goes.


#27

Unfortunately, the shuffle issue seems to persist even with the smaller table: it crashes when the pipeline is executed (i.e. when writing the table):

variant_exclusion_table = hl.import_table("/mnt/output/HRC.vcfs/HRC_variants.tsv.bgz", no_header=True, key='f0')
variant_exclusion_table = variant_exclusion_table.key_by('f0')  # redundant: key='f0' above already keys by f0
rows = ds.rows().select('rsid')
rows = rows.annotate(included = ~hl.is_defined(variant_exclusion_table[rows.rsid]))
rows.write("/mnt/output/HRC.vcfs/UK10K_inclusion_status.ht")

[Stage 6:>                                                      (0 + 46) / 2901]Traceback (most recent call last):
  File "/mnt/output/regression1/UK10K.py", line 17, in <module>
    rows.write("/mnt/output/HRC.vcfs/UK10K_inclusion_status.ht")
  File "<decorator-gen-694>", line 2, in write
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 560, in wrapper
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/table.py", line 1163, in write
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/backend/backend.py", line 25, in interpret
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/utils/java.py", line 210, in deco
hail.utils.java.FatalError: OutOfMemoryError: Java heap space