export_vcf OutOfMemoryError: Java heap space despite --driver-memory 8g

Table.select (and MT.select_rows / select_cols) keeps key fields implicitly, so rows = ds.rows().select('rsid') preserves the locus/alleles key.

What I want to do here is run the expensive and temperamental shuffle-join once with a small dataset, save the result, and use that with the bgen.

OK, I see, but how do I then use it with the bgen?

Then you can read it in and join on key:

mt = hl.import_bgen(...)
inclusion_ht = hl.read_table('inclusion.ht')
mt = mt.filter_rows(~inclusion_ht[mt.row_key].inclusion)
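
To make the last line concrete, here is a toy model in plain Python. It is not Hail code: the dictionary, the row keys and the helper names are all invented for illustration. It only mimics the documented behaviour that filter_rows keeps rows whose expression is True and drops rows whose expression is False or missing. Under that assumption, the `~` means the rows flagged `inclusion == True` are the ones removed, so it is worth checking which way round the saved flag is defined.

```python
# Toy stand-in for the saved table: key -> inclusion flag (hypothetical data).
inclusion_ht = {
    ("1:1000", ("A", "G")): True,
    ("1:2000", ("C", "T")): False,
    ("1:3000", ("G", "A")): True,
}

# Toy stand-in for the row keys of the matrix table read from the bgen.
# The last key has no entry in inclusion_ht, modelling a missing join result.
mt_row_keys = [
    ("1:1000", ("A", "G")),
    ("1:2000", ("C", "T")),
    ("1:3000", ("G", "A")),
    ("1:4000", ("T", "C")),
]


def filter_rows(keys, predicate):
    """Keep keys whose predicate is exactly True; False or None drops the row."""
    return [k for k in keys if predicate(k) is True]


def negated_flag(key):
    """Models ~inclusion_ht[key].inclusion, with missing staying missing."""
    flag = inclusion_ht.get(key)
    return None if flag is None else not flag


def plain_flag(key):
    """Models inclusion_ht[key].inclusion without the negation."""
    return inclusion_ht.get(key)


kept_with_negation = filter_rows(mt_row_keys, negated_flag)
kept_without_negation = filter_rows(mt_row_keys, plain_flag)

print(kept_with_negation)
print(kept_without_negation)
```

In this sketch the negated form keeps only the row whose flag is False, and the row with no match is dropped either way. If the flag marks variants to keep, the un-negated form is the one that retains them.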

OK… I wasn’t sure if using filter_rows was still the way to go. Thank you, I’ll try that and let you know how it goes.

Unfortunately, it seems the shuffling issue persists even on the smaller matrix. It crashes when the pipeline is actually executed (i.e. when saving the matrix):

variant_exclusion_table = hl.import_table("/mnt/output/HRC.vcfs/HRC_variants.tsv.bgz", no_header=True, key='f0')
variant_exclusion_table = variant_exclusion_table.key_by('f0')
rows = ds.rows().select('rsid')
rows = rows.annotate(included = ~hl.is_defined(variant_exclusion_table[rows.rsid]))
rows.write("/mnt/output/HRC.vcfs/UK10K_inclusion_status.ht")

[Stage 6:>                                                      (0 + 46) / 2901]Traceback (most recent call last):
  File "/mnt/output/regression1/UK10K.py", line 17, in <module>
    rows.write("/mnt/output/HRC.vcfs/UK10K_inclusion_status.ht")
  File "<decorator-gen-694>", line 2, in write
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 560, in wrapper
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/table.py", line 1163, in write
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/backend/backend.py", line 25, in interpret
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/hail/build/distributions/hail-python.zip/hail/utils/java.py", line 210, in deco
hail.utils.java.FatalError: OutOfMemoryError: Java heap space