Error summary: OutOfMemoryError: Java heap space

OK. A few things!

  1. When using Hail on a single, large server, you need to explicitly tell Apache Spark how much memory is available. See details here: How do I increase the memory or RAM available to the JVM when I start Hail through Python? - #2 by danking. In particular, you might try starting Jupyter this way:
PYSPARK_SUBMIT_ARGS="--driver-memory 460g --executor-memory 460g pyspark-shell" jupyter notebook
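If you prefer to configure this from Python rather than the shell environment, Hail's `hl.init` accepts a `spark_conf` dictionary. This is a sketch, not a drop-in replacement: it must run before any other Hail call (Hail starts the JVM at init time), and the 460g value is just an example sized for a large single server; leave some headroom for the OS rather than allocating all physical RAM.

```python
import hail as hl

# Must be the first Hail call in the session; the JVM picks up
# these settings only at startup.
hl.init(spark_conf={
    'spark.driver.memory': '460g',    # heap for the driver JVM (the one that ran out)
    'spark.executor.memory': '460g',  # executor heap (shared with the driver in local mode)
})
```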
  2. When running PCA, you definitely do not need 24M variants. Assuming that you are using PCA to interrogate the ancestry of your samples, common variants are sufficient. I suggest something like this:
import hail as hl

EUR_for_pca = EUR_mt_full
EUR_for_pca = hl.variant_qc(EUR_for_pca)
# filter to variants with minor allele frequency >5%
EUR_for_pca = EUR_for_pca.filter_rows(
    (EUR_for_pca.variant_qc.AF[0] > 0.05) & (EUR_for_pca.variant_qc.AF[0] < 0.95)
)
n_common_variants = EUR_for_pca.count_rows()
# keep a random ~10k subset of common variants
EUR_for_pca = EUR_for_pca.sample_rows(10_000 / n_common_variants)
# save the set of variants for later use
EUR_for_pca.rows().write('Haill_mt/variants_for_pca.ht')
EUR_pca_variants = hl.read_table('Haill_mt/variants_for_pca.ht')
# filter the matrix table to just the PCA variants
EUR_for_pca = EUR_mt_full.semi_join_rows(EUR_pca_variants)
EUR_eigenvalues, EUR_pcs, _ = hl.hwe_normalized_pca(EUR_for_pca.GT)
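Once the PCA has run, the scores table (`EUR_pcs`) is keyed by sample, so you can join it back onto the full matrix table's columns to use the PCs downstream, e.g. as covariates in association tests. A minimal sketch, assuming the default `hwe_normalized_pca` output (a `scores` array field per sample); the field name `pca_scores` is just an illustrative choice:

```python
# Join the per-sample PC scores onto the matrix table's columns.
# EUR_pcs is keyed by the column key of EUR_for_pca, so indexing by
# col_key performs the join.
EUR_mt_full = EUR_mt_full.annotate_cols(
    pca_scores=EUR_pcs[EUR_mt_full.col_key].scores
)
```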