LD pruning and IBD

How many total cores did the 3 node cluster have? Hail unfortunately is only moderately fast on a single computer. Hail is valuable because it can scale up to hundreds or thousands of cores. Most of our users use Google Dataproc or Amazon EMR to briefly access very large clusters. In the cloud, we pay per core-hour, so 1000 cores for one hour costs the same as 10 cores for 100 hours. If you’d like to try Hail on the cloud, we have introductory material in the docs.

Can you link to the post that sets block_size to 75? I’d like to fix that post.

The different executables do not affect performance but the affect how you set parameters. What is the output of:

echo $PYSPARK_SUBMIT_ARGS

This variable needs to specify how much memory is available to Hail. See this post for information on how to set that variable. Try setting both the executor memory and driver memory to the total amount of RAM on your computer.