LD pruning and IBD

danking · March 29, 2021, 4:21pm

How many total cores did the 3 node cluster have? Unfortunately, Hail is only moderately fast on a single computer. Hail is valuable because it can scale up to hundreds or thousands of cores. Most of our users use Google Dataproc or Amazon EMR to briefly access very large clusters. In the cloud, you pay per core-hour, so 1000 cores for one hour costs the same as 10 cores for 100 hours. If you’d like to try Hail on the cloud, we have introductory material in the docs.
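To make the core-hour arithmetic concrete, here is a minimal sketch. The price per core-hour is a made-up placeholder, not a real Dataproc or EMR rate, and the function names are only for illustration.

```python
def core_hours(cores: int, hours: float) -> float:
    """Total core-hours consumed by a cluster of `cores` running for `hours`."""
    return cores * hours


def cost(cores: int, hours: float, price_per_core_hour: float) -> float:
    """Cost under a flat per-core-hour price (an assumption; real bills add other charges)."""
    return core_hours(cores, hours) * price_per_core_hour


# Hypothetical price, chosen only to illustrate the comparison.
PRICE = 0.01

big_and_short = cost(cores=1000, hours=1, price_per_core_hour=PRICE)
small_and_long = cost(cores=10, hours=100, price_per_core_hour=PRICE)

# Both configurations use the same number of core-hours, so they cost the same.
print(big_and_short == small_and_long)
```

The difference is wall-clock time: the large cluster finishes in one hour instead of a hundred, provided the job scales well across cores.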

Can you link to the post that sets block_size to 75? I’d like to fix that post.

The different executables do not affect performance, but they affect how you set parameters. What is the output of:

echo $PYSPARK_SUBMIT_ARGS

This variable needs to specify how much memory is available to Hail. See this post for information on how to set that variable. Try setting both the executor memory and driver memory to the total amount of RAM on your computer.
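As a rough sketch of that advice, you can also set the variable from Python before Hail starts. This assumes you are running Hail locally through PySpark. The helper name `pyspark_submit_args` and the 16 GB figure are placeholders for illustration; substitute the amount of RAM on your machine. `--driver-memory` and `--executor-memory` are standard `spark-submit` options, and the trailing `pyspark-shell` is required when passing options this way.

```python
import os


def pyspark_submit_args(memory_gb: int) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value giving driver and executor the same memory."""
    mem = f"{memory_gb}g"
    return f"--driver-memory {mem} --executor-memory {mem} pyspark-shell"


# 16 is a placeholder: use the total RAM of your computer, in gigabytes.
# This must run before Hail is initialized, because the JVM reads the
# setting only once, at startup.
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(16)

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

If the variable is set after Spark has already started in the same Python process, the new memory settings are not picked up, so restart the interpreter or notebook kernel after changing it.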
