What makes hail go fast locally?

Michel_Nivard · November 2, 2020, 9:43pm

Hi,

I want to make Hail go fast(er) locally when doing exome-seq analysis (cleaning, annotating, PCA, etc.).

Say from a base of:
10 cores / 20 threads
128 GB RAM
and all data on SSD.

Where should I spend more money: more RAM, faster cores, more cores, or a faster SSD?

Best,
Michel

tpoterba · November 2, 2020, 10:10pm

Hi Michel! Good to hear from you!

This is a good question. We don’t have a ton of experience running Hail/Spark on large single instances, but I think as long as you’ve got at least ~4-6 GB of memory per hyperthreaded core (i.e. per thread), you’ll be fine from a memory standpoint. A faster SSD won’t help much with a small core count, but disk bandwidth could become limiting if you have many tens of cores reading from disk all at once. I don’t have the information to recommend spending on cores vs. disk, but I think with a decent SSD you should be able to keep Hail busy on at least 32 and probably 64 threads for most tasks.
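To make the memory rule of thumb concrete, here is a rough sketch of the arithmetic in plain Python. The function names are made up for illustration and are not part of Hail, and the 4-6 GB figure is only the ballpark above, not a measured requirement. The commented-out lines at the bottom show one way people commonly hand memory and thread counts to a local Spark-backed Hail session; treat those settings as assumptions to check against the docs for your Hail version.

```python
# Back-of-the-envelope check of the "~4-6 GB per thread" guideline.
# All helper names here are hypothetical, for illustration only.


def memory_per_thread_gb(total_ram_gb: float, threads: int) -> float:
    """RAM available per hardware thread if every thread runs a task."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return total_ram_gb / threads


def ram_needed_gb(threads: int, per_thread_gb: float) -> float:
    """Total RAM needed to give each thread `per_thread_gb`."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return threads * per_thread_gb


def pyspark_submit_args(driver_memory_gb: int) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value for a local Spark driver."""
    return f"--driver-memory {driver_memory_gb}g pyspark-shell"


if __name__ == "__main__":
    # The machine from the question: 20 threads, 128 GB RAM.
    print(memory_per_thread_gb(128, 20))  # 6.4 GB per thread

    # RAM range for a 64-thread machine under the 4-6 GB guideline.
    print(ram_needed_gb(64, 4), ram_needed_gb(64, 6))  # 256 384

    # Assumption: in local mode the Spark driver does the work, so its
    # memory has to be set before Hail starts the JVM, for example:
    #
    #   import os
    #   os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(100)
    #   import hail as hl
    #   hl.init(local="local[20]")  # 20 worker threads
```

By this arithmetic the 20-thread / 128 GB box already sits at about 6.4 GB per thread, so going to 32 threads on the same RAM would still leave 4 GB each, while 64 threads would want roughly 256-384 GB.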

P.S. we may be interested in talking with you for advice on getting S.E.M. into Hail in the future!
