Using --fork on VEP?

nicklecompteBCH · February 11, 2020, 1:48am

Hello,

I am using Hail 0.2 to run VEP, and have added "--fork", "6", to my vep config .json. I have not done much testing with VEP by itself, but my understanding is that the fork option is generally recommended on modern processors. I could not even tell you whether running VEP through Hail is actually faster with --fork. It seems like it is, and it doesn’t crash too often, at least, but I have not had much time or resources for performance tuning.

I am wondering if anyone here has experience doing this. It does not seem to be used very widely in the Hail community (based on the other VEP configs I’ve seen) but I haven’t checked extensively. Unfortunately I am not being very careful about executor CPU count vs. VEP CPU count - is this something I should be more concerned about?

It is also a bit opaque to me how this affects memory considerations. I am using yarn, and have had to twiddle quite a few settings to get --fork 6 to work reliably on larger datasets (oddly, --fork 6 works reliably out-of-the-box on a single-sample VCF with a small cluster). Specifically, I had to manually override the spark.executor.memoryOverhead default of 10% of executor memory so that yarn doesn’t shut down the container during VEP execution. I think VEP (which the JVM treats as a native process outside its heap) runs exclusively in that memoryOverhead, so if VEP is forking within a container then increasing the limit makes sense. 25% seems to work but maybe that’s too much.
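For anyone wanting to reproduce the override described above, here is a rough sketch of how the two Spark properties could be computed. The helper name `memory_overhead_conf` and the 16 GiB executor size are assumptions made for illustration; only the 25% figure and the `spark.executor.memoryOverhead` property come from the post.

```python
def memory_overhead_conf(executor_memory_mb: int, overhead_fraction: float = 0.25) -> dict:
    """Build Spark settings that reserve a fraction of executor memory as overhead.

    Hypothetical helper: the name and defaults are illustrative, not part of Hail.
    """
    # Spark applies a 384 MiB minimum to the overhead, so mirror that floor here.
    overhead_mb = max(384, int(executor_memory_mb * overhead_fraction))
    return {
        "spark.executor.memory": f"{executor_memory_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }


# Assumed example: 16 GiB executors with 25% reserved for off-heap use such as VEP.
conf = memory_overhead_conf(16 * 1024)
print(conf)
```

The resulting dictionary could be passed as `--conf` pairs to spark-submit, or to `hl.init(spark_conf=conf)` when Hail is the one creating the Spark context. On Spark versions before 2.3 the property was named `spark.yarn.executor.memoryOverhead`.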

Anyway, is there any general guidance about this?

tpoterba · February 11, 2020, 1:41pm

I would expect this to slightly degrade performance, actually – Hail runs VEP in parallel already, launching VEP processes on each worker CPU. Since it’s already parallelized (and all CPUs are already busy), there’s little benefit to using additional threads.
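To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a simplified model in which Hail starts one VEP process per worker core and each VEP process keeps `fork` child workers busy; the function name and the 8-core worker are made up for illustration.

```python
def vep_process_load(cores_per_worker: int, fork: int = 1) -> dict:
    """Estimate how many VEP workers compete for the cores on a single machine.

    Simplified model (an assumption): one VEP process per core, each running
    `fork` busy child workers.
    """
    vep_processes = cores_per_worker
    busy_workers = vep_processes * max(1, fork)
    return {
        "vep_processes": vep_processes,
        "busy_workers": busy_workers,
        # Values above 1.0 mean more runnable workers than cores.
        "oversubscription": busy_workers / cores_per_worker,
    }


# Assumed example: an 8-core worker with --fork 6.
load = vep_process_load(8, fork=6)
print(load)
```

Under this model, --fork 6 on an 8-core worker leaves 48 workers contending for 8 cores, which fits the expectation that forking adds scheduling and memory pressure rather than throughput.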

tpoterba · February 11, 2020, 1:42pm

Also, I hear you about the memory management pain. It’s not easy to configure this stuff.

nicklecompteBCH · February 11, 2020, 2:18pm

Thanks! That makes sense - I thought maybe Hail was running one VEP process per executor but good to know that’s not the case. I will leave the option off by default.
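A minimal sketch of turning the option back off programmatically, assuming the VEP config .json holds the command line as a list of strings under a "command" key. The other arguments shown are placeholders and not taken from a real config.

```python
import json


def strip_fork(command: list) -> list:
    """Return a copy of the VEP argument list without --fork and its value."""
    cleaned = []
    skip_next = False
    for arg in command:
        if skip_next:
            # This is the value that followed --fork (e.g. "6"); drop it too.
            skip_next = False
            continue
        if arg == "--fork":
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


# Placeholder config: only "--fork", "6" comes from the thread.
config = {"command": ["/vep", "--format", "vcf", "--fork", "6", "-o", "STDOUT"]}
config["command"] = strip_fork(config["command"])
print(json.dumps(config, indent=2))
```

The helper only handles the two-token form ("--fork", "6") used in this thread; a config that writes the flag differently would need its own handling.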
