Spark tuning for Elasticsearch export?

nicklecompteBCH · February 12, 2020, 9:12pm

Hi,

Just FYI that this is low priority and we’re not blocked by anything:

We have had some headaches exporting heavily-annotated VCFs from Hail to Elasticsearch - during the export, YARN kills a container for using too much memory, with a standard ERR_SPARK_FAILED_YARN_KILLED_MEMORY error and stack trace. Following Google/StackOverflow, raising spark.executor.memoryOverhead to something more like 15-20% of the spark.executor.memory value appears to solve the problem. I have started using 20% without any appreciable impact on the executors during the analysis tasks.

Very naively, this makes some sense to me: for complex documents it’s possible to exceed the default 10% memoryOverhead that Spark reserves for memory outside the executor heap, in this case presumably consumed by the Elasticsearch connector. But I am not sure about any of this; all I did was Google and try out different Spark configurations. Changing the memory overhead makes me a bit nervous, so I just want to check that this is a reasonable change to the default cluster configuration for a workflow like this, and that there aren’t any obvious negative impacts I’m missing.
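
For concreteness, here is a rough sketch of the arithmetic behind the 10% default versus the 20% setting described above. It assumes the 27g figure is the per-executor spark.executor.memory and uses Spark's documented 384 MiB floor for the overhead; the helper names (`executor_overhead_mib`, `export_spark_conf`) are made up for illustration and are not part of Hail or Spark.

```python
MIB_PER_GIB = 1024
SPARK_MIN_OVERHEAD_MIB = 384  # Spark's documented minimum for executor memoryOverhead


def executor_overhead_mib(executor_memory_gib, fraction=0.10):
    """Overhead in MiB for a given executor heap size and overhead fraction."""
    executor_mib = executor_memory_gib * MIB_PER_GIB
    return max(SPARK_MIN_OVERHEAD_MIB, int(executor_mib * fraction))


def export_spark_conf(executor_memory_gib, fraction):
    """Hypothetical helper: build the two Spark properties discussed in this post."""
    overhead = executor_overhead_mib(executor_memory_gib, fraction)
    return {
        "spark.executor.memory": f"{executor_memory_gib}g",
        "spark.executor.memoryOverhead": f"{overhead}m",
    }


# Assumption: 27g is the executor heap size from the cluster described here.
print(executor_overhead_mib(27))       # overhead at the default 10%
print(export_spark_conf(27, 0.20))     # properties for the 20% setting
```

A dict like this could be passed as `hl.init(spark_conf=...)` or as `--conf` flags at submit time. Either way, YARN sizes each container as roughly heap plus overhead, so a larger overhead means fewer executors fit on a node.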

This is Hail 0.2, 150 cores at 27g, 4 cpu. Before the export I persisted the tables - could this be a problem? During the analysis tasks, I didn’t notice “too many” unsorted shuffle warnings (given that I am doing a lot of joins), ugly GC pauses, huge resource spikes, etc. Regardless, I would have expected anything like that to be cleaned up during persistence.

nicklecompteBCH · February 14, 2020, 12:55am

Pleased to report that setting spark.dynamicAllocation.enabled = false fixed the Elasticsearch slowness/crashing problem without me having to fiddle with other memory settings. It seems that AWS EMR turns this on by default, and (along with who knows what other AWS oddness) it caused problems in the export. Thanks to @pavlos for the tip in the thread “Small MatrixTable hangs on write into Google bucket”.
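
In case it helps anyone else, a minimal sketch of turning that setting into spark-submit style `--conf` flags. The helper `spark_submit_conf_args` and the executor count are hypothetical. With dynamic allocation off, Spark on YARN uses a fixed number of executors taken from spark.executor.instances, so that usually needs to be set explicitly as well.

```python
def spark_submit_conf_args(conf):
    """Hypothetical helper: turn a dict of Spark properties into --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


static_allocation = {
    # The setting that fixed the export in this thread.
    "spark.dynamicAllocation.enabled": "false",
    # Illustrative value only; pick a count that matches your own cluster.
    "spark.executor.instances": "37",
}

print(" ".join(spark_submit_conf_args(static_allocation)))
```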
