Hi @smcnulty,
I’m sorry you’re having trouble with Hail!
Regarding speed, Hail is lazy, so it only executes your pipeline when you observe the output, for example with `write`, `show`, or `collect`. All the dataset annotation is done by the `write` step.
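As a minimal sketch (the table path and fields here are hypothetical, not from your pipeline), the `annotate` call only builds up a query plan; nothing runs until `write`:

```python
import hail as hl

hl.init()

# Lazy: these lines only construct the pipeline, no data is read yet.
ht = hl.read_table('gs://my-bucket/variants.ht')  # hypothetical path
ht = ht.annotate(is_common=ht.AF > 0.05)          # hypothetical field

# This is the step that actually executes everything above.
ht.write('gs://my-bucket/variants_annotated.ht')
```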
Hail already reads as little of the gnomAD data as possible, so filtering based on another table won't improve on what Hail is doing. I'm not sure why that causes you to run out of memory; hopefully someone from the compiler team can comment on that.
Regarding running this faster, what is your compute environment? It appears that you're using a laptop. Have you tried setting `PYSPARK_SUBMIT_ARGS` to use all the memory on your laptop?
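For example, a sketch assuming you have roughly 8 GB to spare (adjust the number for your machine); the environment variable must be set before importing Hail so Spark picks it up at startup:

```python
import os

# Must be set before `import hail`; the trailing 'pyspark-shell' is required.
# 8g is an assumption -- use whatever memory your laptop can spare.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 8g pyspark-shell'

import hail as hl
hl.init()
```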