VEP memoization caching

knguyen142 · March 27, 2019, 3:55pm

Our pipeline has a VEP step that takes quite a bit of time because of the computations. We also have a pretty standard set of variants we VEP. We were wondering if it’s possible to memoize these annotations and use those first (i.e. pre-memoize the common ones to a cache, and only VEP on ones not in the cache and update cache accordingly).

Please let us know if this is feasible and recommended. If so, what is the best way to store the cache?

HT: It seems like updating the cache would mean rewriting the whole table?
AnnotationsDB: I’ve heard this mentioned, but it isn’t available in v02 (yet)?
An append-only file mapping each variant to its VEP struct, read in as a Hail table?

Thanks!

tpoterba · March 27, 2019, 7:05pm

We at one point had exactly this design – store the VEPed whole genome SNPs (9B) as a table, join that, and run VEP on the rest. But we stopped doing that when VEP got faster. Maybe it’s gotten slower again in recent versions?
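For illustration, the cache-first pattern described above can be sketched in plain Python. This is only a minimal sketch and not the code we actually ran: the names `annotate_with_cache`, `fake_vep`, and `counting_vep` are hypothetical, the annotation dicts are placeholders, and an in-memory dict stands in for the stored table.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A variant key: (contig, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]
Annotation = Dict[str, str]


def annotate_with_cache(
    variants: Iterable[Variant],
    cache: Dict[Variant, Annotation],
    run_vep: Callable[[List[Variant]], Dict[Variant, Annotation]],
) -> Dict[Variant, Annotation]:
    """Look variants up in the cache and run VEP only on the misses."""
    variants = list(variants)
    # dict.fromkeys de-duplicates while preserving input order.
    missing = [v for v in dict.fromkeys(variants) if v not in cache]
    if missing:
        # Only the uncached variants reach the expensive step;
        # their results are added to the cache for later runs.
        cache.update(run_vep(missing))
    return {v: cache[v] for v in variants}


def fake_vep(batch: List[Variant]) -> Dict[Variant, Annotation]:
    # Hypothetical stand-in for the real (slow) VEP call.
    return {v: {"most_severe_consequence": "placeholder"} for v in batch}


# Record each batch sent to "VEP" so the cache hits are visible.
calls: List[List[Variant]] = []


def counting_vep(batch: List[Variant]) -> Dict[Variant, Annotation]:
    calls.append(list(batch))
    return fake_vep(batch)


# Pre-memoized annotations for the common variants.
cache: Dict[Variant, Annotation] = {
    ("1", 100, "A", "T"): {"most_severe_consequence": "missense_variant"},
}

variants = [("1", 100, "A", "T"), ("2", 200, "G", "C"), ("2", 200, "G", "C")]
result = annotate_with_cache(variants, cache, counting_vep)

# A second pass over the same variants is served entirely from the cache.
result_again = annotate_with_cache(variants, cache, counting_vep)
```

In Hail terms, the lookup would roughly correspond to joining the stored table on the variant key, the `missing` list to something like `Table.anti_join` against that table, and `run_vep` to `hl.vep` on the remainder. How the updated cache is written back is left open here.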

knguyen142 · March 27, 2019, 7:20pm

Thanks Tim. We haven’t tried the new version of VEP yet, will check it out and see if it’s still a problem. Good to know that the table idea is a good candidate solution.

