I have a workflow that annotates a large VCF with gnomAD popmax values, filters by a threshold, and exports the result back to VCF. So far it has been prohibitively slow: about 15 minutes for chr21 alone, and about 1.5 hours for chr1.
For my latest run, I’m attempting to do all the chromosomes at once. I ran the script late last night, and it’s still running now, roughly 10 hours later.
Can anyone suggest ways to speed up my pipeline? (I’ll include the code in a separate post to avoid the spam filter.)
-Thanks in advance