Export_vcf very slow

As part of a variant QC workflow, I am attempting to export my MatrixTable to a VCF. However, the export is taking far too long and incurring very high run costs. From what I have noticed, the export stage (Stage 22 in the screenshot examples below) starts out quickly, but once it reaches the final several dozen partitions, each task takes longer and longer to complete. While the initial tasks/partitions finish in ~1 second, the final tasks/partitions take several hours.

For reference, this particular block of exome data was run with min_partitions = 288 on 288 cores.
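To make the setup concrete, here is a simplified sketch of the relevant calls, not my exact pipeline. It assumes the block is loaded with hl.import_vcf (where min_partitions is set); the function name export_block and both paths are placeholders, and the QC steps in between are omitted.

```python
def export_block(vcf_in: str, vcf_out: str, n_partitions: int = 288) -> int:
    """Import one block of exome data, then export it again as a VCF.

    Hypothetical helper for illustration; returns the partition count
    Hail actually used, which may be higher than n_partitions.
    """
    # Imported lazily so the file can be loaded without Hail installed.
    import hail as hl

    # min_partitions is a lower bound on partitions, not an exact count.
    # The reference genome is left at Hail's default here (an assumption).
    mt = hl.import_vcf(vcf_in, min_partitions=n_partitions)

    # ... variant QC filtering would go here (omitted) ...

    n = mt.n_partitions()

    # This is the step that shows up as the slow stage in the Spark UI.
    hl.export_vcf(mt, vcf_out)
    return n
```

It would be called as, for example, export_block("block.vcf.bgz", "block.qc.vcf.bgz"), with both paths being made-up names.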

What could explain the stalling at the end of the stage that is causing these long run times?


Some tasks run for much longer than others, though they are not necessarily the tasks that start later:

Most tasks complete in a matter of seconds:

The majority of tasks complete within the first 2 minutes, while the remaining tasks take several hours:

These are the Spark executor details, showing the remaining tasks that will take a long time to complete: