Everything looks perfectly fine with your setup. I think if the VCF is ordered unexpectedly and triggering a shuffle, that could be the source of the memory overhead.
Most local Hail users use Google Dataproc, which works very well for most non-shuffle operations*. They’re running on hundreds of thousands of samples here, so a few thousand should be no problem.
*Shuffles are bad on the cloud due to OOMs and node pre-emption. We’re thinking about possible fixes.