Choosing a Google Dataproc machine type for a huge dataset

Dear Hail team,

I have a really huge dataset (importing a VCF to a VDS and writing it to a Google bucket).
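
The pipeline itself is essentially just an import followed by a write. Here is a minimal sketch of its shape, written against the Hail 0.2 `import_vcf` / `write` API (the bucket paths and reference genome are placeholders, not my real ones):

```python
import hail as hl

hl.init()

# Placeholder paths and reference genome; the real dataset is far larger.
mt = hl.import_vcf(
    'gs://my-bucket/input/*.vcf.bgz',
    reference_genome='GRCh38',
    force_bgz=True,
)
mt.write('gs://my-bucket/output/dataset.mt', overwrite=True)
```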

I noticed that in the example doc you recommend the n1-highmem-8 (or 32, 96, …) machine types.
I find that it still reports memory/GC issues even when I use an n1-highmem-96 machine and add off-heap memory.
As I increase machine memory and off-heap memory (for both the driver and the executors), it does finish more tasks, but it still fails at the last stage or gets stuck on certain tasks in the final stage.
The failure happens while writing the data.

I want to know: for an extremely big dataset, would you recommend switching to m1-ultramem-40 (or m1-ultramem-80), or should I add bigger worker local disks? I noticed in the Dataproc log that a shuffle occurs during the last stage. I have also heard suggestions about adding more workers.

And for GC: the JVM thread stack size (Xss) is 4M now; maybe I also need to adjust it?
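
To be concrete about the memory settings, this is the kind of configuration I mean, written with standard Spark property names (the values are only illustrative, not my exact ones):

```python
# Illustrative values only, using standard Spark property names; on Dataproc
# these are passed at cluster creation time rather than set inside the script.
spark_conf = {
    'spark.driver.memory': '64g',
    'spark.executor.memory': '64g',
    'spark.driver.memoryOverhead': '16g',     # off-heap headroom for the driver
    'spark.executor.memoryOverhead': '16g',   # off-heap headroom per executor
    'spark.driver.extraJavaOptions': '-Xss4m',
    'spark.executor.extraJavaOptions': '-Xss4m',
}
```
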
Thanks a lot for your time; I appreciate any suggestions.

Which doc is this?

We recommend using Google Dataproc with a larger number of small (8-core) virtual machines in a cluster. Not all hardware resources scale with the number of cores in a single machine (in particular, network bandwidth), so your overall performance will be improved by using a large number of small machines compared to a small number of large machines.

Which issue is this? What version of Hail are you using, and what’s the pipeline?