Large gVCF into VDS

Hi @gil, thanks for the info. I am trying a slightly different geometry, but I note your idea of adding 200 gVCFs at a time to the previous VDS, with spark_max_stage_parallelism='20000' and a high target_records.
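
To check that I have understood the incremental idea, here is a minimal sketch in plain Python of how the batches could be planned. It only builds the plan and does not call Hail. The function names, the output prefix and the step naming are my own placeholders, and the batch size of 200 is taken from your description.

```python
from typing import Iterator, List, Optional, Tuple


def batches(paths: List[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` gVCF paths."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def plan_increments(
    paths: List[str], out_prefix: str, size: int = 200
) -> List[Tuple[List[str], Optional[str], str]]:
    """Return one (gvcf_batch, previous_vds, output_vds) tuple per step.

    previous_vds is None for the first step, then the output of the
    step before it, so each step extends the VDS written by the last one.
    """
    plan = []
    previous = None
    for i, batch in enumerate(batches(paths, size)):
        output = f"{out_prefix}/step_{i:04d}.vds"  # hypothetical naming scheme
        plan.append((batch, previous, output))
        previous = output
    return plan


# Each step would then be handed to the combiner, as I understand it with
# gvcf_paths=batch and vds_paths=[previous] when previous is not None.
# That call is left out here so the sketch runs without a cluster.
```

With 450 gVCFs this gives three steps of 200, 200 and 50 samples, each one reading the VDS written by the step before it.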

Thanks for the insight on the cluster. Do you specify data nodes? If not, how much disk space is allocated?

On my side, my last attempt ran on a cluster of 31 CORE nodes (r6g.4xlarge) with 150 GB of disk each (~4,500 GB in total) and a max stage parallelism of 30,000.