Large gVCF into VDS

Hi @gil - Thanks for the info. I'm trying a slightly different cluster geometry, but I've noted your idea of incrementally adding 200 gVCFs at a time to the previous VDS with `spark_max_stage_parallelism='20000'` and a high `target_records`.

Thanks for the insight on the cluster. I was wondering: do you specify data nodes? Or, if not, how much disk space is allocated?

On my side, my last attempt ran on a cluster of 31 CORE nodes (r6g.4xlarge) with 150 GB of disk each (~4,500 GB total) and a max stage parallelism of 30,000.
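For concreteness, here is a minimal sketch of the kind of incremental run described above, using `hl.vds.new_combiner`. All paths and sample names are placeholders, and I'm assuming `spark_max_stage_parallelism` is a string-valued Hail query flag set via `hl._set_flags` (as the quoting in this thread suggests); adjust if you pass it to your cluster differently.

```python
import hail as hl

hl.init(tmp_dir='s3://my-bucket/hail-tmp')  # placeholder paths throughout

# Assumption: spark_max_stage_parallelism is set as a Hail flag (string value);
# change this if you configure it another way on your cluster.
hl._set_flags(spark_max_stage_parallelism='30000')

# The next increment of ~200 gVCFs to fold into the previous round's VDS.
gvcf_batch = [f's3://my-bucket/gvcfs/sample_{i}.g.vcf.gz' for i in range(200, 400)]

combiner = hl.vds.new_combiner(
    output_path='s3://my-bucket/combined/round_02.vds',
    temp_path='s3://my-bucket/hail-tmp/combiner',
    vds_paths=['s3://my-bucket/combined/round_01.vds'],  # VDS from the previous round
    gvcf_paths=gvcf_batch,
    use_genome_default_intervals=True,
    target_records=1_100_000,  # the "high target_records" suggested above
    reference_genome='GRCh38',
)
combiner.run()
```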

Hi everyone, I'm having similar problems with the VDS combiner. I have roughly 4,000 gVCFs I'm trying to merge, but because of HPC constraints I'm getting 'too many open files' errors. The most samples I can combine in one run is 200. Has anyone else come across the same problem?

I'm currently testing 500 samples with `spark_max_stage_parallelism='20000'`, `target_records=1_100_000`, and a branch factor of 50 - roughly the setup sketched below. If anyone has any suggestions, please let me know.
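In case it helps anyone hitting the same limit, this is the shape of what I'm testing: combining in chunks of 200 and folding each chunk into the VDS built so far. My understanding is that the combiner merges `branch_factor` files at a time, so a lower `branch_factor` (plus a higher `ulimit -n` where the HPC allows it) should reduce the number of simultaneously open file handles. The paths, the path-list file, and the `save_path` resume plan are my own assumptions, not something confirmed in this thread.

```python
import hail as hl

hl.init(tmp_dir='/scratch/hail-tmp')  # placeholder HPC scratch space

# Assumption: spark_max_stage_parallelism is set as a Hail flag (string value).
hl._set_flags(spark_max_stage_parallelism='20000')

with open('gvcf_paths.txt') as f:       # one gVCF path per line, ~4,000 total
    all_gvcfs = [line.strip() for line in f]

# Combine in chunks of 200 (the largest batch that fits under the open-file limit),
# folding each chunk into the VDS produced by the previous iteration.
chunk_size = 200
current_vds = None
for start in range(0, len(all_gvcfs), chunk_size):
    chunk = all_gvcfs[start:start + chunk_size]
    out = f'/scratch/combined/round_{start // chunk_size:03d}.vds'
    combiner = hl.vds.new_combiner(
        output_path=out,
        temp_path='/scratch/hail-tmp/combiner',
        gvcf_paths=chunk,
        vds_paths=[current_vds] if current_vds else None,
        save_path=f'{out}.plan.json',   # lets an interrupted run be resumed
        use_genome_default_intervals=True,
        branch_factor=50,               # fewer files merged at once -> fewer open FDs
        target_records=1_100_000,
        reference_genome='GRCh38',
    )
    combiner.run()
    current_vds = out
```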

Thank you!