Hi @gil - Thanks for the info. I'm trying a slightly different geometry, but I've noted your suggestion of incrementally adding 200 gVCFs at a time to the previous VDS, with spark_max_stage_parallelism=20000 and a high target_records.
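For anyone comparing notes, here is roughly how I'm wiring those settings together. This is just a sketch: the paths, the `hl._set_flags` call, and the interval option are my assumptions rather than a confirmed recipe.

```python
import hail as hl

hl.init(tmp_dir='s3://my-bucket/hail-tmp/')  # hypothetical temp location

# Cap how many partitions Spark schedules per stage; the flag name comes
# from this thread, but setting it via hl._set_flags is my assumption.
hl._set_flags(spark_max_stage_parallelism='20000')

# Incremental combine: fold the next batch of ~200 gVCFs into the VDS
# built so far. All paths below are placeholders.
combiner = hl.vds.new_combiner(
    output_path='s3://my-bucket/cohort_step_n.vds',
    temp_path='s3://my-bucket/hail-tmp/',
    vds_paths=['s3://my-bucket/cohort_step_n_minus_1.vds'],  # previous VDS
    gvcf_paths=next_200_gvcf_paths,     # list of the next ~200 gVCF URLs
    use_genome_default_intervals=True,  # an interval option is required
    target_records=1_100_000,           # the "high target_records"
)
combiner.run()
```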
Thanks for the insight on the cluster. I was wondering: do you specify the data nodes, or if not, how much disk space is allocated?
On my side, my last attempt ran on a cluster of 31 CORE nodes (r6g.4xlarge) × 150 GB each (~4,500 GB of disk) with a max stage parallelism of 30,000.
Hi everyone, I'm having similar problems with the VDS combiner. I have roughly 4,000 gVCFs I'm trying to merge, but because of HPC constraints I'm getting "too many open files" errors. The most samples I can combine in one run is 200. Has anyone else come across the same problem?
I'm currently testing 500 samples with spark_max_stage_parallelism=20000, target_records=1_100_000, and a branch factor of 50 (sketch below). If anyone has any suggestions, please let me know.
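For concreteness, this is roughly the call I'm testing; the paths are placeholders, and the flag-setting mechanism is my assumption:

```python
import hail as hl

hl._set_flags(spark_max_stage_parallelism='20000')  # flag name per this thread

# branch_factor bounds how many inputs a single merge task reads at once,
# so it also bounds that task's simultaneously open files; dropping it to
# 50 is my attempt to stay under the HPC per-process file-descriptor limit.
combiner = hl.vds.new_combiner(
    output_path='/scratch/test_500.vds',  # hypothetical paths
    temp_path='/scratch/hail-tmp/',
    gvcf_paths=gvcf_paths_500,            # list of 500 gVCF file paths
    use_genome_default_intervals=True,
    branch_factor=50,
    target_records=1_100_000,
)
combiner.run()
```

Checking `ulimit -n` on the worker nodes might also help confirm whether a branch factor of 50 is actually low enough for the cluster's limit.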