Hello
I’m running the LD-prune step on a (344031, 37702) matrix, on a cluster with 10 n1-highmem-8 worker nodes, and I get this error: Executor heartbeat timed out after 137902 ms.
The following parameters are set in my spark-defaults conf file:
spark.executor.memory=10117m
spark.executor.memoryOverhead=15175m
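For reference, here is a quick sketch of what these two settings add up to per executor. It assumes that on YARN each executor container requests the executor memory plus the memory overhead, and that the "m" suffix means MiB; the variable names are only illustrative.

```python
# Values copied from the spark-defaults settings above, in MiB.
executor_memory_mib = 10117
executor_memory_overhead_mib = 15175

# Assumption: each executor container requests heap plus overhead.
container_mib = executor_memory_mib + executor_memory_overhead_mib

print(f"per-executor container: {container_mib} MiB "
      f"(~{container_mib / 1024:.1f} GiB)")
```

So each executor asks for roughly 25 GB in total, more than half of it as overhead.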
I was able to run the step on a smaller dataset, (~90k, 20k) individuals, which took about an hour. Although the command succeeded, I could see jobs failing with the same error message in the Spark UI.
What would I need to do here?
Thank you very much