Hello
I’m running a fairly straightforward operation. I read in a WGS pVCF with ~150 million variants and ~2k individuals, split multi-allelic sites into biallelic variants, run variant_qc, filter on alternate allele frequency (AAF), and write out the matrix table.
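For reference, the steps above might look roughly like the following Hail sketch. It is an illustration under assumptions, not the exact script used. The function name `split_qc_filter_write`, the `min_aaf` cutoff, the GRCh38 reference genome, `force_bgz=True` (for a block-gzipped file named `.gz`) and `overwrite=True` are all hypothetical choices. `hl.import_vcf`, `hl.split_multi_hts`, `hl.variant_qc` and `MatrixTable.write` are standard Hail 0.2 calls.

```python
def split_qc_filter_write(vcf_path: str, out_path: str, min_aaf: float = 0.01,
                          reference_genome: str = "GRCh38") -> None:
    """Hypothetical sketch: import a pVCF, split multi-allelic sites, run
    variant QC, keep rows at or above an alternate-allele-frequency cutoff,
    and write the result as a Hail MatrixTable."""
    # Validate the (assumed) cutoff before doing any work.
    if not 0.0 <= min_aaf <= 1.0:
        raise ValueError("min_aaf must be between 0 and 1")

    # Imported here so this file can be loaded without Hail installed.
    import hail as hl

    # Assumption: the pVCF is block-gzipped but named *.gz, hence force_bgz=True.
    mt = hl.import_vcf(vcf_path, reference_genome=reference_genome, force_bgz=True)

    # Split multi-allelic sites into biallelic rows (HTS-style entry fields).
    mt = hl.split_multi_hts(mt)

    # Adds a `variant_qc` row struct; after splitting, AF[1] is the alt-allele frequency.
    mt = hl.variant_qc(mt)
    mt = mt.filter_rows(mt.variant_qc.AF[1] >= min_aaf)

    # Write the filtered MatrixTable (overwrite=True is an assumed convenience).
    mt.write(out_path, overwrite=True)
```

A call would look something like `split_qc_filter_write("gs://my-bucket/cohort.vcf.gz", "gs://my-bucket/cohort.mt", min_aaf=0.01)`, where both bucket paths are made up for illustration.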
While writing out the matrix table, Hail hangs without any error message and CPU usage drops to zero. In the Spark logs, some of the jobs fail with the error:
‘ExecutorLostFailure (executor 9 exited unrelated to the running tasks) Reason: Container marked as failed’
The only other output Hail gives while writing the matrix table is the message ‘Hail: INFO: Ordering unsorted dataset with shuffle’.
I am running this on GCP with the following configurations:
--master-machine-type n1-standard-8
--worker-machine-type n1-highmem-8 (20 non-preemptible nodes; I have also tried other VM types)
--properties spark:spark.driver.maxResultSize=8g,spark:spark.executor.memory=4g
Hail version: 0.2.89-38264124ad91
I’ve run larger WGS files before, so I’m not sure why this error is coming up.
Thank you very much!