IBD estimation error

Hi!

I am trying to estimate identity by descent on a Google Cloud Dataproc cluster using a post-QC matrix table (multi-allelic variants were split, keeping variants with call rate > 0.95 and samples with call rate > 0.95) with:
ht = hl.identity_by_descent(mt)

and I keep getting this error:
Hail version: 0.2.11-adfb5ad12c3c
Error summary: SparkException: Job aborted due to stage failure: ShuffleMapStage 146 (map at IBD.scala:266) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failure while fetching StreamChunkId{streamId=1663420509000, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /mnt/sdb/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1553470753653_0002/blockmgr-93e41ab8-8a87-4e02-8bd4-c5ba7e30e62f/32/shuffle_23_214_0.index

Am I using IBD estimation wrong?

Thanks!
Nikita

This looks like a shuffle problem. Running again on non-preemptible nodes only (no -p in cloudtools, only -w) generally solves things.

Hi Tim!

Thanks for your response. I tried with no preemptible nodes in the Dataproc cluster and still got an error:
Error summary: SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 20 times, most recent failure: Lost task 1.19 in stage 6.0 (TID 2523, art-cluster-w-0.c.daly-lab.internal, executor 39): ExecutorLostFailure (executor 39 exited caused by one of the running tasks) Reason: Container marked as failed: container_1553622353535_0001_01_000041 on host: art-cluster-w-0.c.daly-lab.internal. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

I ran it several times and always got this error.

Thanks!
Nikita

Sorry Nikita, lost track of this! Exit code 137 is usually an out-of-memory error.
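
To spell out where 137 comes from: by the usual shell convention, an exit code above 128 means the process was killed by signal (code - 128), so 137 = 128 + 9, and signal 9 is SIGKILL. That fits "Killed by external signal" in your log and is what you typically see when a container is killed for exceeding its memory limit, though the code alone does not prove memory was the cause. A minimal sketch of the decoding, assuming a Linux/POSIX Python (the helper name describe_exit_code is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a signal name when it encodes one.

    Hypothetical helper: relies on the common convention that
    exit code = 128 + signal number for signal-terminated processes.
    """
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {signum}"
    return "no signal (ordinary exit status)"


# The exit code reported by the failed container in the log above.
print(describe_exit_code(137))
```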

How many samples do you have?
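
The reason the sample count matters: IBD is estimated for pairs of samples, so the number of comparisons grows quadratically with the number of samples. A rough sketch of that arithmetic (the sample counts below are hypothetical, not taken from your dataset, and pair_count is just an illustrative helper):

```python
def pair_count(n_samples: int) -> int:
    """Number of unordered sample pairs: n choose 2."""
    return n_samples * (n_samples - 1) // 2


# Hypothetical sample counts, only to show how quickly the pair count grows.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} samples -> {pair_count(n):>13,} pairs")
```

Going from 10,000 to 100,000 samples multiplies the number of pairs by roughly 100, which is why a job that is fine on a small cohort can exhaust executor memory on a large one.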