I am trying to estimate identity by descent on a Google Cloud Dataproc cluster, using a post-QC MatrixTable (multi-allelic variants were split, and variants and samples were both filtered to call rate > 0.95), with:
ht = hl.identity_by_descent(mt)
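For reference, this is roughly the pipeline that produces the MatrixTable (the bucket path is a placeholder; the QC steps are as described above):

```python
import hail as hl

hl.init()

# Placeholder path; the real input is our post-QC dataset
mt = hl.read_matrix_table('gs://my-bucket/genotypes.mt')

# Split multi-allelic variants into biallelic rows
mt = hl.split_multi_hts(mt)

# Keep variants with call rate > 0.95
mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.call_rate > 0.95)

# Keep samples with call rate > 0.95
mt = hl.sample_qc(mt)
mt = mt.filter_cols(mt.sample_qc.call_rate > 0.95)

# IBD estimation; this is the step that fails
ht = hl.identity_by_descent(mt)
```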
and I keep getting the following error:
Hail version: 0.2.11-adfb5ad12c3c
Error summary: SparkException: Job aborted due to stage failure: ShuffleMapStage 146 (map at IBD.scala:266) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failure while fetching StreamChunkId{streamId=1663420509000, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /mnt/sdb/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1553470753653_0002/blockmgr-93e41ab8-8a87-4e02-8bd4-c5ba7e30e62f/32/shuffle_23_214_0.index
Thanks for your response. I tried with no preemptible nodes in the Dataproc cluster and still got an error:
Error summary: SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 20 times, most recent failure: Lost task 1.19 in stage 6.0 (TID 2523, art-cluster-w-0.c.daly-lab.internal, executor 39): ExecutorLostFailure (executor 39 exited caused by one of the running tasks) Reason: Container marked as failed: container_1553622353535_0001_01_000041 on host: art-cluster-w-0.c.daly-lab.internal. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
I have attempted to run this several times and always get this error. As far as I can tell, exit code 137 means the container was killed by SIGKILL, which usually points to running out of memory.
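In case it is relevant, here is a sketch of how I could shrink the variant set before the IBD step (the MAF and r2 thresholds are placeholders, not values I have tested):

```python
# Restrict to common variants (placeholder MAF threshold)
mt_common = mt.filter_rows(
    hl.agg.call_stats(mt.GT, mt.alleles).AF[1] > 0.05)

# LD-prune to an approximately independent set of variants
pruned = hl.ld_prune(mt_common.GT, r2=0.2, bp_window_size=500000)
mt_pruned = mt_common.filter_rows(
    hl.is_defined(pruned[mt_common.row_key]))

ht = hl.identity_by_descent(mt_pruned)
```

Would filtering like this be a sensible workaround, or is there a cluster-side fix I should try first?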