Error of hl.king() run for kinship analysis

jinasong · March 2, 2021, 7:35pm

Hello,

I am trying to get kinship information for 590 WGS samples. The number of variants is over 38M.

kinship = hl.king(mt.GT)

With the script above, I got the error message below.

FatalError: HailException: Cannot create BlockMatrix: filtered entry at row 8491008 and col 406

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2073 in stage 14.0 failed 20 times, most recent failure: Lost task 2073.19 in stage 14.0 (TID 426052, js-c19-sw-mv7r.c.gbsc-gcp-project-mvp.internal, executor 1570): is.hail.utils.HailException: Cannot create BlockMatrix: filtered entry at row 8491008 and col 406

Please let me know how to interpret this error message. Thank you.

Best.
Jina

This log file includes a full error message.
king_20210302.log (13.7 KB)

danking · March 2, 2021, 8:49pm

Hi @jinasong,

This means your matrix table has filtered entries. hl.king currently doesn’t handle filtered entries. I’ll fix that. In the meantime, just add mt = mt.unfilter_entries() before you call hl.king.
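A minimal sketch of that workaround, assuming `mt` is the MatrixTable from the original post with a `GT` call field. The helper name `king_with_unfiltered_entries` is made up for illustration and is not part of Hail:

```python
def king_with_unfiltered_entries(mt):
    """Run KING after turning filtered entries into missing entries."""
    # Deferred import: the helper can be defined before Hail is loaded.
    import hail as hl

    # unfilter_entries() keeps every row and column but replaces each
    # filtered entry with a missing one, so the matrix has no filtered cells.
    mt = mt.unfilter_entries()

    # Same call as in the original post, now on the unfiltered table.
    return hl.king(mt.GT)
```

Usage would then be `kinship = king_with_unfiltered_entries(mt)` in place of the direct `hl.king(mt.GT)` call.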

danking · March 2, 2021, 8:51pm

This PR should fix that: [query] teach king to treat filtered entries as missing by danking · Pull Request #10134 · hail-is/hail

jinasong · March 2, 2021, 8:59pm

Hi @danking,

Thank you so much for your efforts. I will retry it.

-Jina

jinasong · March 3, 2021, 5:11pm

Hi @danking

After adding mt.unfilter_entries(), the error message was gone. Thanks again.
By the way, hl.king has now been running continuously for more than 16 hours. The job runs on Dataproc on GCP with autoscaling, which can use up to 4000 cores.

Is this run time normal for king()? I just want to know whether I should keep waiting or stop the job and troubleshoot.

Thank you.
Jina

danking · March 3, 2021, 5:59pm

Are you using all 38 million variants? You probably do not need all 38 million variants. Most folks that I know use about five to ten thousand common variants. Each rare variant contributes very little information to relatedness.

KING, like most relatedness methods, scales like N_SAMPLES^2 * N_VARIANTS, so using all 38 million variants is very time consuming.
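To put rough numbers on that scaling, here is a back-of-the-envelope sketch using the 590 samples from this thread. The figures are unitless work counts, not run times: constant factors and cluster size are ignored, "over 38M" is rounded to 38,000,000, and 10,000 is simply the upper end of the range mentioned above.

```python
def king_work_units(n_samples: int, n_variants: int) -> int:
    """Relative cost under the N_SAMPLES^2 * N_VARIANTS scaling."""
    return n_samples ** 2 * n_variants


n_samples = 590            # from the original post
all_variants = 38_000_000  # "over 38M", rounded for illustration
subset_variants = 10_000   # upper end of "five to ten thousand"

full = king_work_units(n_samples, all_variants)
subset = king_work_units(n_samples, subset_variants)

print(f"all variants: {full:.3e} work units")
print(f"10k variants: {subset:.3e} work units")
print(f"ratio:        {full // subset}x")
```

Because the sample count is the same in both cases, the ratio is just 38,000,000 / 10,000 = 3800, so under this scaling the full variant set costs about 3800 times as much as a 10,000-variant subset.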

jinasong · March 3, 2021, 6:07pm

Hi @danking

Got it. Do you have a specific way to select common variants for the kinship analysis in Hail? Thank you for your continued support.

Best,
Jina

danking · March 3, 2021, 6:15pm

That’s a biology question and I’m not really a biologist. Using only those variants with at least 5% or at least 1% minor allele frequency seems reasonable to me.
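One way such a cutoff might be applied in Hail, as a sketch only. It assumes `mt` has a `GT` call field, and it restricts to biallelic sites so that the smaller of the two allele frequencies is the minor allele frequency. That restriction and the helper name `keep_common_biallelic_variants` are assumptions of this example, not something from the thread:

```python
def keep_common_biallelic_variants(mt, min_maf=0.05):
    """Keep biallelic rows whose minor allele frequency is at least min_maf."""
    # Deferred import: the helper can be defined before Hail is loaded.
    import hail as hl

    # variant_qc adds a row struct `variant_qc`; its AF array holds one
    # frequency per allele, reference allele first.
    mt = hl.variant_qc(mt)

    # Assumption of this sketch: only biallelic sites, so min(AF) is the MAF.
    mt = mt.filter_rows(hl.len(mt.alleles) == 2)

    # Drop rare variants, which contribute little information to relatedness.
    return mt.filter_rows(hl.min(mt.variant_qc.AF) >= min_maf)
```

Passing `min_maf=0.01` gives the looser 1% cutoff. The result can be handed to `hl.king` in the same way as the original table.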

jinasong · March 3, 2021, 6:25pm

I am not a biologist either. Your comment is more than enough to point me in the right direction. Thanks a lot!!

-Jina
