Linear regression hanging - help needed

Hi Dan,

Thanks for keeping the suggestions coming. There’s already a line removing rows whose gene symbol is NA; that accounts for 3,463 of the 10,441 rows (33%) in the HGDP table:

burden = burden.filter_rows(hl.is_missing(burden.gene_symbol), keep=False)

However, I moved the filtering up to the step where I import the HGDP table itself; until now it had been done after the grouping and aggregation steps. If the NA rows are going to be removed anyway, they may as well be removed before a costly group-and-aggregate step, right?
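As a rough sketch of that reordering (the path and the exact field layout below are placeholders for my actual setup, not the real pipeline):

    import hail as hl

    # Placeholder path; the real pipeline reads the HGDP data here.
    genomes = hl.read_matrix_table('hgdp.mt')

    # Drop rows with a missing gene symbol *before* the group/aggregate
    # step, so the expensive shuffle never touches them.
    genomes = genomes.filter_rows(
        hl.is_missing(genomes.vep_info.gene_symbol), keep=False)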

The print(burden.n_partitions()) line prints 1, which supports your theory that the table is collapsing from many partitions down to one. The n_partitions() call itself runs for several minutes, printing the same status line that the linear regression printed in the previous run (below):

[Stage 16:===================================================>(2585 + 1) / 2586]

Another thing I tried was adding a call to <table>.show() after every table modification, to isolate which step causes the hang. (The only catch is that this may introduce inefficiency of its own, since it forces Spark to evaluate each stage in the order written rather than giving the planner license to rearrange things.)
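Roughly what that looked like, as a sketch (field names are from my pipeline; each show() forces the preceding stage to evaluate where it appears):

    genomes = genomes.filter_rows(
        hl.is_missing(genomes.vep_info.gene_symbol), keep=False)
    genomes.rows().show(5)   # forces this stage to run before moving on

    burden = genomes.group_rows_by(genomes.vep_info.gene_symbol).aggregate(
        n_variants=hl.agg.count_where(genomes.GT.n_alt_alleles() > 0))
    burden.entries().show(5)  # this is where the hang shows up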

Anyway: The step where it is hanging seems to be:

burden = (
    genomes
    .group_rows_by(genomes.vep_info.gene_symbol)
    .aggregate(n_variants=hl.agg.count_where(genomes.GT.n_alt_alleles() > 0))
)

Do you think it would help if I broke this multistep operation into several discrete steps?
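For concreteness, the split I have in mind would look something like this (same logic as above, just with named intermediates, plus an optional checkpoint to materialize the intermediate result; the path is a placeholder):

    # The grouped-aggregation above, split into discrete, inspectable steps.
    grouped = genomes.group_rows_by(genomes.vep_info.gene_symbol)
    burden = grouped.aggregate(
        n_variants=hl.agg.count_where(genomes.GT.n_alt_alleles() > 0))
    # Optional: write the result to disk and cut the lineage.
    # burden = burden.checkpoint('burden.mt', overwrite=True)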

Thanks,
Daniel