Thank you for your previous response to my question about running annotate_cols with a table of ~17000 rows.
This is a follow-up to that question, and I would also like to ask about your plans to fix the “Method code too large” error.
Now that my table is annotated with:
analysis_set = analysis_set.annotate_cols(expr = chrom_expr_ht[analysis_set.s])
I want to run linear_regression on that MatrixTable, and I’m using this code:
ds_result = hl.linear_regression([analysis_set.expr[g] for g in chrom_gene_list], analysis_set.AC, [analysis_set.covs[c] for c in cov_list])
As expected, with the large gene set I’m getting the “Method code too large” error again. Is there a syntax solution to this, i.e. a way to run linear_regression on all fields under the Struct expr, using the fields under the Struct covs as covariates? I understand that the ** trick will not work here either. Also, do you have any plans to allow for larger method code? (The regression task itself runs fine, even with ~2000 phenotypes.)
The latest commit hash where I ran into this error is devel-554c2ef
In this version the tolerance for method code length actually seems to have gotten smaller: I used to be able to run up to 2000 phenotypes at a time, but now even ~1000 yields the error.
I wrote an additional helper that dynamically partitions the phenotype set, increasing the number of partitions each time this error occurs (since I can’t predict in advance what the ‘right’ number is), but of course my association step now takes longer.
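For reference, the retry-with-more-partitions workaround described above looks roughly like the sketch below. It is a minimal illustration and not Hail API: `run_batch` is a hypothetical callable that would wrap the linear_regression call for one chunk of phenotypes, and matching on the error message text is an assumption about how the failure surfaces in Python.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split items into n_chunks contiguous pieces of nearly equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def run_with_growing_partitions(
    phenotypes: Sequence[str],
    run_batch: Callable[[List[str]], R],  # hypothetical wrapper around the regression call
    is_too_large: Callable[[Exception], bool] = lambda e: "Method code too large" in str(e),
    start_chunks: int = 1,
    growth: int = 2,
) -> List[R]:
    """Retry the whole job with more, smaller chunks whenever a chunk is too large."""
    n_chunks = start_chunks
    while True:
        try:
            return [run_batch(chunk) for chunk in split_into_chunks(phenotypes, n_chunks)]
        except Exception as err:
            # Re-raise unrelated errors, or give up once chunks cannot shrink further.
            if not is_too_large(err) or n_chunks >= len(phenotypes):
                raise
            n_chunks = min(len(phenotypes), n_chunks * growth)
```

Because a failure restarts every chunk at the new partition count, chunks that already succeeded are recomputed, which is one reason the association step takes longer this way.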
Can you share the hail.log file from one of these failing runs? There is some information about generated method size and query plans in there that will help us understand the issue. You can find this file in the working directory of Spark on your executor node.