Hail 0.2 - Attaching phenotypes to a MatrixTable and getting an error


I’m almost done preparing my MatrixTable of genotypes for linear regression, and I’m trying to attach expression values for ~1700 genes (the genes on chromosome 1, for example), using the ** trick:

chrom_expr_ht = hl.Table.from_pandas(expr_df).key_by('ID')
analysis_set = analysis_set.annotate_cols(**chrom_expr_ht[analysis_set.s])

But when I run this, I get the following error:

FatalError: RuntimeException: Method code too large!

Java stack trace:
java.lang.RuntimeException: Method code too large!

When I try the same style of annotation with a smaller gene set (~200) or my covariate set (~50), it seems to work fine. Under the hood, is there an explicit limit to the number of phenotypes (or the total length of their names)? Is there any way I can avoid this error and still make annotate_cols work?



What version are you using? Cotton thinks Amanda fixed this in the last few days.

I’d also recommend not doing the ** trick here; it expands into a huge piece of code that’s probably not necessary at this stage.

Ah, another question: is this coming from Spark or Hail? What’s the full stack trace?

The reason not to do ** is that it expands out into a HUGE list of keyword arguments in Python. This is useful in many cases, but when dealing with tremendously large schemas like yours, it’s going to be very inefficient. It’s much harder for our compiler to work with.
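
To make that concrete, here is a minimal sketch in plain Python (no Hail needed) of what ** does to a mapping. The function annotate and the gene names are hypothetical stand-ins, not Hail’s API; the only point is that the callee receives one keyword argument per field, versus a single argument when the whole row is passed under one name.

```python
# Hypothetical stand-in for a method like annotate_cols: it only reports
# how many separate keyword arguments it received.
def annotate(**named_exprs):
    return len(named_exprs)

# Hypothetical row with ~1700 expression fields (names are made up).
row = {f"gene_{i}": float(i) for i in range(1700)}

# With **, every field becomes its own keyword argument.
n_flat = annotate(**row)

# Passing the whole row under one name yields a single argument.
n_nested = annotate(gene_data=row)

print(n_flat, n_nested)
```

In the Hail case each of those keyword arguments is a separate expression that has to be compiled, which is presumably why the flat form produces so much more generated code.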

Here is the full stack trace:

FatalError: RuntimeException: Method code too large!

Java stack trace:
java.lang.RuntimeException: Method code too large!
at is.hail.relocated.org.objectweb.asm.MethodWriter.a(Unknown Source)
at is.hail.relocated.org.objectweb.asm.ClassWriter.toByteArray(Unknown Source)
at is.hail.asm4s.FunctionBuilder.classAsBytes(FunctionBuilder.scala:293)
at is.hail.asm4s.FunctionBuilder.result(FunctionBuilder.scala:325)
at is.hail.expr.CM.runWithDelayedValues(CM.scala:80)
at is.hail.expr.Parser$.is$hail$expr$Parser$$evalNoTypeCheck(Parser.scala:60)
at is.hail.expr.Parser$.eval(Parser.scala:73)
at is.hail.expr.Parser$.parseExpr(Parser.scala:88)
at is.hail.variant.MatrixTable.selectCols(MatrixTable.scala:989)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

Hail version: devel-907a817
Error summary: RuntimeException: Method code too large!

I tried a few gene-set sizes with the newest version of Hail as well. It looks like 1000 genes is too many and 800 is fine, but I haven’t tested further.

I also tried using select() with the list of genes, but it gave the same error. What would be the alternative here?

can you try this?

analysis_set = analysis_set.annotate_cols(
    gene_data = chrom_expr_ht[analysis_set.s])

This should generate much smaller code, though I’m still not sure why the ** version isn’t working.

Seems like this method works. I see that the table’s fields are incorporated into a single struct named gene_data, so I need to modify the linear_regression code accordingly:

hl.linear_regression([analysis_set.gene_data[g] for g in chrom_gene_list], analysis_set.AC)

Thank you! I think I understand how the syntax works a little better now.
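
For anyone reading later, the change in access pattern can be sketched with plain Python dicts as a rough analogy (this is not Hail code; the sample and gene names are made up for illustration). With the ** form each gene is a top-level field, while with the gene_data form each gene sits one level down:

```python
# Hypothetical gene list and per-sample values, for illustration only.
chrom_gene_list = ["GENE_A", "GENE_B", "GENE_C"]
values = {"GENE_A": 1.2, "GENE_B": 0.4, "GENE_C": 3.1}

# Analogy for the ** form: every gene is a top-level field of the sample.
flat_sample = {"s": "sample1", **values}

# Analogy for the gene_data form: all genes live inside one nested struct.
nested_sample = {"s": "sample1", "gene_data": values}

# The list of responses is built the same way, with one extra lookup step.
ys_flat = [flat_sample[g] for g in chrom_gene_list]
ys_nested = [nested_sample["gene_data"][g] for g in chrom_gene_list]

print(ys_flat == ys_nested)
```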