Has there been a change with the from_pandas() method?
This part of my pipeline worked until last weekend, but now it fails with a “Method code too large” error (as a reminder, I have to use pandas to transpose my original data, because the original data has samples in rows):
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75. instruction count: 3
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count: 36158
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count > 8000
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count: 9
2018-05-11 05:22:02 root: ERROR: RuntimeException: Method code too large!
From java.lang.RuntimeException: Method code too large!
at is.hail.relocated.org.objectweb.asm.MethodWriter.a(Unknown Source)
at is.hail.relocated.org.objectweb.asm.ClassWriter.toByteArray(Unknown Source)
at is.hail.asm4s.FunctionBuilder.classAsBytes(FunctionBuilder.scala:256)
at is.hail.asm4s.FunctionBuilder.result(FunctionBuilder.scala:288)
at is.hail.expr.CM.runWithDelayedValues(CM.scala:80)
at is.hail.expr.Parser$.is$hail$expr$Parser$$evalNoTypeCheck(Parser.scala:60)
at is.hail.expr.Parser$.eval(Parser.scala:73)
at is.hail.expr.Parser$.parseExpr(Parser.scala:88)
at is.hail.table.Table.select(Table.scala:606)
at is.hail.table.Table.select(Table.scala:594)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
The “method code too large” errors come from the new, faster code path we’re working on. We haven’t touched to_pandas, but we’ve been working to make all the internal transformations take the fast path, so seeing this error pop up doesn’t surprise me too much.
I tried writing expr_tp_ht (or running show() on it) after each line of this code block, and the “Method code too large” error seems to occur after the last line. I’m pretty sure the syntax is correct. In fact, the error is triggered in only a subset of my runs, each of which is a different tissue, which I find very bizarre: 6 out of 15 tissues trigger the error while the other 9 run fine. These 6 don’t necessarily have the largest number of rows or columns either. I also tried repartition() to get more partitions, and tried annotate() instead of transmute(), but to no avail.
Sorry for the delay in my response. Can you share the output of expr_tp_ht.describe() and gene_anno_ht.describe()? More to the point, how many row fields does each table have?
To get you unblocked, I would recommend dropping any fields that are unnecessary for your downstream computations. I realize this is not an ideal solution.
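As a rough sketch of what dropping fields could look like: the helper below lists the row fields that are neither needed downstream nor part of the key, then drops them. The function names and the field names in the usage note are hypothetical, and the sketch assumes the table behaves like a Hail 0.2 Table, where iterating ht.row and ht.key yields field names and ht.drop(*names) returns a new table.

```python
def fields_to_drop(row_fields, keep, key_fields=()):
    """Return the row fields that are neither in `keep` nor part of the key."""
    protected = set(keep) | set(key_fields)
    return [name for name in row_fields if name not in protected]


def prune_table(ht, keep):
    """Drop every non-key row field of `ht` that is not listed in `keep`.

    Assumption: `ht` behaves like a Hail 0.2 Table, i.e. iterating `ht.row`
    and `ht.key` yields field names and `ht.drop(*names)` returns a new table.
    """
    row_fields = list(ht.row)
    unused = fields_to_drop(row_fields, keep, list(ht.key))
    print(f"{len(row_fields)} row fields, dropping {len(unused)}")
    # Key fields are left alone; only unused non-key fields are dropped.
    return ht.drop(*unused) if unused else ht
```

For example, gene_anno_ht = prune_table(gene_anno_ht, ['gene_name']) would keep the key plus a (hypothetical) gene_name field, and the printed count answers the "how many row fields" question along the way.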
We have a long-term plan (stop using the JVM, which enforces these method size limits) and a medium-term plan (be smarter about the generated code so it isn’t too big). I’m exploring some short-term fixes, and I’ll get back to you later today if I have any ideas.
One more thing: I suspect the issue is actually in the definition of gene_anno_ht, not in the code you’ve posted. Can you share the definition of gene_anno_ht?