Hail 0.2 - Method code too large error in the latest build

bjo · May 11, 2018, 4:04pm

Has there been a change with the from_pandas() method?

This part of my pipeline worked until last weekend, but it now fails with the “Method code too large” error (as a reminder, I have to use pandas to transpose my original data, since the original data has samples in rows):

chrom_expr_ht = hl.Table.from_pandas(expr_df).key_by('ID')

I’m attaching the entry from the log file that may be relevant (I’ve masked the field names and gene names).

2018-05-11 05:20:41 MemoryStore: INFO: Block broadcast_32 stored as values in memory (estimated size 437.7 KB, free 21.7 GB)
2018-05-11 05:20:41 MemoryStore: INFO: Block broadcast_32_piece0 stored as bytes in memory (estimated size 30.0 KB, free 21.7 GB)
2018-05-11 05:20:41 BlockManagerInfo: INFO: Added broadcast_32_piece0 in memory on 10.128.0.54:42791 (size: 30.0 KB, free: 21.7 GB)
2018-05-11 05:20:41 SparkContext: INFO: Created broadcast 32 from textFile at RichSparkContext.scala:16
2018-05-11 05:20:41 FileInputFormat: INFO: Total input files to process : 1
2018-05-11 05:20:41 root: INFO: in Table.value: execute:
(TableMapRows
(TableLiteral)
(MakeStruct
(ID
(GetField ID
(Ref row)))
(field_1
(GetField field_1
(Ref row)))
(field_2
(GetField field_2
(Ref row)))
(field_3
(GetField field_3
(Ref row)))

…

Select[ENSG00000XXXXX](
  SymRef[__uid_58]
)
Select[ENSG00000XXXXX](
  SymRef[__uid_58]
)
Select[ENSG00000XXXXX](
  SymRef[__uid_58]
)

)
)
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.<init> instruction count: 3
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count: 36158
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count > 8000
2018-05-11 05:22:01 root: INFO: is/hail/codegen/generated/C75.apply instruction count: 9
2018-05-11 05:22:02 root: ERROR: RuntimeException: Method code too large!
From java.lang.RuntimeException: Method code too large!
at is.hail.relocated.org.objectweb.asm.MethodWriter.a(Unknown Source)
at is.hail.relocated.org.objectweb.asm.ClassWriter.toByteArray(Unknown Source)
at is.hail.asm4s.FunctionBuilder.classAsBytes(FunctionBuilder.scala:256)
at is.hail.asm4s.FunctionBuilder.result(FunctionBuilder.scala:288)
at is.hail.expr.CM.runWithDelayedValues(CM.scala:80)
at is.hail.expr.Parser$.is$hail$expr$Parser$$evalNoTypeCheck(Parser.scala:60)
at is.hail.expr.Parser$.eval(Parser.scala:73)
at is.hail.expr.Parser$.parseExpr(Parser.scala:88)
at is.hail.table.Table.select(Table.scala:606)
at is.hail.table.Table.select(Table.scala:594)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

2018-05-11 05:22:02 SparkContext: INFO: Invoking stop() from shutdown hook
2018-05-11 05:22:02 AbstractConnector: INFO: Stopped Spark@e685744{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}

tpoterba · May 11, 2018, 4:07pm

The “method code too large” errors come from the new, faster code path we’re working on. We haven’t touched from_pandas, but we’ve been working to make all the internal transformations take the fast path, so seeing this error pop up doesn’t surprise me too much.

@danking @wang do we have thoughts?

tpoterba · May 11, 2018, 4:24pm

Also, I’d really recommend writing the data to disk and importing it into Hail instead of using from_pandas. It’ll probably be way faster.
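
For reference, here is a minimal sketch of that write-to-disk route. It assumes the original matrix is a rectangular tab-separated file with samples in rows and a header line, small enough to transpose in memory. The helper names transpose_tsv and import_expression_table and the 'ID' column name are made up for illustration; only hl.import_table is a Hail call.

```python
import csv


def transpose_tsv(in_path, out_path, id_column="ID"):
    """Transpose a tab-separated matrix so input columns become output rows.

    Assumes a rectangular file whose first column holds row labels
    (e.g. sample IDs). The top-left header cell is renamed to `id_column`.
    The whole file is read into memory, so this only suits modest sizes.
    """
    with open(in_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    # zip(*rows) regroups the cells column by column.
    out_rows = [list(col) for col in zip(*rows)]
    out_rows[0][0] = id_column
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(out_rows)


def import_expression_table(path, id_column="ID"):
    """Import the transposed file as a keyed Hail Table (needs Hail installed)."""
    import hail as hl  # imported lazily so the transpose step runs without Hail

    return hl.import_table(path, impute=True, key=id_column)
```

Something like transpose_tsv('expr_by_sample.tsv', 'expr_by_gene.tsv') followed by import_expression_table('expr_by_gene.tsv') would then stand in for the from_pandas call (the file names are placeholders). This only changes how the data gets into Hail; a table with this many row fields may still hit the method size limit in later steps.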

bjo · May 19, 2018, 12:07am

@danking @wang

Any plans to look into this issue? I encounter this error even without using from_pandas.

Another case where I encounter this error in the latest build (5d2bfa56e252):

expr_tp_ht = hl.import_table(expr_path + tissue + suffix, impute = True).key_by('gene_id')
expr_tp_ht = expr_tp_ht.join(gene_anno_ht.select(gene_anno_ht.mappability), how = 'left')
expr_tp_ht = expr_tp_ht.filter(expr_tp_ht.mappability >= 0.8)
expr_tp_ht = expr_tp_ht.transmute(chrom = hl.int(expr_tp_ht['#chr'][3:]))
expr_tp_ht = expr_tp_ht.transmute(TSS = expr_tp_ht.end).drop('start')

I tried writing (or running show() on) expr_tp_ht after each line of this code block, and the “Method code too large” error seems to occur after the last line. I’m pretty sure the syntax is correct. In fact, the error is triggered in only a subset of my runs, one per tissue, which I find very bizarre: 6 of the 15 tissues trigger it while the other 9 run fine. These 6 don’t necessarily have the largest number of rows or columns either. I also tried repartition() to get more partitions, and tried annotate() instead of transmute(), but to no avail.

For a little more information:
2018-05-18 23:29:40 root: INFO: in Table.value: execute:
(TableJoin
(TableMapRows
(TableMapRows
(TableMapRows
(TableKeyBy
(TableLiteral))
(Let __uid_4
(SelectFields
(ApplyIR annotate
(SelectFields
(Ref row))
(MakeStruct
(TSS
(GetField end
(Ref row))))))
(ApplyIR annotate
(MakeStruct
(gene_id
(GetField gene_id
(Ref row))))
(MakeStruct
(start
(GetField start
(Ref __uid_4)))
(SUBJ-1
(GetField SUBJ-1
(Ref __uid_4)))
…
(TSS
(GetField TSS
(Ref __uid_5)))))))
(Let __uid_6
(MakeStruct
(TSS
(GetField TSS
(Ref row))))
(ApplyIR annotate
(MakeStruct
(gene_id
(GetField gene_id
(Ref row))))
(MakeStruct
(TSS
(GetField TSS
(Ref __uid_6)))))))
(TableRead gs://gtex-hail/Data/Annotation/gencode_26/gencode_v26_autosomal_genes.ht))
2018-05-18 23:29:41 root: INFO: is/hail/codegen/generated/C21.<init> instruction count: 3
2018-05-18 23:29:41 root: INFO: is/hail/codegen/generated/C21.apply instruction count: 56963
2018-05-18 23:29:41 root: INFO: is/hail/codegen/generated/C21.apply instruction count > 8000
2018-05-18 23:29:41 root: INFO: is/hail/codegen/generated/C21.apply instruction count: 22
2018-05-18 23:29:41 root: ERROR: RuntimeException: Method code too large!

danking · May 22, 2018, 1:57pm

Hi @bjo,

Sorry for the delay in my response. Can you share the output of expr_tp_ht.describe() and gene_anno_ht.describe()? More to the point, how many row fields does each table have?

To get you unblocked, I would recommend dropping any fields that are unnecessary for your downstream computations. I realize this is not an ideal solution.
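
As a rough sketch of what dropping fields could look like, assuming a current Hail 0.2 Table where iterating ht.row and ht.key yields field names: the helpers droppable_fields and drop_unneeded below are hypothetical names, and key fields are left in place because drop() does not accept them.

```python
def droppable_fields(row_fields, key_fields, needed_fields):
    """Return the row fields that are neither key fields nor needed downstream."""
    keep = set(key_fields) | set(needed_fields)
    # Preserve the table's original field order in the result.
    return [f for f in row_fields if f not in keep]


def drop_unneeded(ht, needed_fields):
    """Drop every non-key row field of a Hail Table not listed in needed_fields."""
    to_drop = droppable_fields(list(ht.row), list(ht.key), needed_fields)
    return ht.drop(*to_drop) if to_drop else ht
```

For example, drop_unneeded(expr_tp_ht, ['end', 'mappability']) would keep only the key plus those two fields. Checking len(ht.row) before and after shows how many row fields remain.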

We have a long-term plan (stop using the JVM, which enforces these method size limits) and a medium-term plan (be smarter about the generated code so it isn’t too big). I’m exploring some short-term fixes and will get back to you later today if I have any ideas.

Also: Can you share the full hail.log?

danking · May 22, 2018, 9:25pm

One more thing: I suspect the issue is actually in the definition of gene_anno_ht, not in the code you’ve posted. Can you share the definition of gene_anno_ht?
