Help with "FatalError: MethodTooLargeException" and tables or matrices with many columns

Hey there,

I'm getting an error while trying to annotate and then write out a Hail Table (see below). Note and follow-up question: I'm working on a table with a lot of columns (~2,000 phenotype columns). In general, I've noticed that Hail doesn't handle tables or matrices with a large number of columns as well as it does large numbers of rows. Do you have any tips for these types of analyses in general?

Error and stack trace below:

FatalError: MethodTooLargeException: Method too large: __C27172Compiled.apply (Lis/hail/annotations/Region;JJ)J
Java stack trace:
is.hail.relocated.org.objectweb.asm.MethodTooLargeException: Method too large: __C27172Compiled.apply (Lis/hail/annotations/Region;JJ)J
	at is.hail.relocated.org.objectweb.asm.MethodWriter.computeMethodInfoSize(MethodWriter.java:2087)
	at is.hail.relocated.org.objectweb.asm.ClassWriter.toByteArray(ClassWriter.java:489)
	at is.hail.lir.Emit$.asBytes(Emit.scala:26)
	at is.hail.lir.Emit$.apply(Emit.scala:219)
	at is.hail.lir.Classx$$anonfun$asBytes$7.apply(X.scala:127)
	at is.hail.lir.Classx$$anonfun$asBytes$7.apply(X.scala:126)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at is.hail.lir.Classx.asBytes(X.scala:132)
	at is.hail.asm4s.ClassBuilder.classBytes(ClassBuilder.scala:335)
	at is.hail.asm4s.ModuleBuilder$$anonfun$classesBytes$1.apply(ClassBuilder.scala:131)
	at is.hail.asm4s.ModuleBuilder$$anonfun$classesBytes$1.apply(ClassBuilder.scala:131)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at is.hail.asm4s.ModuleBuilder.classesBytes(ClassBuilder.scala:132)
	at is.hail.expr.ir.EmitClassBuilder.resultWithIndex(EmitClassBuilder.scala:663)
	at is.hail.expr.ir.WrappedEmitClassBuilder$class.resultWithIndex(EmitClassBuilder.scala:190)
	at is.hail.expr.ir.EmitFunctionBuilder.resultWithIndex(EmitClassBuilder.scala:1127)
	at is.hail.expr.ir.Compile$.apply(Compile.scala:73)
	at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:1481)
	at is.hail.expr.ir.Interpret$.run(Interpret.scala:819)
	at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:69)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:69)
	at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:14)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:12)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
	at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:318)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:305)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:304)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:20)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:18)
	at is.hail.utils.package$.using(package.scala:609)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:18)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:230)
	at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:304)
	at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:324)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Hail version: 0.2.54-8526838bf99f
Error summary: MethodTooLargeException: Method too large: __C27172Compiled.apply (Lis/hail/annotations/Region;JJ)J

Hey @ksbarrett ,

If you have 2,000 columns of homogeneous type (which it sounds like you do), I strongly recommend using a MatrixTable instead of a Table. We use MatrixTables to analyze datasets with hundreds of thousands of columns.

If you already have a table with homogeneous "columns" (in Hail, we traditionally call the columns of a Table its "fields"), you can convert to a MatrixTable this way:

import hail as hl

t = hl.read_table(...)
t = t.to_matrix_table_row_major(columns=['pheno1', 'pheno2',...],
                                entry_field_name='x',
                                col_field_name='phenotype')

There’s a more complete example here.

I’ll also chime in that in some cases the MethodTooLargeException has been fixed by changes between your version (0.2.54) and 0.2.61 (latest as of today). In some cases, you’ll see a ClassTooLargeException though.

Thanks. Was considering updating the version. I’ve seen the ClassTooLargeException as well. Would you interpret that error message differently?

No, the root cause is the same: the query compiles to more JVM bytecode than the JVM allows (the JVM caps a single method's bytecode at 64 KB, and a class's constant pool at 65,535 entries). The best strategy is definitely to avoid thousands of table fields, and instead use a matrix table.

Thanks. My ultimate goal here is to annotate the cols of a matrix table of samples with genetic data with these phenotypes, but I've been running into memory errors when annotating, then manipulating, and ultimately writing out the matrix table. That's why I turned to annotating the table first. Any thoughts on improving this process?

As you’ve discovered, the column data on a MatrixTable is not distributed. We have plans to be able to scale up to arbitrary amounts of column data, but that’s not ready yet.

If you need to do preprocessing on the phenotypes, I recommend doing that on an MT of the phenotypes or locally with Pandas or Numpy. Once you’ve got case/control status or numbers for each phenotype, I’d convert the quantitative and case-control phenotypes into arrays:

quant_pheno_mt = quant_pheno_mt.annotate_rows(
    quant_phenos=hl.agg.collect(quant_pheno_mt.pheno_value))
case_control_pheno_mt = case_control_pheno_mt.annotate_rows(
    case_control_phenos=hl.agg.collect(case_control_pheno_mt.pheno_value))

and annotate them on the genotype MT:

mt = mt.annotate_cols(
    quant_phenos=quant_pheno_mt.rows()[mt.s].quant_phenos,
    case_control_phenos=case_control_pheno_mt.rows()[mt.s].case_control_phenos)
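If you'd rather do the phenotype preprocessing locally (the Pandas/NumPy route mentioned above), here's a minimal sketch of packing wide phenotype columns into array fields before handing them to Hail. The column names and the quantitative/case-control split are made up for illustration:

```python
import pandas as pd

# Hypothetical wide phenotype table: one row per sample.
df = pd.DataFrame({
    's': ['S1', 'S2', 'S3'],
    'height': [170.0, 165.5, 180.2],   # quantitative
    'bmi': [22.1, 24.8, 26.0],         # quantitative
    't2d': [0, 1, 0],                  # case/control
    'asthma': [1, 0, 0],               # case/control
})

quant_cols = ['height', 'bmi']
cc_cols = ['t2d', 'asthma']

# Pack each group of phenotypes into a single list-valued column,
# so downstream you carry 2 array fields instead of thousands of
# scalar fields.
packed = pd.DataFrame({
    's': df['s'],
    'quant_phenos': df[quant_cols].values.tolist(),
    'case_control_phenos': df[cc_cols].values.tolist(),
})
```

From there you can bring `packed` into Hail with `hl.Table.from_pandas` and key it by `s`.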

You can pass linear_regression_rows an array of phenotypes:

gwas_results = hl.linear_regression_rows(
    y=mt.quant_phenos,
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0, mt.PC0, ...])

I do not recommend adding a new column field for every phenotype. It is the natural thing to do, but each field incurs compilation and representation overhead. Treating the phenotypes as a single array field instead allows Hail to use a more efficient representation.


I’m sorry you’re running into this! It’s definitely a rough part of Hail that we’re working hard to fix. Congratulations on being on the bleeding edge of Hail :wink:.


Thanks! A helpful workaround for now and good to know updates in this area are coming.
