Linear_regression with multiple phenotypes in Hail 0.2

Hi,

I have a problem when using linear_regression with multiple phenotypes in Hail 0.2. Please see below.

I’m trying to run linear regressions with 501 phenotypes, but I get the error message “Method code too large”.

Any suggestions would be appreciated. Thank you!

A better alternative to eval:

ys = [mt2_G_cis['y']] + [mt2_G_cis[f'p{i+1}'] for i in range(nPermutations)]
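To see why the bracket lookup can replace eval, here is a minimal sketch that runs without Hail. The `fields` dict is a hypothetical stand-in for the MatrixTable (anything that supports `obj['name']` lookup), and the string values are placeholders, not real expressions.

```python
# Stand-in for a MatrixTable: any object that supports obj['field'] lookup.
# (Hypothetical placeholder data; a real Hail MatrixTable would go here.)
fields = {'y': 'expr_y', 'p1': 'expr_p1', 'p2': 'expr_p2', 'p3': 'expr_p3'}

n_permutations = 3

# Field names are built as ordinary strings; no source code is evaluated.
names = ['y'] + [f'p{i + 1}' for i in range(n_permutations)]
ys = [fields[name] for name in names]

print(names)
print(ys)
```

The same comprehension should work on the real object by swapping `fields` for `mt2_G_cis`, since the field names are plain strings either way.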

What’s the full error message here? The error is likely from something earlier in the pipeline.

Thanks for the quick reply! Please see below for the full error message:

Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
File “/tmp/kernel-PySpark-c73f7ddf-473e-4623-8cda-b17160463cf5/pyspark_runner.py”, line 194, in
eval(compiled_code)
File “”, line 6, in
File “/mnt/tmp/spark-37a03cc9-7ba1-4c06-bdde-fad17c5f603e/userFiles-b93fc739-9084-4b7a-93e5-95db1180b7b0/hail-python.zip/hail/typecheck/check.py”, line 547, in wrapper
return f(*args_, **kwargs_)
File “/mnt/tmp/spark-37a03cc9-7ba1-4c06-bdde-fad17c5f603e/userFiles-b93fc739-9084-4b7a-93e5-95db1180b7b0/hail-python.zip/hail/methods/statgen.py”, line 361, in linear_regression
block_size)
File “/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py”, line 1133, in call
answer, self.gateway_client, self.target_id, self.name)
File “/mnt/tmp/spark-37a03cc9-7ba1-4c06-bdde-fad17c5f603e/userFiles-b93fc739-9084-4b7a-93e5-95db1180b7b0/hail-python.zip/hail/utils/java.py”, line 196, in deco
‘Error summary: %s’ % (deepest, full, hail.version, deepest)) from None
hail.utils.java.FatalError: RuntimeException: Method code too large!

Java stack trace:
java.lang.RuntimeException: Method code too large!
at is.hail.relocated.org.objectweb.asm.MethodWriter.a(Unknown Source)
at is.hail.relocated.org.objectweb.asm.ClassWriter.toByteArray(Unknown Source)
at is.hail.asm4s.FunctionBuilder.classAsBytes(FunctionBuilder.scala:306)
at is.hail.expr.ir.EmitFunctionBuilder.result(EmitFunctionBuilder.scala:284)
at is.hail.expr.ir.Compile$.apply(Compile.scala:50)
at is.hail.expr.ir.Compile$.apply(Compile.scala:31)
at is.hail.expr.ir.CompileWithAggregators$$anonfun$4.apply(Compile.scala:170)
at is.hail.expr.ir.CompileWithAggregators$$anonfun$4.apply(Compile.scala:170)
at is.hail.expr.ir.MatrixMapCols.execute(MatrixIR.scala:1231)
at is.hail.expr.ir.MatrixMapCols.execute(MatrixIR.scala:1143)
at is.hail.expr.ir.MatrixMapEntries.execute(MatrixIR.scala:958)
at is.hail.variant.MatrixTable.value$lzycompute(MatrixTable.scala:509)
at is.hail.variant.MatrixTable.value(MatrixTable.scala:504)
at is.hail.variant.MatrixTable.x$13$lzycompute(MatrixTable.scala:514)
at is.hail.variant.MatrixTable.x$13(MatrixTable.scala:514)
at is.hail.variant.MatrixTable.colValues$lzycompute(MatrixTable.scala:514)
at is.hail.variant.MatrixTable.colValues(MatrixTable.scala:514)
at is.hail.stats.RegressionUtils$.getColumnVariables(RegressionUtils.scala:64)
at is.hail.stats.RegressionUtils$.getPhenosCovCompleteSamples(RegressionUtils.scala:93)
at is.hail.methods.LinearRegression$.apply(LinearRegression.scala:26)
at is.hail.methods.LinearRegression.apply(LinearRegression.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

Hail version: devel-ae9e34fb3cbf
Error summary: RuntimeException: Method code too large!

StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
scala.Option.foreach(Option.scala:257)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:162)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:280)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)

What’s the full python script? This is coming from an annotate_cols or select_cols.

I would also strongly recommend that you update to the latest version; yours is ~6 weeks old.

The code is as follows. I annotated the mt with 500 permutations and then ran linear regressions:

# Annotate mt by real and permutated gene expression (residuals of 1 PEER factor, Age, HTN, and disease)

table = (hl.import_table('s3://gfb-genomics/gene2permsPilot_G/{}'.format(gene), impute=True, missing='')
         .key_by('Sample_name'))
mt2_G_cis = mt2_G_cis.annotate_cols(**table[mt2_G_cis.s])

nPermutations = 500

# Run linreg_multi_pheno,

ys = [eval('mt2_G_cis.y')] + [eval('mt2_G_cis.p{}'.format(i+1)) for i in range(nPermutations)]

# controlling for Gender and 4 Genotyping PCs

mt2_G_cis_eQTLs = hl.linear_regression(y=ys,
                                       x=mt2_G_cis.GT.n_alt_alleles(),
                                       covariates=[1.0, mt2_G_cis.isFemale,
                                                   mt2_G_cis.PC1, mt2_G_cis.PC2, mt2_G_cis.PC3, mt2_G_cis.PC4])

What does that table look like? Does it have 20000 columns?

To start, I would recommend doing something like

mt2_G_cis = mt2_G_cis.annotate_cols(genes = table[mt2_G_cis.s])
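With the phenotypes nested under a single column field like this, the list of ys would have to be read from that struct instead of from the top level. A minimal sketch, assuming the field is called `genes` as above and that the object supports `obj['name']` lookup at both levels; the helper name `phenotype_exprs` is hypothetical, not part of Hail.

```python
def phenotype_exprs(mt, n_permutations, struct_field='genes'):
    """Collect 'y' and 'p1'..'pN' from a nested struct field.

    Hypothetical helper: `mt` only needs to support mt[struct_field][name],
    so it can be exercised with plain dicts as well.
    """
    genes = mt[struct_field]
    return [genes['y']] + [genes[f'p{i + 1}'] for i in range(n_permutations)]


# Tiny stand-in with two permutations, just to show the shape of the result.
fake_mt = {'genes': {'y': 'expr_y', 'p1': 'expr_p1', 'p2': 'expr_p2'}}
print(phenotype_exprs(fake_mt, 2))
```

In the pipeline from this thread the call would presumably look like `ys = phenotype_exprs(mt2_G_cis, nPermutations)`, with the result passed as `y=` to the regression.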

If you evaluate dict(**table[mt2_G_cis.s]), you’ll see that this expands into a separate entry for every field of the table, which is something horrible.
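The difference between the two annotate_cols calls can be mimicked in plain Python. This is only a sketch of the shape of the call: `count_fields` is a hypothetical stand-in that counts the keyword arguments it receives, and `row` is placeholder data with the same 501 field names as the table in this thread.

```python
def count_fields(**fields):
    """Stand-in for annotate_cols: reports how many top-level fields it receives."""
    return len(fields)


# Placeholder row with the same shape as the table in this thread:
# one real phenotype 'y' plus 500 permutations 'p1'..'p500'.
row = {'y': 0.0, **{f'p{i + 1}': 0.0 for i in range(500)}}

unpacked = count_fields(**row)    # analogous to annotate_cols(**table[mt.s])
nested = count_fields(genes=row)  # analogous to annotate_cols(genes=table[mt.s])

print(unpacked, nested)  # prints: 501 1
```

Unpacking hands over 501 separate top-level fields, while the nested form hands over a single struct that carries all of them.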

The table has 136 rows and 501 columns. So it is not huge at all.

I’ll try your suggestions. Thank you!

If you try with the latest version and still get the error, please send us the hail.log file so we can debug more effectively. It will contain the computational graph that failed to execute.