Error summary: OutOfMemoryError: Java heap space

I am hitting a memory issue on a machine with 480 GB of RAM and 64 cores.

Here is my command:

mt = hl.import_vcf('/mnt/shared/garvan/marpin/MGRB_phase2_SNPtier12_match_vqsr_minrep_locusannot_WGStier12_unrelated_filteredhomo_hetero.vcf.bgz').write('/mnt/ceph/mt-sinai/WGS_hail/MGRB.mt', overwrite=True)

Error

[Stage 1:> (0 + 67) / 32390]
[Stage 1:> (9 + 64) / 32390]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "</home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/decorator.py:decorator-gen-1008>", line 2, in write
File "/home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/hail/typecheck/check.py", line 585, in wrapper
return __original_func(*args_, **kwargs_)
File "/home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/hail/matrixtable.py", line 2500, in write
Env.backend().execute(MatrixWrite(self._mir, writer))
File "/home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/hail/backend/backend.py", line 108, in execute
result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
File "/home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/ubuntu/anaconda3/envs/hail/lib/python3.6/site-packages/hail/utils/java.py", line 221, in deco
'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: OutOfMemoryError: Java heap space

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 49 in stage 1.0 failed 1 times, most recent failure: Lost task 49.0 in stage 1.0 (TID 50, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:335)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:795)
at org.apache.hadoop.io.Text.decode(Text.java:412)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.spark.SparkContext$$anonfun$textFile$1$$anonfun$apply$11.apply(SparkContext.scala:831)
at org.apache.spark.SparkContext$$anonfun$textFile$1$$anonfun$apply$11.apply(SparkContext.scala:831)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1268)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:66)
at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:38)
at is.hail.utils.package$.using(package.scala:596)
at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:38)
at is.hail.rvd.RVD$$anonfun$39.apply(RVD.scala:1295)
at is.hail.rvd.RVD$$anonfun$39.apply(RVD.scala:1293)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:196)
at is.hail.rvd.RVD$.getKeyInfo(RVD.scala:1299)
at is.hail.rvd.RVD$.makeCoercer(RVD.scala:1363)
at is.hail.io.vcf.MatrixVCFReader.coercer$lzycompute(LoadVCF.scala:1559)
at is.hail.io.vcf.MatrixVCFReader.coercer(LoadVCF.scala:1559)
at is.hail.io.vcf.MatrixVCFReader.apply(LoadVCF.scala:1588)
at is.hail.expr.ir.TableRead.execute(TableIR.scala:294)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:768)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:90)
at is.hail.expr.ir.CompileAndEvaluate$$anonfun$1.apply(CompileAndEvaluate.scala:33)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:24)
at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:33)
at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:8)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:7)
at is.hail.utils.package$.using(package.scala:596)
at is.hail.annotations.Region$.scoped(Region.scala:11)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
at is.hail.backend.Backend.execute(Backend.scala:86)
at is.hail.backend.Backend.executeJSON(Backend.scala:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:335)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:795)
at org.apache.hadoop.io.Text.decode(Text.java:412)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.spark.SparkContext$$anonfun$textFile$1$$anonfun$apply$11.apply(SparkContext.scala:831)
at org.apache.spark.SparkContext$$anonfun$textFile$1$$anonfun$apply$11.apply(SparkContext.scala:831)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1268)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:66)
at is.hail.rvd.RVDPartitionInfo$$anonfun$apply$1.apply(RVDPartitionInfo.scala:38)
at is.hail.utils.package$.using(package.scala:596)
at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:38)
at is.hail.rvd.RVD$$anonfun$39.apply(RVD.scala:1295)
at is.hail.rvd.RVD$$anonfun$39.apply(RVD.scala:1293)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:422)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:192)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)

Hail version: 0.2.18-08ec699f0fd4
Error summary: OutOfMemoryError: Java heap space

Did you install Hail/Spark with pip? This means you’re running in local mode. Spark uses a really small amount of resources by default; you can ask for more using an environment variable:

PYSPARK_SUBMIT_ARGS="--driver-memory 400G pyspark-shell"
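For example (a sketch: import_vcf.py is a hypothetical script name, and the memory figure should be sized to your machine), set the variable in the same shell command that launches Python:

PYSPARK_SUBMIT_ARGS="--driver-memory 400G pyspark-shell" python import_vcf.py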

I also think that 32,390 partitions (processing tasks, the unit of parallelism) is too many in this case, and I'm not totally sure how Spark created that many; it works out to roughly 15 MB per partition, and the smallest file-system block-size default I've seen is 32 MB.

To fix this, do the following at the top of your script:

import hail as hl
hl.init(min_block_size=128)  # minimum 128MB

This will also ease memory pressure.
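If a dataset has already been imported and written with too many partitions, you can also reduce the partition count explicitly. A minimal sketch using MatrixTable.naive_coalesce, which merges adjacent partitions without a shuffle (the path and the target of 1,000 partitions are illustrative):

import hail as hl
mt = hl.read_matrix_table('/path/to/dataset.mt')  # hypothetical path
mt = mt.naive_coalesce(1000)  # cheaply merge adjacent partitions down to ~1,000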

I did everything you suggested.

The 32,390 partitions went down to 8,098, but I still get the same error.

Hail version: 0.2.18-08ec699f0fd4
Error summary: OutOfMemoryError: Java heap space

Can you share the Hail log file?

Hello everyone. I have the same error. I ran the following command:

eigenvalues, pcs, _ = hl.hwe_normalized_pca(EUR_mt_full.GT)

After 24 hours of computation I get the following error:

FatalError Traceback (most recent call last)
Input In [38], in <cell line: 1>()
----> 1 eigenvalues, pcs, _ = hl.hwe_normalized_pca(EUR_mt_full.GT)

File :2, in hwe_normalized_pca(call_expr, k, compute_loadings)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/typecheck/check.py:577, in _make_dec.<locals>.wrapper(__original_func, *args, **kwargs)
574 @decorator
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/methods/pca.py:96, in hwe_normalized_pca(call_expr, k, compute_loadings)
36 @typecheck(call_expr=expr_call,
37     k=int,
38     compute_loadings=bool)
39 def hwe_normalized_pca(call_expr, k=10, compute_loadings=False) -> Tuple[List[float], Table, Table]:
40 r"""Run principal component analysis (PCA) on the Hardy-Weinberg-normalized
41     genotype call matrix.
42
(...)
93     List of eigenvalues, table with column scores, table with row loadings.
94     """
--> 96 return pca(hwe_normalize(call_expr),
97     k,
98     compute_loadings)

File :2, in pca(entry_expr, k, compute_loadings)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/typecheck/check.py:577, in _make_dec.<locals>.wrapper(__original_func, *args, **kwargs)
574 @decorator
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/methods/pca.py:201, in pca(entry_expr, k, compute_loadings)
193 mt = mt.select_entries(**{field: entry_expr})
194 mt = mt.select_cols().select_rows().select_globals()
196 t = (Table(ir.MatrixToTableApply(mt._mir, {
197     'name': 'PCA',
198     'entryField': field,
199     'k': k,
200     'computeLoadings': compute_loadings
--> 201 })).persist())
203 g = t.index_globals()
204 scores = hl.Table.parallelize(g.scores, key=list(mt.col_key))

File :2, in persist(self, storage_level)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/typecheck/check.py:577, in _make_dec.<locals>.wrapper(__original_func, *args, **kwargs)
574 @decorator
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/table.py:1937, in Table.persist(self, storage_level)
1901 @typecheck_method(storage_level=storage_level)
1902 def persist(self, storage_level='MEMORY_AND_DISK') -> 'Table':
1903     """Persist this table in memory or on disk.
1904
1905     Examples
(...)
1935     Persisted table.
1936     """
--> 1937 return Env.backend().persist_table(self, storage_level)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/spark_backend.py:296, in SparkBackend.persist_table(self, t, storage_level)
295 def persist_table(self, t, storage_level):
--> 296 return Table._from_java(self._jbackend.pyPersistTable(storage_level, self._to_java_table_ir(t._tir)))

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/py4j/java_gateway.py:1304, in JavaMember.__call__(self, *args)
1298 command = proto.CALL_COMMAND_NAME +\
1299     self.command_header +\
1300     args_command +\
1301     proto.END_COMMAND_PART
1303 answer = self.gateway_client.send_command(command)
--> 1304 return_value = get_return_value(
1305     answer, self.gateway_client, self.target_id, self.name)
1307 for temp_arg in temp_args:
1308     temp_arg._detach()

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/py4j_backend.py:31, in handle_java_exception.<locals>.deco(*args, **kwargs)
29 tpl = Env.jutils().handleForPython(e.java_exception)
30 deepest, full, error_id = tpl._1(), tpl._2(), tpl._3()
--> 31 raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
32 except pyspark.sql.utils.CapturedException as e:
33 raise FatalError('%s\n\nJava stack trace:\n%s\n'
34     'Hail version: %s\n'
35     'Error summary: %s' % (e.desc, e.stackTrace, hail.__version__, e.desc)) from None

FatalError: OutOfMemoryError: Java heap space

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 259.0 failed 1 times, most recent failure: Lost task 13.0 in stage 259.0 (TID 14641) (dna.Dlink executor driver): java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.resize(IdentityObjectIntMap.java:542)
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:158)
at com.esotericsoftware.kryo.util.MapReferenceResolver.addWrittenObject(MapReferenceResolver.java:41)
at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:681)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:570)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2432)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2421)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:902)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
at is.hail.methods.PCA.collectRowKeys$1(PCA.scala:51)
at is.hail.methods.PCA.execute(PCA.scala:60)
at is.hail.expr.ir.functions.WrappedMatrixToTableFunction.execute(RelationalFunctions.scala:52)
at is.hail.expr.ir.TableToTableApply.execute(TableIR.scala:2942)
at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:57)
at is.hail.expr.ir.Interpret$.apply(Interpret.scala:27)
at is.hail.backend.spark.SparkBackend.$anonfun$pyPersistTable$2(SparkBackend.scala:537)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
at is.hail.backend.spark.SparkBackend.$anonfun$pyPersistTable$1(SparkBackend.scala:536)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.utils.ExecutionTimer$.logTime(ExecutionTimer.scala:59)
at is.hail.backend.spark.SparkBackend.pyPersistTable(SparkBackend.scala:528)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.resize(IdentityObjectIntMap.java:542)
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:158)
at com.esotericsoftware.kryo.util.MapReferenceResolver.addWrittenObject(MapReferenceResolver.java:41)
at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:681)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:570)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Hail version: 0.2.97-937922d7f46c
Error summary: OutOfMemoryError: Java heap space

Here is a link to the log file. The file is too big, so I put it on my Google Drive.

Could you please suggest solutions to the error?

Hi @Alexey ,

I’m sorry to hear you’re having trouble!

How did you start ipython? How many variants and samples does your dataset have? What were the previous 37 cells?

Hi @danking

Thanks for the fast reply.
I run Hail in a Jupyter notebook, locally on my server. I installed Hail with the standard command "pip install hail".
I am trying to analyse 1000 Genomes data.
So I ran the following script in the Jupyter notebook:

import hail as hl
hl.init()
table = (hl.import_table('data_Haill//1kg_annotations.txt', impute=True)
         .key_by('Sample'))
mt_full = hl.read_matrix_table('Haill_mt/1kg_Full.mt')
mt_full = hl.sample_qc(mt_full)
mt_full = mt_full.annotate_cols(pheno=table[mt_full.s])
EUR_mt_full = mt_full.filter_cols(mt_full.pheno.SuperPopulation == 'EUR')
EUR_mt_full.aggregate_cols(hl.agg.counter(EUR_mt_full.pheno.SuperPopulation))
eigenvalues, pcs, _ = hl.hwe_normalized_pca(EUR_mt_full.GT)

The problem occurs at the last step: eigenvalues, pcs, _ = hl.hwe_normalized_pca(EUR_mt_full.GT)

I am trying to process 503 samples with 24,028,591 variants.

I hope this helps you understand the problem. If you need anything else, let me know.

I suspect the problem is in the Spark settings. The thing is, Hail works extremely slowly. Could you please suggest Spark settings to improve performance?

OK. A few things!

  1. When using Hail on a single, large server, you need to explicitly tell Apache Spark how much memory is available. See details here: How do I increase the memory or RAM available to the JVM when I start Hail through Python? - #2 by danking. In particular, you might try starting Jupyter this way:
PYSPARK_SUBMIT_ARGS="--driver-memory 460g --executor-memory 460g pyspark-shell" jupyter notebook
  2. When running PCA, you definitely do not need 24M variants. Assuming that you are using PCA to interrogate the ancestry of your samples, common variants are sufficient. I suggest something like this:
EUR_for_pca = EUR_mt_full
EUR_for_pca = hl.variant_qc(EUR_for_pca)
# filter to variants with minor allele frequency >5%
EUR_for_pca = EUR_for_pca.filter_rows(
    (EUR_for_pca.variant_qc.AF[0] > 0.05) & (EUR_for_pca.variant_qc.AF[0] < 0.95)
)
n_common_variants = EUR_for_pca.count_rows()
# keep a random ~10k subset of common variants 
EUR_for_pca = EUR_for_pca.sample_rows(10_000 / n_common_variants)
# save the set of variants for later use
EUR_for_pca.rows().write('Haill_mt/variants_for_pca.ht')
EUR_pca_variants = hl.read_table('Haill_mt/variants_for_pca.ht')
# filter the matrix table to just the PCA variants
EUR_for_pca = EUR_mt_full.semi_join_rows(EUR_pca_variants)
EUR_eigenvalues, EUR_pcs, _ = hl.hwe_normalized_pca(EUR_for_pca.GT)
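Once the PCA finishes, the scores table can be inspected or exported in the usual way. A minimal usage sketch (the output path is illustrative):

EUR_pcs.show(5)  # peek at the first few rows of PC scores
EUR_pcs.export('Haill_mt/EUR_pca_scores.tsv')  # write the scores to a TSV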

Hi @danking

Thanks for the suggestions. I have tried your script with minor modifications:

EUR_for_pca = EUR_mt_full
EUR_for_pca = hl.variant_qc(EUR_for_pca)

# filter to variants with minor allele frequency >5%
EUR_for_pca = EUR_for_pca.filter_rows(
    (EUR_for_pca.variant_qc.AF[0] > 0.05) & (EUR_for_pca.variant_qc.AF[0] < 0.95)
)
n_common_variants = EUR_for_pca.count_rows()

# keep a random ~10k subset of common variants
EUR_for_pca = EUR_for_pca.sample_rows(10_000 / n_common_variants)
#EUR_pca_variants = hl.read_table('Haill_mt/variants_for_pca.ht')

# filter the matrix table to just the PCA variants
EUR_for_pca2 = EUR_mt_full.semi_join_rows(EUR_for_pca.rows())

# PCA
EUR_eigenvalues, EUR_pcs, _ = hl.hwe_normalized_pca(EUR_for_pca2.GT)

On the last step I get an unexpected error. I have no idea how to deal with it.

FatalError Traceback (most recent call last)
Input In [32], in <cell line: 4>()
1 #EUR_pca_variants = hl.read_table('Haill_mt/variants_for_pca.ht')
2 # filter the matrix table to just the PCA variants
3 #EUR_for_pca2 = EUR_mt_full.semi_join_rows(EUR_for_pca.rows())
----> 4 EUR_eigenvalues, EUR_pcs, _ = hl.hwe_normalized_pca(EUR_for_pca2.GT)

File :2, in hwe_normalized_pca(call_expr, k, compute_loadings)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/typecheck/check.py:577, in _make_dec.<locals>.wrapper(__original_func, *args, **kwargs)
574 @decorator
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/methods/pca.py:96, in hwe_normalized_pca(call_expr, k, compute_loadings)
36 @typecheck(call_expr=expr_call,
37     k=int,
38     compute_loadings=bool)
39 def hwe_normalized_pca(call_expr, k=10, compute_loadings=False) -> Tuple[List[float], Table, Table]:
40 r"""Run principal component analysis (PCA) on the Hardy-Weinberg-normalized
41     genotype call matrix.
42
(...)
93     List of eigenvalues, table with column scores, table with row loadings.
94     """
--> 96 return pca(hwe_normalize(call_expr),
97     k,
98     compute_loadings)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/methods/pca.py:22, in hwe_normalize(call_expr)
18 mt = mt.annotate_rows(__AC=agg.sum(mt.__gt),
19     __n_called=agg.count_where(hl.is_defined(mt.__gt)))
20 mt = mt.filter_rows((mt.__AC > 0) & (mt.__AC < 2 * mt.__n_called))
--> 22 n_variants = mt.count_rows()
23 if n_variants == 0:
24     raise FatalError("hwe_normalize: found 0 variants after filtering out monomorphic sites.")

File :2, in count_rows(self, _localize)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/typecheck/check.py:577, in _make_dec.<locals>.wrapper(__original_func, *args, **kwargs)
574 @decorator
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577 return __original_func(*args_, **kwargs_)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/matrixtable.py:2397, in MatrixTable.count_rows(self, _localize)
2395 count_ir = ir.TableCount(ir.MatrixRowsTable(self._mir))
2396 if _localize:
--> 2397 return Env.backend().execute(count_ir)
2398 else:
2399 return construct_expr(ir.LiftMeOut(count_ir), hl.tint64)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/py4j_backend.py:104, in Py4JBackend.execute(self, ir, timed)
102 return (value, timings) if timed else value
103 except FatalError as e:
--> 104 self._handle_fatal_error_from_backend(e, ir)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/backend.py:181, in Backend._handle_fatal_error_from_backend(self, err, ir)
179 error_sources = ir.base_search(lambda x: x._error_id == err._error_id)
180 if len(error_sources) == 0:
--> 181 raise err
183 better_stack_trace = error_sources[0]._stack_trace
184 error_message = str(err)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/py4j_backend.py:98, in Py4JBackend.execute(self, ir, timed)
96 # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))
97 try:
--> 98 result_tuple = self._jbackend.executeEncode(jir, stream_codec)
99 (result, timings) = (result_tuple._1(), result_tuple._2())
100 value = ir.typ._from_encoding(result)

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/py4j/java_gateway.py:1304, in JavaMember.__call__(self, *args)
1298 command = proto.CALL_COMMAND_NAME +\
1299     self.command_header +\
1300     args_command +\
1301     proto.END_COMMAND_PART
1303 answer = self.gateway_client.send_command(command)
--> 1304 return_value = get_return_value(
1305     answer, self.gateway_client, self.target_id, self.name)
1307 for temp_arg in temp_args:
1308     temp_arg._detach()

File ~/anaconda3/envs/ven_novoselov/lib/python3.10/site-packages/hail/backend/py4j_backend.py:31, in handle_java_exception.<locals>.deco(*args, **kwargs)
29 tpl = Env.jutils().handleForPython(e.java_exception)
30 deepest, full, error_id = tpl._1(), tpl._2(), tpl._3()
--> 31 raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
32 except pyspark.sql.utils.CapturedException as e:
33 raise FatalError('%s\n\nJava stack trace:\n%s\n'
34     'Hail version: %s\n'
35     'Error summary: %s' % (e.desc, e.stackTrace, hail.__version__, e.desc)) from None

FatalError: NoSuchElementException: Ref with name __iruid_17776 could not be resolved in env BindingEnv((),None,None,())

Java stack trace:
is.hail.utils.HailException: error after applying LowerArrayAggsToRunAggs
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:21)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:21)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:25)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.Compile$.apply(Compile.scala:50)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2012)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableLeftJoinRightDistinct.execute(TableIR.scala:1925)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableKeyBy.execute(TableIR.scala:1362)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:57)
at is.hail.expr.ir.Interpret$.$anonfun$run$71(Interpret.scala:846)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:846)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:416)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:452)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:449)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:448)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)

is.hail.utils.HailException: Error while typechecking IR:
(Coalesce
(Let __iruid_17778
(RunAgg ((CallStatsStateSig))
(Begin
(Begin
(InitOp 0 (CallStats (CallStatsStateSig))
((ArrayLen
(GetField alleles (Ref __iruid_17776))))))
(StreamFor __iruid_17774
(StreamFilter __iruid_17775
(StreamRange -1 False
(I32 0)
(ArrayLen
(GetField
the entries! [877f12a8827e18f61222c6c8c5fb04a8]
(In
SingleCodeEmitParamType(true, PTypeReferenceSingleCodeType(+PCStruct{locus:+PCLocus(GRCh37),alleles:+PCArray[+PCString],__row_uid:PInt64,the entries! [877f12a8827e18f61222c6c8c5fb04a8]:+PCArray[+PCStruct{GT:PCCall}]}))
1)))
(I32 1))
(ApplyUnaryPrimOp Bang
(IsNA
(ArrayRef -1
(GetField
the entries! [877f12a8827e18f61222c6c8c5fb04a8]
(In
SingleCodeEmitParamType(true, PTypeReferenceSingleCodeType(+PCStruct{locus:+PCLocus(GRCh37),alleles:+PCArray[+PCString],__row_uid:PInt64,the entries! [877f12a8827e18f61222c6c8c5fb04a8]:+PCArray[+PCStruct{GT:PCCall}]}))
1))
(Ref __iruid_17775)))))
(Begin
(SeqOp 0 (CallStats (CallStatsStateSig))
((GetField GT
(ArrayRef -1
(GetField
the entries! [877f12a8827e18f61222c6c8c5fb04a8]
(In
SingleCodeEmitParamType(true, PTypeReferenceSingleCodeType(+PCStruct{locus:+PCLocus(GRCh37),alleles:+PCArray[+PCString],__row_uid:PInt64,the entries! [877f12a8827e18f61222c6c8c5fb04a8]:+PCArray[+PCStruct{GT:PCCall}]}))
1))
(Ref __iruid_17774))))))))
(MakeTuple (0)
(ResultOp 0 (CallStats (CallStatsStateSig)))))
(InsertFields
(Let __iruid_17776
(SelectFields (locus alleles)
(In
SingleCodeEmitParamType(true, PTypeReferenceSingleCodeType(+PCStruct{locus:+PCLocus(GRCh37),alleles:+PCArray[+PCString],__row_uid:PInt64,the entries! [877f12a8827e18f61222c6c8c5fb04a8]:+PCArray[+PCStruct{GT:PCCall}]}))
1))
(InsertFields
(Ref __iruid_17776)
None
(variant_qc
(Let __iruid_17777
(MakeStruct
(call_stats
(GetTupleElement 0 (Ref __iruid_17778))))
(MakeStruct
(AF
(GetField AF
(GetField call_stats (Ref __iruid_17777)))))))))
None
(__row_uid
(GetField __row_uid
(In
SingleCodeEmitParamType(true, PTypeReferenceSingleCodeType(+PCStruct{locus:+PCLocus(GRCh37),alleles:+PCArray[+PCString],__row_uid:PInt64,the entries! [877f12a8827e18f61222c6c8c5fb04a8]:+PCArray[+PCStruct{GT:PCCall}]}))
1)))))
(Die
Struct{locus:Locus(GRCh37),alleles:Array[String],variant_qc:Struct{AF:Array[Float64]},__row_uid:Int64}
-1
(Str "Internal e…")))
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:21)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:21)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.Compile$.apply(Compile.scala:50)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2012)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableLeftJoinRightDistinct.execute(TableIR.scala:1925)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableKeyBy.execute(TableIR.scala:1362)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:57)
at is.hail.expr.ir.Interpret$.$anonfun$run$71(Interpret.scala:846)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:846)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:416)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:452)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:449)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:448)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)

java.util.NoSuchElementException: Ref with name __iruid_17776 could not be resolved in env BindingEnv((),None,None,())
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:98)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1(TypeCheck.scala:33)
at is.hail.expr.ir.TypeCheck$.$anonfun$check$1$adapted(TypeCheck.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at is.hail.expr.ir.TypeCheck$.check(TypeCheck.scala:31)
at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:13)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.Compile$.apply(Compile.scala:50)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2012)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableLeftJoinRightDistinct.execute(TableIR.scala:1925)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableKeyBy.execute(TableIR.scala:1362)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2006)
at is.hail.expr.ir.TableFilter.execute(TableIR.scala:1432)
at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:57)
at is.hail.expr.ir.Interpret$.$anonfun$run$71(Interpret.scala:846)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:846)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:416)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:452)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
at is.hail.utils.package$.using(package.scala:640)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:310)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:449)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:448)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)

Hail version: 0.2.97-937922d7f46c
Error summary: NoSuchElementException: Ref with name __iruid_17776 could not be resolved in env BindingEnv((),None,None,())

Have you got any clue how to resolve the error?

Hey @Alexey , can you upload the hail log file for that failure? This looks like a bug in Hail.

In the meantime, try actually using the written variants. Writing and reading a table separates it from the rest of the pipeline and can avoid bugs like this.

The log file is here
hail-20220817-1036-0.2.97-937922d7f46c.log (4.2 MB)

I have tried to use the written variants, but I get the same error at the 'Writing a table' step.


This bug is fixed here:

Hi @tpoterba

Thanks a lot for the commit. Could you please give instructions for applying it?
In other words, how can I use it to upgrade my current version of Hail?

After this commit merges, you can clone the repository and build from source to pick up the change. We'll probably also make a release in the next couple of days.
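Roughly, building from source looks like this (a sketch; the exact make targets and build prerequisites vary by version, so check the Hail build documentation for authoritative instructions):

git clone https://github.com/hail-is/hail.git
cd hail/hail
make install HAIL_COMPILE_NATIVES=1  # builds the JAR and pip-installs the Python package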

If you want to circumvent the bug for now, though, checkpoint the EUR_for_pca table to disk and read from disk before doing the semi_join.
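Concretely, the workaround might look like this (a sketch built from the script above; Table.checkpoint writes the table to disk and reads it back in one call):

EUR_pca_variants = EUR_for_pca.rows().checkpoint('Haill_mt/variants_for_pca.ht', overwrite=True)
EUR_for_pca2 = EUR_mt_full.semi_join_rows(EUR_pca_variants)
EUR_eigenvalues, EUR_pcs, _ = hl.hwe_normalized_pca(EUR_for_pca2.GT)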