Error while LD pruning variants - hail.utils.java.FatalError: IllegalArgumentException: requirement failed

Hello,

I am trying to LD-prune a matrix table, but I'm getting a Java error I can't make much sense of.

The command is:

pruned_variant_table = hl.ld_prune(common_g1000_mt.GT, r2=0.2, bp_window_size=50000)

I'm developing locally on macOS at the moment with GRCh37 IGSR (1000 Genomes) data, and I can't see any indication in the trace that this is a memory error.

I'd be grateful for any ideas about how to resolve this.

Stack trace below.

Thanks,
Angus

2023-05-03 08:34:02.128 Hail: INFO: ld_prune: running local pruning stage with max queue size of 99274 variants
[Stage 28:============================================> (106 + 10) / 125]
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "", line 1, in
  File "", line 2, in ld_prune
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/typecheck/check.py", line 584, in wrapper
    return original_func(*args, **kwargs)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/methods/statgen.py", line 4660, in ld_prune
    (_local_ld_prune(require_biallelic(mt, 'ld_prune'), field, r2, bp_window_size, memory_per_core)
  File "", line 2, in _local_ld_prune
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/typecheck/check.py", line 584, in wrapper
    return original_func(*args, **kwargs)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/methods/statgen.py", line 4550, in _local_ld_prune
    })).persist()
  File "", line 2, in persist
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/typecheck/check.py", line 584, in wrapper
    return original_func(*args, **kwargs)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/table.py", line 2127, in persist
    return Env.backend().persist_table(self)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/backend/backend.py", line 163, in persist_table
    return t.checkpoint(tf.enter())
  File "", line 2, in checkpoint
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/typecheck/check.py", line 584, in wrapper
    return original_func(*args, **kwargs)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/table.py", line 1346, in checkpoint
    self.write(output=output, overwrite=overwrite, stage_locally=stage_locally, _codec_spec=_codec_spec)
  File "", line 2, in write
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/typecheck/check.py", line 584, in wrapper
    return original_func(*args, **kwargs)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/table.py", line 1392, in write
    Env.backend().execute(ir.TableWrite(self._tir, ir.TableNativeWriter(output, overwrite, stage_locally, _codec_spec)))
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 82, in execute
    raise e.maybe_user_error(ir) from None
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 76, in execute
    result_tuple = self._jbackend.executeEncode(jir, stream_codec, timed)
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/Users/angusgane/Bioinformatics/Hail/hail_genomic_qc/venv/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 35, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: IllegalArgumentException: requirement failed
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 112 in stage 28.0 failed 1 times, most recent failure: Lost task 112.0 in stage 28.0 (TID 424) (172.31.148.88 executor driver): java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at is.hail.methods.BitPackedVectorBuilder.addGT(LocalLDPrune.scala:49)
at __C56836collect_distributed_array_table_native_writer.apply_region631_640(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply_region21_686(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply(Unknown Source)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$4(BackendUtils.scala:49)
at is.hail.utils.package$.using(package.scala:635)
at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:162)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$3(BackendUtils.scala:48)
at is.hail.backend.spark.SparkBackendComputeRDD.compute(SparkBackend.scala:793)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
at is.hail.backend.spark.SparkBackend.parallelizeAndComputeWithIndex(SparkBackend.scala:368)
at is.hail.backend.BackendUtils.collectDArray(BackendUtils.scala:44)
at __C56469Compiled.__m56591split_CollectDistributedArray_region24_53(Emit.scala)
at __C56469Compiled.__m56591split_CollectDistributedArray(Emit.scala)
at __C56469Compiled.__m56471split_WriteMetadata(Emit.scala)
at __C56469Compiled.apply(Emit.scala)
at is.hail.expr.ir.CompileAndEvaluate$.$anonfun$apply$4(CompileAndEvaluate.scala:61)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.CompileAndEvaluate$.$anonfun$apply$2(CompileAndEvaluate.scala:61)
at is.hail.expr.ir.CompileAndEvaluate$.$anonfun$apply$2$adapted(CompileAndEvaluate.scala:59)
at is.hail.backend.ExecuteContext.$anonfun$scopedExecution$1(ExecuteContext.scala:140)
at is.hail.utils.package$.using(package.scala:635)
at is.hail.backend.ExecuteContext.scopedExecution(ExecuteContext.scala:140)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:59)
at is.hail.expr.ir.CompileAndEvaluate$.evalToIR(CompileAndEvaluate.scala:33)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:30)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:67)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:62)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:20)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:20)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:50)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:463)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:499)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:75)
at is.hail.utils.package$.using(package.scala:635)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:75)
at is.hail.utils.package$.using(package.scala:635)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:63)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:351)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:496)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:495)
at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at is.hail.methods.BitPackedVectorBuilder.addGT(LocalLDPrune.scala:49)
at __C56836collect_distributed_array_table_native_writer.apply_region631_640(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply_region21_686(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply(Unknown Source)
at __C56836collect_distributed_array_table_native_writer.apply(Unknown Source)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$4(BackendUtils.scala:49)
at is.hail.utils.package$.using(package.scala:635)
at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:162)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$3(BackendUtils.scala:48)
at is.hail.backend.spark.SparkBackendComputeRDD.compute(SparkBackend.scala:793)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Hail version: 0.2.113-cf32652c5077
Error summary: IllegalArgumentException: requirement failed

This is a bad error message (which we treat as a bug). The core issue is that ld_prune in Hail doesn't support haploid calls. You could recode those as diploid first:

mt = mt.annotate_entries(GT=hl.if_else(mt.GT.is_diploid(), mt.GT, hl.call(mt.GT[0], mt.GT[0])))
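
If it helps, here's a minimal end-to-end sketch (assuming the common_g1000_mt from your snippet; the count step is just to confirm the diagnosis before recoding):

import hail as hl

# Count non-diploid entries (e.g. haploid calls on chrX, chrY, or MT).
# A non-zero count confirms ld_prune is hitting unsupported haploid genotypes.
n_haploid = common_g1000_mt.aggregate_entries(
    hl.agg.count_where(~common_g1000_mt.GT.is_diploid())
)
print(f'non-diploid calls: {n_haploid}')

# Recode haploid calls as homozygous diploid, then prune as before.
mt = common_g1000_mt.annotate_entries(
    GT=hl.if_else(
        common_g1000_mt.GT.is_diploid(),
        common_g1000_mt.GT,
        hl.call(common_g1000_mt.GT[0], common_g1000_mt.GT[0]),
    )
)
pruned_variant_table = hl.ld_prune(mt.GT, r2=0.2, bp_window_size=50000)

# ld_prune returns a table of variants to keep; filter the matrix table down to them.
mt_pruned = mt.filter_rows(hl.is_defined(pruned_variant_table[mt.row_key]))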

Tracking issue here:

I wouldn't have got that in a million years, but it's all working now! Thanks for the rapid solution!
Angus