NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$

Hi Hail Team,

I am getting a strange error when trying to run the following code on a cluster:

import hail as hl
import pandas as pd
import contextlib
import argparse
import nest_asyncio

# argparse setup reconstructed from the args.* usage below;
# the flag spellings are illustrative
parser = argparse.ArgumentParser()
parser.add_argument('--genome_build')
parser.add_argument('--log')
parser.add_argument('--vcf')
parser.add_argument('--reference')
parser.add_argument('--min_partitions', type=int)
args = parser.parse_args()

hl.init(default_reference=args.genome_build, log=args.log)

recode = {f"{i}": f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y'])}
vcf = hl.import_vcf(args.vcf, min_partitions=args.min_partitions,
                    array_elements_required=False, force_bgz=True,
                    contig_recoding=recode)
vcf = hl.variant_qc(hl.split_multi_hts(vcf.drop('PL'), permit_shuffle=True), name='qc')

vcf = vcf.filter_rows((vcf.alleles[1] == "*") | (vcf.qc.AC[0] == 0) | (vcf.qc.AC[1] == 0), keep=False)
vcf = vcf.annotate_rows(info=vcf.info.annotate(AC=vcf.qc.AC, AN=2 * vcf.qc.n_called, AF=vcf.qc.AF))

ref = hl.import_vcf(args.reference, force_bgz=True, min_partitions=args.min_partitions)
vcf = vcf.annotate_rows(info=vcf.info.annotate(novel=hl.is_missing(ref.rows()[vcf.row_key])))
ibd_df = hl.identity_by_descent(vcf)
ibd_df.show()
# ibd_df = hl.identity_by_descent(vcf).to_pandas()


Error message:

line 130, in <module>
    ibd_df.show()

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 9.0 failed 1 times, most recent failure: Lost task 6.0 in stage 9.0 (TID 558) (compute-priv-1-1.local executor driver): java.lang.NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7(IBD.scala:232)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7$adapted(IBD.scala:229)
        at scala.Array$.tabulate(Array.scala:334)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6(IBD.scala:229)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6$adapted(IBD.scala:227)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
        at is.hail.utils.richUtils.RichContextRDD$$anon$1.next(RichContextRDD.scala:77)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1868)
        at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1(ZippedWithIndexRDD.scala:52)
        at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1$adapted(ZippedWithIndexRDD.scala:52)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
        at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:50)
        at org.apache.spark.rdd.RDD.$anonfun$zipWithIndex$1(RDD.scala:1389)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1389)
        at is.hail.methods.IBD$.computeIBDMatrix(IBD.scala:225)
        at is.hail.methods.IBD.execute(IBD.scala:349)
        at is.hail.expr.ir.functions.WrappedMatrixToTableFunction.execute(RelationalFunctions.scala:51)
        at is.hail.expr.ir.TableToTableApply.execute(TableIR.scala:2936)
        at is.hail.expr.ir.TableOrderBy.execute(TableIR.scala:2802)
        at is.hail.expr.ir.TableSubset.execute(TableIR.scala:1472)
        at is.hail.expr.ir.TableSubset.execute$(TableIR.scala:1471)
        at is.hail.expr.ir.TableHead.execute(TableIR.scala:1480)
        at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2001)
        at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:58)
        at is.hail.expr.ir.Interpret$.run(Interpret.scala:846)
        at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
        at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
        at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
        at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:417)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
        at is.hail.utils.package$.using(package.scala:638)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
        at is.hail.utils.package$.using(package.scala:638)
        at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
        at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
        at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:414)
        at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
        at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)

java.lang.NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7(IBD.scala:232)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7$adapted(IBD.scala:229)
        at scala.Array$.tabulate(Array.scala:334)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6(IBD.scala:229)
        at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6$adapted(IBD.scala:227)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
        at is.hail.utils.richUtils.RichContextRDD$$anon$1.next(RichContextRDD.scala:77)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1868)
        at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1(ZippedWithIndexRDD.scala:52)
        at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1$adapted(ZippedWithIndexRDD.scala:52)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Hail version: 0.2.82-2ab242915c2c
Error summary: NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$

I would appreciate any suggestions or ideas. Thanks!

Hey @yluo411!

I’m sorry to hear you’re running into trouble! I have a few suggestions to help you succeed.

  1. When you're running Hail on your own cluster, you must compile Hail from source. This error almost always means Hail was not compiled from source, or the compilation was unsuccessful: hl.identity_by_descent uses natively compiled code (the IBSFFI class), which is only present after a successful build. A rough sketch of the build follows the commands below. If you compile from source and still experience an error, please respond here with the output of:
uname -a
pip3 show hail
python3 -V
java -version
echo $JAVA_HOME
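
For reference, a from-source install on a Spark cluster looks roughly like this; the clone URL and make target follow the Hail install docs, but the Scala and Spark versions below are illustrative and must match your cluster:

git clone https://github.com/hail-is/hail.git
cd hail/hail
# HAIL_COMPILE_NATIVES=1 builds the native libraries that IBSFFI needs
make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.13 SPARK_VERSION=3.1.1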
  2. Although Hail supports reading directly from VCF files, I strongly, strongly recommend converting the VCF file to a Hail MatrixTable. Hail MatrixTable files are an efficient, binary, compressed format for storing genetic data. You can convert the VCF like this:
vcf = hl.import_vcf(...)
vcf.write('..../dataset.mt')
mt = hl.read_matrix_table('..../dataset.mt')
# use mt instead of vcf
  3. I recommend using only 10,000 to 100,000 variants for hl.identity_by_descent. Very rare variants do not contribute substantially to the accuracy of the relatedness estimates, so using more than 100,000 variants in the IBD calculation is unnecessarily expensive. I recommend filtering to variants with a reasonably high minor allele frequency; if that still leaves many common variants, use sample_rows to randomly choose a smaller subset, as in the sketch below.
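
As a rough sketch (the 0.05 frequency cutoff and 10% sampling rate are illustrative), reusing the qc annotation computed by hl.variant_qc above:

# Keep common variants: alt-allele frequency between 5% and 95%
common = vcf.filter_rows((vcf.qc.AF[1] > 0.05) & (vcf.qc.AF[1] < 0.95))
# Optionally downsample: randomly keep ~10% of the remaining rows
common = common.sample_rows(0.1, seed=0)
ibd_df = hl.identity_by_descent(common)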

  4. Converting Hail objects directly to pandas objects almost always causes a memory error. I strongly, strongly recommend writing ibd_df to a file first, reading the result back, and then converting that to pandas:

ibd_df = hl.identity_by_descent(vcf)
ibd_df.write('...')
ibd_df = hl.read_table('...')
ibd_df = ibd_df.to_pandas()
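
As an aside, Hail's Table.checkpoint combines the write-then-read pattern above into a single call, so the same thing can be written as:

ibd_df = hl.identity_by_descent(vcf).checkpoint('...')
ibd_pd = ibd_df.to_pandas()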

Hi Danking,

Thank you so much for your quick reply. Based on your suggestion 1, I re-compiled Hail from source and it works now. Thank you again for all your suggestions.

Best,
YL
