Hi Hail Team,
I am getting a strange error when trying to run the following code on a cluster:
import hail as hl
import pandas as pd
import contextlib
import argparse
import nest_asyncio
hl.init(default_reference=args.genome_build, log=args.log)
recode = {f"{i}":f"chr{i}" for i in (list(range(1, 23)) + ['X', 'Y'])}
vcf = hl.import_vcf(args.vcf,min_partitions=args.min_partitions,array_elements_required = False,force_bgz=True,contig_recoding=recode)
vcf = hl.variant_qc(hl.split_multi_hts(vcf.drop('PL'),permit_shuffle=True), name='qc')
vcf = vcf.filter_rows((vcf.alleles[1] == "*") | (vcf.qc.AC[0] == 0) | (vcf.qc.AC[1] == 0),keep=False)
vcf = vcf.annotate_rows(info=vcf.info.annotate(AC=vcf.qc.AC,AN=2*vcf.qc.n_called,AF=vcf.qc.AF))
ref = hl.import_vcf(args.reference, force_bgz=True, min_partitions=args.min_partitions)
vcf = vcf.annotate_rows(info=vcf.info.annotate(novel=hl.is_missing(ref.rows()[vcf.row_key])))
ibd_df = hl.identity_by_descent(vcf)
ibd_df.show()
#ibd_df = hl.identity_by_descent(vcf).to_pandas()
Error message: Error summary: NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
line 130, in <module>
ibd_df.show()
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 9.0 failed 1 times, most recent failure: Lost task 6.0 in stage 9.0 (TID 558) (compute-priv-1-1.local executor driver): java.lang.NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7(IBD.scala:232)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7$adapted(IBD.scala:229)
at scala.Array$.tabulate(Array.scala:334)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6(IBD.scala:229)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6$adapted(IBD.scala:227)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
at is.hail.utils.richUtils.RichContextRDD$$anon$1.next(RichContextRDD.scala:77)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1868)
at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1(ZippedWithIndexRDD.scala:52)
at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1$adapted(ZippedWithIndexRDD.scala:52)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:50)
at org.apache.spark.rdd.RDD.$anonfun$zipWithIndex$1(RDD.scala:1389)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1389)
at is.hail.methods.IBD$.computeIBDMatrix(IBD.scala:225)
at is.hail.methods.IBD.execute(IBD.scala:349)
at is.hail.expr.ir.functions.WrappedMatrixToTableFunction.execute(RelationalFunctions.scala:51)
at is.hail.expr.ir.TableToTableApply.execute(TableIR.scala:2936)
at is.hail.expr.ir.TableOrderBy.execute(TableIR.scala:2802)
at is.hail.expr.ir.TableSubset.execute(TableIR.scala:1472)
at is.hail.expr.ir.TableSubset.execute$(TableIR.scala:1471)
at is.hail.expr.ir.TableHead.execute(TableIR.scala:1480)
at is.hail.expr.ir.TableMapRows.execute(TableIR.scala:2001)
at is.hail.expr.ir.TableIR.analyzeAndExecute(TableIR.scala:58)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:846)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:57)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:20)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:417)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:414)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
java.lang.NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7(IBD.scala:232)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$7$adapted(IBD.scala:229)
at scala.Array$.tabulate(Array.scala:334)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6(IBD.scala:229)
at is.hail.methods.IBD$.$anonfun$computeIBDMatrix$6$adapted(IBD.scala:227)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
at is.hail.utils.richUtils.RichContextRDD$$anon$1.next(RichContextRDD.scala:77)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:494)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1868)
at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1(ZippedWithIndexRDD.scala:52)
at org.apache.spark.rdd.ZippedWithIndexRDD.$anonfun$startIndices$1$adapted(ZippedWithIndexRDD.scala:52)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Hail version: 0.2.82-2ab242915c2c
Error summary: NoClassDefFoundError: Could not initialize class is.hail.methods.IBSFFI$
I would appreciate any suggestions or ideas. Thanks!