Error summary: ChecksumException: Checksum error

Hi All,

I’m currently running into a ChecksumException while filtering a Hail Table by intervals.
Interestingly, the same error has also surfaced during other processing steps, such as table annotation.

I’m wondering if there’s an option to bypass the checksum verification, or if anyone has a solution for this particular error.
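
For context, the only kind of checksum "bypass" I could think of is routing file:// access through Hadoop's RawLocalFileSystem, which skips the .crc sidecar verification that the default LocalFileSystem performs. This is just a sketch of the idea, not something I've confirmed works in Hail (the spark_conf passthrough key is my assumption), and I realize that if the part file really is corrupt this would only hide the problem:

import hail as hl

# Untested sketch (my assumption, not a documented Hail option):
# make the file:// scheme use RawLocalFileSystem, which does not
# verify the hidden .crc sidecar files. If the data is genuinely
# corrupt, this may silently return garbage rows.
hl.init(
    spark_conf={
        'spark.hadoop.fs.file.impl': 'org.apache.hadoop.fs.RawLocalFileSystem',
    }
)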

Details provided below.
Thank you.

code

import hail as hl

# korcm_tb_dir points to the Hail Table written in an earlier step
kor_cm_tb = hl.read_table(korcm_tb_dir)
kor_cm_tb = kor_cm_tb.key_by('locus', 'alleles')

# keep only the first four autosomes; contig names like 'chr1'
# assume the default reference was set to GRCh38 at init
first_intervals = ['chr1', 'chr2', 'chr3', 'chr4']
kor_cm_tb1 = hl.filter_intervals(kor_cm_tb, [hl.parse_locus_interval(x) for x in first_intervals])
print('kor_cm_tb1 count: ', kor_cm_tb1.count())
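
(As a side note, a loop like the one below is how I would try to narrow down which contig touches the bad partition. The try/except around count() is my own sketch, and it assumes default_reference='GRCh38' was set at init, as in the code above.)

# Sketch: count each contig separately to isolate which interval
# hits the corrupted part file; the try/except is mine, not
# something Hail requires.
for contig in first_intervals:
    interval = hl.parse_locus_interval(contig)  # assumes default_reference='GRCh38'
    try:
        print(contig, 'ok:', hl.filter_intervals(kor_cm_tb, [interval]).count())
    except Exception as e:  # hail.utils.java.FatalError surfaces here
        print(contig, 'failed:', type(e).__name__)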

error message summary

Error summary: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
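
If I understand Hadoop's LocalFileSystem correctly, it keeps a hidden '.<filename>.crc' sidecar next to each part file, and the exp/got values above mean the recorded CRC no longer matches the bytes on disk, i.e. the part file (or its sidecar) changed after the table was written. Here is how I would inspect the pair (my own sketch; the path is just the one from the error):

import os

part = ('/home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht'
        '/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590')
# Hadoop's ChecksumFileSystem names the sidecar '.<filename>.crc'
crc = os.path.join(os.path.dirname(part), '.' + os.path.basename(part) + '.crc')

for path in (part, crc):
    exists = os.path.exists(path)
    print(path, exists, os.path.getsize(path) if exists else '-')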

full error messages

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.4
SparkUI available at http://cpu64-only-001:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.115-10932c754edb
LOGGING: writing to /home01/k099a02/kor_retro/log/hail_231128_cm_test3.log
Traceback (most recent call last):
  File "/home01/k099a02/script/kor_retro/kor_retro_cm.py", line 134, in <module>
    print('kor_cm_tb1 count: ',kor_cm_tb1.count())
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/table.py", line 434, in count
    return Env.backend().execute(ir.TableCount(self._tir))
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 82, in execute
    raise e.maybe_user_error(ir) from None
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 76, in execute
    result_tuple = self._jbackend.executeEncode(jir, stream_codec, timed)
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/py4j/java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 35, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 82 in stage 0.0 failed 1 times, most recent failure: Lost task 82.0 in stage 0.0 (TID 82) (cpu64-only-001 executor driver): org.apache.hadoop.fs.ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
	at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:347)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:303)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:252)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:197)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.io.fs.HadoopFS$$anon$2.read(HadoopFS.scala:55)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.utils.richUtils.RichInputStream$.readRepeatedly$extension0(RichInputStream.scala:21)
	at is.hail.utils.richUtils.RichInputStream$.readFully$extension1(RichInputStream.scala:12)
	at is.hail.io.StreamBlockInputBuffer.readBlock(InputBuffers.scala:550)
	at is.hail.io.LZ4InputBlockBuffer.readBlock(InputBuffers.scala:584)
	at is.hail.io.BlockingInputBuffer.readBlock(InputBuffers.scala:382)
	at is.hail.io.BlockingInputBuffer.ensure(InputBuffers.scala:388)
	at is.hail.io.BlockingInputBuffer.skipDouble(InputBuffers.scala:499)
	at is.hail.io.LEB128InputBuffer.skipDouble(InputBuffers.scala:270)
	at __C485stream_Let.__m507SKIP_o_float64(Emit.scala)
	at __C485stream_Let.__m497DECODE_r_struct_of_r_struct_of_r_binaryANDr_int32ENDANDr_array_of_r_binaryANDo_binaryANDo_binaryANDr_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDr_binaryANDr_int32ANDo_binaryANDr_float64ANDo_int32ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_float64ANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_binaryANDo_array_of_o_struct_of_o_binaryANDo_binaryENDANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_int32ANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_float64ANDo_binaryANDo_int32ANDo_int32ANDo_int32ANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_binaryANDo_binaryEND_TO_SBaseStructPointer(Emit.scala)
	at __C485stream_Let.apply(Emit.scala)
	at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:303)
	at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:156)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at is.hail.rvd.RVDPartitionInfo$.$anonfun$apply$1(RVDPartitionInfo.scala:70)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:42)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1049)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1047)
	at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2668)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2604)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2603)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2603)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1178)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1178)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1178)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2856)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2798)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2787)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2291)
	at is.hail.sparkextras.ContextRDD.crunJobWithIndex(ContextRDD.scala:238)
	at is.hail.rvd.RVD$.getKeyInfo(RVD.scala:1047)
	at is.hail.rvd.RVD$.makeCoercer(RVD.scala:1122)
	at is.hail.rvd.RVD$.coerce(RVD.scala:1078)
	at is.hail.rvd.RVD.changeKey(RVD.scala:142)
	at is.hail.rvd.RVD.changeKey(RVD.scala:135)
	at is.hail.backend.spark.SparkBackend.lowerDistributedSort(SparkBackend.scala:735)
	at is.hail.backend.Backend.lowerDistributedSort(Backend.scala:100)
	at is.hail.expr.ir.lowering.LowerAndExecuteShuffles$.$anonfun$apply$1(LowerAndExecuteShuffles.scala:23)
	at is.hail.expr.ir.RewriteBottomUp$.$anonfun$apply$4(RewriteBottomUp.scala:26)
	at is.hail.utils.StackSafe$More.advance(StackSafe.scala:60)
	at is.hail.utils.StackSafe$.run(StackSafe.scala:16)
	at is.hail.utils.StackSafe$StackFrame.run(StackSafe.scala:32)
	at is.hail.expr.ir.RewriteBottomUp$.apply(RewriteBottomUp.scala:36)
	at is.hail.expr.ir.lowering.LowerAndExecuteShuffles$.apply(LowerAndExecuteShuffles.scala:20)
	at is.hail.expr.ir.lowering.LowerAndExecuteShufflesPass.transform(LoweringPass.scala:157)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.LowerAndExecuteShufflesPass.apply(LoweringPass.scala:151)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:20)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:20)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.execute$1(EvalRelationalLets.scala:10)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.lower$1(EvalRelationalLets.scala:18)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.apply(EvalRelationalLets.scala:37)
	at is.hail.expr.ir.lowering.EvalRelationalLetsPass.transform(LoweringPass.scala:147)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.EvalRelationalLetsPass.apply(LoweringPass.scala:141)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:20)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:20)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:50)
	at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:463)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:499)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:75)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:75)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:63)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:351)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:496)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:495)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:834)

org.apache.hadoop.fs.ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
	at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:347)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:303)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:252)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:197)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.io.fs.HadoopFS$$anon$2.read(HadoopFS.scala:55)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.utils.richUtils.RichInputStream$.readRepeatedly$extension0(RichInputStream.scala:21)
	at is.hail.utils.richUtils.RichInputStream$.readFully$extension1(RichInputStream.scala:12)
	at is.hail.io.StreamBlockInputBuffer.readBlock(InputBuffers.scala:550)
	at is.hail.io.LZ4InputBlockBuffer.readBlock(InputBuffers.scala:584)
	at is.hail.io.BlockingInputBuffer.readBlock(InputBuffers.scala:382)
	at is.hail.io.BlockingInputBuffer.ensure(InputBuffers.scala:388)
	at is.hail.io.BlockingInputBuffer.skipDouble(InputBuffers.scala:499)
	at is.hail.io.LEB128InputBuffer.skipDouble(InputBuffers.scala:270)
	at __C485stream_Let.__m507SKIP_o_float64(Emit.scala)
	at __C485stream_Let.__m497DECODE_r_struct_of_r_struct_of_r_binaryANDr_int32ENDANDr_array_of_r_binaryANDo_binaryANDo_binaryANDr_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDr_binaryANDr_int32ANDo_binaryANDr_float64ANDo_int32ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_float64ANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_binaryANDo_array_of_o_struct_of_o_binaryANDo_binaryENDANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_int32ANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_float64ANDo_binaryANDo_int32ANDo_int32ANDo_int32ANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_binaryANDo_binaryEND_TO_SBaseStructPointer(Emit.scala)
	at __C485stream_Let.apply(Emit.scala)
	at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:303)
	at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:156)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at is.hail.rvd.RVDPartitionInfo$.$anonfun$apply$1(RVDPartitionInfo.scala:70)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:42)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1049)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1047)
	at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Hail version: 0.2.115-10932c754edb
Error summary: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029