Value error, but cannot find cause

I am integrating several data sources into a MatrixTable, and everything runs fine until it is time to save to disk. That is when I get the following error message:

Error summary: IllegalFormatConversionException: d != java.lang.String

I’m having trouble tracking down the source of this error. The message doesn’t tell me which field the offending value is in, so I don’t know where to start looking for illegal values in my original TSV files.

I also don’t understand what d in the error message refers to. Is d the name of a field? No such field exists in my table. Is d the offending value itself? I would expect "d" to convert to a string just fine.

Many thanks,

There should have been a stack trace also printed with the error message. Would you mind pasting that?

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<decorator-gen-980>", line 2, in write
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 614, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/hail/table.py", line 1271, in write
    Env.backend().execute(ir.TableWrite(self._tir, ir.TableNativeWriter(output, overwrite, stage_locally, _codec_spec)))
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 98, in execute
    raise e
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 74, in execute
    result = json.loads(self._jhc.backend().executeJSON(jir))
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/hannes/opt/anaconda3/envs/hail/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 32, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
hail.utils.java.FatalError: IllegalFormatConversionException: d != java.lang.String
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 30.0 failed 1 times, most recent failure: Lost task 1.0 in stage 30.0 (TID 21657, localhost, executor driver): java.util.IllegalFormatConversionException: d != java.lang.String
	at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
	at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
	at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
	at java.util.Formatter.format(Formatter.java:2520)
	at java.util.Formatter.format(Formatter.java:2455)
	at java.lang.String.format(String.java:2940)
	at is.hail.expr.ir.functions.UtilFunctions$.format(UtilFunctions.scala:123)
	at __C2061Compiled.__m2098format(Emit.scala)
	at __C2061Compiled.apply(Emit.scala)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply$mcJJ$sp(TableIR.scala:1876)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply(TableIR.scala:1875)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply(TableIR.scala:1875)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
	at is.hail.io.RichContextRDDLong$$anonfun$boundary$extension$2$$anon$1.next(RichContextRDDRegionValue.scala:205)
	at is.hail.io.RichContextRDDLong$$anonfun$boundary$extension$2$$anon$1.next(RichContextRDDRegionValue.scala:189)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at is.hail.io.RichContextRDDRegionValue$.writeRowsPartition(RichContextRDDRegionValue.scala:36)
	at is.hail.io.RichContextRDDLong$$anonfun$writeRows$extension$1.apply(RichContextRDDRegionValue.scala:230)
	at is.hail.io.RichContextRDDLong$$anonfun$writeRows$extension$1.apply(RichContextRDDRegionValue.scala:230)
	at is.hail.utils.richUtils.RichContextRDD$.writeParts(RichContextRDD.scala:41)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:107)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:105)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:248)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:248)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:59)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:59)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anon$1.hasNext(RichContextRDD.scala:68)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:166)
	at is.hail.utils.richUtils.RichContextRDD.writePartitions(RichContextRDD.scala:109)
	at is.hail.io.RichContextRDDLong$.writeRows$extension(RichContextRDDRegionValue.scala:224)
	at is.hail.rvd.RVD.write(RVD.scala:797)
	at is.hail.expr.ir.TableNativeWriter.apply(TableWriter.scala:102)
	at is.hail.expr.ir.Interpret$.run(Interpret.scala:825)
	at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
	at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
	at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
	at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:14)
	at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:12)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
	at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:354)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:338)
	at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:335)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:25)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:23)
	at is.hail.utils.package$.using(package.scala:618)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:23)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:247)
	at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:335)
	at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:379)
	at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:377)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:377)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
java.util.IllegalFormatConversionException: d != java.lang.String
	at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
	at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2793)
	at java.util.Formatter$FormatSpecifier.print(Formatter.java:2747)
	at java.util.Formatter.format(Formatter.java:2520)
	at java.util.Formatter.format(Formatter.java:2455)
	at java.lang.String.format(String.java:2940)
	at is.hail.expr.ir.functions.UtilFunctions$.format(UtilFunctions.scala:123)
	at __C2061Compiled.__m2098format(Emit.scala)
	at __C2061Compiled.apply(Emit.scala)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply$mcJJ$sp(TableIR.scala:1876)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply(TableIR.scala:1875)
	at is.hail.expr.ir.TableMapRows$$anonfun$87$$anonfun$apply$5.apply(TableIR.scala:1875)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
	at is.hail.io.RichContextRDDLong$$anonfun$boundary$extension$2$$anon$1.next(RichContextRDDRegionValue.scala:205)
	at is.hail.io.RichContextRDDLong$$anonfun$boundary$extension$2$$anon$1.next(RichContextRDDRegionValue.scala:189)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at is.hail.io.RichContextRDDRegionValue$.writeRowsPartition(RichContextRDDRegionValue.scala:36)
	at is.hail.io.RichContextRDDLong$$anonfun$writeRows$extension$1.apply(RichContextRDDRegionValue.scala:230)
	at is.hail.io.RichContextRDDLong$$anonfun$writeRows$extension$1.apply(RichContextRDDRegionValue.scala:230)
	at is.hail.utils.richUtils.RichContextRDD$.writeParts(RichContextRDD.scala:41)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:107)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$3.apply(RichContextRDD.scala:105)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:248)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:248)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:59)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:59)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anon$1.hasNext(RichContextRDD.scala:68)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Hail version: 0.2.61-3c86d3ba497a
Error summary: IllegalFormatConversionException: d != java.lang.String

Thanks! We’ve been working on producing more informative error messages that don’t print these long Java stack traces, which are meaningless to users. You’ve hit a case we haven’t covered yet. We should be able to fix that soon.

In the meantime, it looks like your error is coming from a use of hl.format. If you could share that line in your code, I can help debug.

Ok, thank you for that pointer.

I have the following use of format:

ht = ht.transmute(
  variant=hl.expr.functions.parse_variant(
    hl.expr.functions.format('%s:%d:%s:%s', ht.CHROM, ht.POS, ht.REF, ht.ALT),
    'GRCh38')
)

I presume there is an illegal value in one of CHROM, POS, REF or ALT.

Actually, it looks like this exception is only thrown when the format string doesn’t match the argument types, not for bad data values. If you run ht.describe(), what is the type of ht.POS?
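As a side note, Python’s own %-style formatting behaves the same way: a %d specifier rejects a string argument at format time, regardless of what the string contains. A minimal standalone illustration (plain Python, not Hail — the values are made up for the example):

```python
# %d demands a numeric argument; passing a string raises immediately,
# mirroring Java's IllegalFormatConversionException ("d != java.lang.String").
fields = ("chr1", "12345", "A", "T")  # POS came in as a string

try:
    variant = "%s:%d:%s:%s" % fields
except TypeError as e:
    print("format failed:", e)

# Casting POS to int first makes the same template work.
chrom, pos, ref, alt = fields
variant = "%s:%d:%s:%s" % (chrom, int(pos), ref, alt)
print(variant)  # chr1:12345:A:T
```

Note that "12345" looks like a perfectly good integer; the formatter rejects it because of its type, not its content, which is why this class of error points at a schema problem rather than a bad data value.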

I think you’re right — ht.POS wasn’t imputed as an integer but as a string. I’ll confirm this and try to fix it by explicitly casting to integer.

So the moral of the story: I was reading multiple files with slightly different headers, so the header line was only stripped from the first file. The stray header lines left in the remaining files caused every column to be imputed as string.
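For anyone else who hits this, the failure mode can be sketched in plain Python: numeric type imputation typically falls back to string as soon as any value in a column fails to parse as a number, so a single stray header token poisons the whole column. This is a simplified model for illustration, not Hail’s actual importer:

```python
# Simplified model of numeric type imputation: one unparseable value
# (here a leftover header token "POS" from an unstripped second file)
# forces the entire column to be typed as str.
def impute_type(values):
    try:
        [int(v) for v in values]
        return int
    except ValueError:
        return str

clean_column = ["12345", "67890", "11111"]
poisoned_column = ["12345", "67890", "POS", "11111"]  # "POS" is a stray header

print(impute_type(clean_column))     # <class 'int'>
print(impute_type(poisoned_column))  # <class 'str'>
```

This is why the symptom surfaces far from the cause: the import succeeds, the schema is just quietly wider than intended, and the mismatch only explodes when a %d specifier finally meets the string-typed column.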

Clearly this was a problem with my data, but it was really tough to tell what was going on from the error message.

Many thanks!

Glad you figured it out!

The indecipherable error message is definitely a Hail bug. I just made a change, which will be included in the next release, that would have given you the error:

FatalError: HailException: Encountered invalid type for format string %s:%d:%s:%s: format specifier d does not accept type java.lang.String

That’s a significant improvement, but it still prints a Java stack trace. We now consider it a bug whenever a user error, like bad data, results in a Java stack trace instead of a Python trace showing which line of your code is responsible. We will soon fix errors in hl.format to produce Python stack traces.

If you encounter any more Java stack traces, please let us know!