Invalid character 'A' in integer literal

Hi all,

I ran into this error when I tried to write out a newly annotated matrix table after joining the dosage files (from Tractor) to the genotype data.

Hail version: 0.2.78-b17627756568
Error summary: MatrixParseError: file:/Users/Tractor/chr22-1.anc2.hapcount.txt:28-29, invalid character 'A' in integer literal

Hail’s lazy execution model means that this error is actually a problem with the import, not the write. Can you paste the full exception trace?
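A plain-Python stand-in for that lazy model (a generator, not Hail's actual IR machinery): defining the pipeline parses nothing, so a bad record only surfaces when a downstream action forces evaluation — which is why an import problem shows up at `write` time.

```python
# Plain-Python stand-in for lazy evaluation (not Hail's actual machinery):
# building the pipeline parses nothing; consuming it does the work, so a
# bad record only blows up when something forces evaluation.

def lazy_parse(lines):
    for line in lines:
        yield int(line)  # the parse happens here, lazily

pipeline = lazy_parse(["1", "2", "C"])  # "import": no error raised yet

seen, error = [], None
try:
    for value in pipeline:              # "write": evaluation forces the parse
        seen.append(value)
except ValueError as e:
    error = str(e)

print(seen)   # [1, 2]
print(error)  # invalid literal for int() with base 10: 'C'
```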

2021-11-04 14:52:14 Hail: INFO: Coerced sorted dataset
2021-11-04 14:52:15 Hail: INFO: Coerced sorted dataset

FatalError Traceback (most recent call last)
in
1 #write out the newly annotated matrix table. Will run a lot faster if we load this in again after annotating things in due to Hail processing style.
----> 2 mt.write('/Users/balazsmurnyak/Desktop/Tractor/new_hail_matrix.mt')

in write(self, output, overwrite, stage_locally, _codec_spec, _partitions, _checkpoint_file)

~/opt/anaconda3/lib/python3.8/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
575 def wrapper(__original_func, *args, **kwargs):
576     args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577     return __original_func(*args_, **kwargs_)
578
579 return wrapper

~/opt/anaconda3/lib/python3.8/site-packages/hail/matrixtable.py in write(self, output, overwrite, stage_locally, _codec_spec, _partitions, _checkpoint_file)
2542
2543 writer = ir.MatrixNativeWriter(output, overwrite, stage_locally, _codec_spec, _partitions, _partitions_type, _checkpoint_file)
--> 2544 Env.backend().execute(ir.MatrixWrite(self._mir, writer))
2545
2546 class _Show:

~/opt/anaconda3/lib/python3.8/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
108 raise HailUserError(message_and_trace) from None
109
--> 110 raise e

~/opt/anaconda3/lib/python3.8/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
84 # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))
85 try:
--> 86 result_tuple = self._jhc.backend().executeEncode(jir, stream_codec)
87 (result, timings) = (result_tuple._1(), result_tuple._2())
88 value = ir.typ._from_encoding(result)

~/opt/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py in call(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
--> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

~/opt/anaconda3/lib/python3.8/site-packages/hail/backend/py4j_backend.py in deco(*args, **kwargs)
27 raise FatalError('Error summary: %s' % (deepest,), error_id) from None
28 else:
--> 29 raise FatalError('%s\n\nJava stack trace:\n%s\n'
30 'Hail version: %s\n'
31 'Error summary: %s' % (deepest, full, hail.version, deepest), error_id) from None

FatalError: MatrixParseError: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt:28-29, invalid character 'C' in integer literal

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 24 in stage 9.0 failed 1 times, most recent failure: Lost task 24.0 in stage 9.0 (TID 249) (10.16.191.97 executor driver): is.hail.utils.HailException: "Error parse line 2:28-29:
File: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt
Line:
22 17070764 22:17070764:C:G C G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:15)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:15)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.io.CompiledLineParser.$anonfun$apply$1(TextMatrixReader.scala:691)
at is.hail.io.CompiledLineParser.$anonfun$apply$1$adapted(TextMatrixReader.scala:673)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at __C1468stream.apply_region5_75(Unknown Source)
at __C1468stream.apply(Unknown Source)
at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:312)
at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:168)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1232)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1231)
at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: is.hail.io.MatrixParseError: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt:28-29, invalid character 'C' in integer literal
at __C1505text_matrix_reader.__m1511parseInt(Unknown Source)
at __C1505text_matrix_reader.apply_region8_51(Unknown Source)
at __C1505text_matrix_reader.apply(Unknown Source)
at __C1505text_matrix_reader.apply(Unknown Source)
at is.hail.io.CompiledLineParser.$anonfun$apply$1(TextMatrixReader.scala:681)
… 21 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
at is.hail.sparkextras.ContextRDD.crunJobWithIndex(ContextRDD.scala:238)
at is.hail.rvd.RVD$.getKeyInfo(RVD.scala:1231)
at is.hail.rvd.RVD$.makeCoercer(RVD.scala:1306)
at is.hail.rvd.RVD$.coerce(RVD.scala:1262)
at is.hail.rvd.RVD.changeKey(RVD.scala:143)
at is.hail.rvd.RVD.changeKey(RVD.scala:136)
at is.hail.backend.spark.SparkBackend.lowerDistributedSort(SparkBackend.scala:685)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:975)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:986)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:475)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:857)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:475)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:857)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:475)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:857)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:475)
at is.hail.expr.ir.lowering.LowerTableIR$.apply(LowerTableIR.scala:1333)
at is.hail.expr.ir.lowering.LowerToCDA$.lower(LowerToCDA.scala:69)
at is.hail.expr.ir.lowering.LowerToCDA$.apply(LowerToCDA.scala:18)
at is.hail.expr.ir.lowering.LowerToDistributedArrayPass.transform(LoweringPass.scala:77)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:27)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:417)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:414)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

is.hail.utils.HailException: "Error parse line 2:28-29:
File: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt
Line:
22 17070764 22:17070764:C:G C G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:15)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:15)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.io.CompiledLineParser.$anonfun$apply$1(TextMatrixReader.scala:691)
at is.hail.io.CompiledLineParser.$anonfun$apply$1$adapted(TextMatrixReader.scala:673)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at __C1468stream.apply_region5_75(Unknown Source)
at __C1468stream.apply(Unknown Source)
at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:312)
at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:168)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1232)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1231)
at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

is.hail.io.MatrixParseError: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt:28-29, invalid character 'C' in integer literal
at __C1505text_matrix_reader.__m1511parseInt(Unknown Source)
at __C1505text_matrix_reader.apply_region8_51(Unknown Source)
at __C1505text_matrix_reader.apply(Unknown Source)
at __C1505text_matrix_reader.apply(Unknown Source)
at is.hail.io.CompiledLineParser.$anonfun$apply$1(TextMatrixReader.scala:681)
at is.hail.io.CompiledLineParser.$anonfun$apply$1$adapted(TextMatrixReader.scala:673)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at __C1468stream.apply_region5_75(Unknown Source)
at __C1468stream.apply(Unknown Source)
at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:312)
at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:168)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1232)
at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1231)
at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Hail version: 0.2.78-b17627756568
Error summary: MatrixParseError: file:/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt:28-29, invalid character 'C' in integer literal

This error says that one of those 'C' values is being parsed as an int. I think the import command didn't get the right set of row fields to indicate that certain leading fields are row metadata rather than matrix entries. What's the import command you used?
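To make that concrete, here is a hypothetical mini-parser (not Hail's actual reader) applied to the failing line from the trace: everything after the declared row fields is parsed as an integer entry, so with only three row fields declared, the REF value 'C' lands in the entry block.

```python
# Hypothetical mini-parser (not Hail's) for the failing line from the trace:
# everything after the declared row fields is parsed as an integer entry.
line = "22 17070764 22:17070764:C:G C G 0 0 0 0"
tokens = line.split()

def parse(tokens, n_row_fields):
    row = tokens[:n_row_fields]
    entries = [int(t) for t in tokens[n_row_fields:]]  # ValueError on 'C'
    return row, entries

# Declaring only CHROM, POS, ID (3 row fields): REF's 'C' hits int().
try:
    parse(tokens, 3)
    failed = False
except ValueError:
    failed = True

# Declaring CHROM, POS, ID, REF, ALT (5 row fields): entries are all ints.
row, entries = parse(tokens, 5)
print(failed)   # True
print(entries)  # [0, 0, 0, 0]
```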

I used the hl.import_matrix_table command:

row_fields = {'CHROM': hl.tstr, 'POS': hl.tint, 'ID': hl.tstr}
hapcounts0 = hl.import_matrix_table('/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt.gz',
                                    force_bgz=True, row_fields=row_fields, row_key=[], min_partitions=32)
hapcounts0 = hapcounts0.key_rows_by().drop('row_id')
hapcounts0 = hapcounts0.key_rows_by(locus=hl.locus(hapcounts0.CHROM, hapcounts0.POS))

What's the first line of the file (at least up to the sample IDs)?
I think there are some other allele fields that need to be included in row_fields.
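A quick way to check that, sketched below. The in-memory sample text is only a stand-in with hypothetical column names; point `open(...)` at the real hapcount file to see its actual leading columns.

```python
# Peek at the leading columns of a file's first line. The StringIO sample
# below is a stand-in; replace it with open('<path-to-hapcount-file>').
import io

fh = io.StringIO("CHROM POS ID REF ALT sample1 sample2\n22 17070764 ...\n")
first_line = fh.readline().rstrip("\n")
leading = first_line.split()[:5]   # the columns before the sample IDs
print(leading)  # ['CHROM', 'POS', 'ID', 'REF', 'ALT']
```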

import argparse
import hail as hl
import numpy as np
hl.init()

from hail.plot import show
from pprint import pprint
hl.plot.output_notebook()

mt = hl.import_vcf('/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.vcf').key_rows_by('locus')

row_fields = {'CHROM': hl.tstr, 'POS': hl.tint, 'ID': hl.tstr, 'REF': hl.tstr, 'ALT': hl.tstr}
anc0dos = hl.import_matrix_table('/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.dosage.txt',
                                 force_bgz=True, row_fields=row_fields, row_key=[], min_partitions=32)
anc0dos = anc0dos.key_rows_by().drop('row_id')
anc0dos = anc0dos.key_rows_by(locus=hl.locus(anc0dos.CHROM, anc0dos.POS))

row_fields = {'CHROM': hl.tstr, 'POS': hl.tint, 'ID': hl.tstr}
hapcounts0 = hl.import_matrix_table('/Users/balazsmurnyak/Desktop/Tractor/Oall_GT_R2_.5_MAF_.001_chr22-1.anc2.hapcount.txt',
                                    force_bgz=True, row_fields=row_fields, row_key=[], min_partitions=32)
hapcounts0 = hapcounts0.key_rows_by().drop('row_id')
hapcounts0 = hapcounts0.key_rows_by(locus=hl.locus(hapcounts0.CHROM, hapcounts0.POS))

mt = mt.annotate_entries(anc0dos=anc0dos[mt.locus, mt.s], hapcounts0=hapcounts0[mt.locus, mt.s])
mt.write('/Users/balazsmurnyak/Desktop/Tractor/new_hail_matrix.mt')

Don’t you need the REF and ALT fields for the second file?

Yes, exactly! Now it works. Thanks for your help!
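For future readers, the fix summarized as a sketch (type names written as plain strings here; the actual script uses hl.tstr / hl.tint): the hapcount file carries the same five leading metadata columns as the dosage file, so its row_fields must include REF and ALT as well.

```python
# Sketch of the fix (type names as plain strings; the real script uses
# hl.tstr / hl.tint). The hapcount import was missing REF and ALT, so
# those two string columns were parsed as integer entries.
dosage_row_fields = {'CHROM': 'tstr', 'POS': 'tint', 'ID': 'tstr',
                     'REF': 'tstr', 'ALT': 'tstr'}
hapcount_row_fields = {'CHROM': 'tstr', 'POS': 'tint', 'ID': 'tstr'}

# The fields present in the dosage import but missing from the hapcount one:
missing = sorted(set(dosage_row_fields) - set(hapcount_row_fields))
print(missing)  # ['ALT', 'REF']
```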