Import VCF from dbSNP

Hi, I wish to import the latest version of dbSNP into Hail. The official GRCh38 VCF is available on the NCBI FTP site (https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.39.gz).

Notably, the contig names in this VCF file are RefSeq accession IDs (e.g. NC_000001.11 for chr1), and the file also includes all the alternate contigs.
To import this VCF as a Hail MatrixTable, I am therefore using hl.import_vcf with the contig_recoding and skip_invalid_loci options:

import hail as hl

# RefSeq accession of each primary chromosome -> Hail GRCh38 contig name
contigs_map={
    'NC_000001.11':'chr1',
    'NC_000002.12':'chr2',
    'NC_000003.12':'chr3',
    'NC_000004.12':'chr4',
    'NC_000005.10':'chr5',
    'NC_000006.12':'chr6',
    'NC_000007.14':'chr7',
    'NC_000008.11':'chr8',
    'NC_000009.12':'chr9',
    'NC_000010.11':'chr10',
    'NC_000011.10':'chr11',
    'NC_000012.12':'chr12',
    'NC_000013.11':'chr13',
    'NC_000014.9':'chr14',
    'NC_000015.10':'chr15',
    'NC_000016.10':'chr16',
    'NC_000017.11':'chr17',
    'NC_000018.10':'chr18',
    'NC_000019.10':'chr19',
    'NC_000020.11':'chr20',
    'NC_000021.9':'chr21',
    'NC_000022.11':'chr22',
    'NC_000023.11':'chrX',
    'NC_000024.10':'chrY'
}

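mt = hl.import_vcf('s3://.../GCF_000001405.39.gz',
                   reference_genome='GRCh38',
                   force_bgz=True,               # treat the .gz file as block-gzipped (BGZF)
                   contig_recoding=contigs_map,  # NC_* accessions -> chr1..chrY
                   skip_invalid_loci=True)       # drop records on contigs absent from GRCh38
mt.show()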
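Since the RefSeq accessions of the primary chromosomes follow a fixed pattern (NC_, the chromosome number zero-padded to six digits with X and Y numbered 23 and 24, then a version suffix), the same mapping can be built from just the version numbers. A minimal sketch in plain Python; `REFSEQ_VERSIONS` and `build_contig_map` are hypothetical names, and the versions are copied from the dictionary above rather than looked up from NCBI:

```python
# Version suffix of each primary-chromosome RefSeq accession,
# copied from the hand-written contigs_map above.
REFSEQ_VERSIONS = {
    '1': 11, '2': 12, '3': 12, '4': 12, '5': 10, '6': 12, '7': 14, '8': 11,
    '9': 12, '10': 11, '11': 10, '12': 12, '13': 11, '14': 9, '15': 10,
    '16': 10, '17': 11, '18': 10, '19': 10, '20': 11, '21': 9, '22': 11,
    'X': 11, 'Y': 10,
}


def build_contig_map(versions):
    """Return {RefSeq accession: 'chrN'} for the primary GRCh38 chromosomes."""
    mapping = {}
    for name, version in versions.items():
        # RefSeq numbers chrX and chrY as 23 and 24
        number = {'X': 23, 'Y': 24}.get(name) or int(name)
        accession = f'NC_{number:06d}.{version}'  # e.g. NC_000001.11
        mapping[accession] = f'chr{name}'
    return mapping
```

`contigs_map = build_contig_map(REFSEQ_VERSIONS)` should reproduce the hand-written dictionary. Inside a Hail session, `set(contigs_map.values()) <= set(hl.get_reference('GRCh38').contigs)` is a quick check that every target name exists in Hail's GRCh38 reference.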

I get the following warning, but that should not be a problem, right?

Hail: WARN: expected input file '...' to end in .vcf[.bgz, .gz]
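The warning appears to come from a plain file-name check: `GCF_000001405.39.gz` does not end in `.vcf`, `.vcf.gz` or `.vcf.bgz`. A rough sketch of that condition, written from the wording of the warning rather than from Hail's source (`has_expected_vcf_suffix` is a hypothetical name):

```python
# Suffixes listed in the warning text: .vcf[.bgz, .gz]
EXPECTED_SUFFIXES = ('.vcf', '.vcf.bgz', '.vcf.gz')


def has_expected_vcf_suffix(path):
    """Approximate the file-name condition behind the Hail warning."""
    # str.endswith accepts a tuple and returns True if any suffix matches
    return path.endswith(EXPECTED_SUFFIXES)
```

`has_expected_vcf_suffix('GCF_000001405.39.gz')` returns `False`, while a copy named `GCF_000001405.39.vcf.gz` would pass the check.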

More problematically, I also get the following error:

An error was encountered:
IllegalArgumentException: requirement failed
...
Hail version: 0.2.80-4ccfae1ff293
Error summary: IllegalArgumentException: requirement failed

Any idea what might be wrong, and/or how to troubleshoot this?
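One way to narrow this down (an untested troubleshooting idea, not a known fix) is to check whether the failure depends on the alternate contigs: pre-filter the VCF to the 24 primary chromosomes, rename them on the way, and import the smaller file without `contig_recoding` or `skip_invalid_loci`. A sketch in plain Python; `filter_and_recode` and `write_filtered` are hypothetical helpers, header lines are passed through unchanged (including any `##contig` lines that still use RefSeq IDs), and `write_filtered` assumes a local copy of the file, since `gzip.open` cannot read `s3://` paths:

```python
import gzip


def filter_and_recode(lines, contig_map):
    """Yield VCF lines, keeping only records whose CHROM is a key of contig_map.

    Header lines (starting with '#') are passed through unchanged; the CHROM
    field of each kept record is replaced by its mapped name (e.g. chr1).
    """
    for line in lines:
        if line.startswith('#'):
            yield line
            continue
        if not line.strip():
            continue  # skip blank lines, which have no tab-separated fields
        chrom, rest = line.split('\t', 1)
        if chrom in contig_map:
            yield contig_map[chrom] + '\t' + rest


def write_filtered(src_path, dst_path, contig_map):
    """Stream a gzipped VCF through filter_and_recode into a new gzipped file."""
    with gzip.open(src_path, 'rt') as src, gzip.open(dst_path, 'wt') as dst:
        dst.writelines(filter_and_recode(src, contig_map))
```

Python's `gzip` writes ordinary gzip rather than BGZF, so recompressing the output with `bgzip` from htslib before copying it back to S3 keeps the file splittable for Hail. Streaming a file of this size through a single Python process will also be slow.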

What's the full stack trace here?

Here is the error as displayed in Jupyter:


An error was encountered:
IllegalArgumentException: requirement failed

Java stack trace:
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at is.hail.rvd.RVDPartitioner.(RVDPartitioner.scala:52)
at is.hail.rvd.RVDPartitioner.extendKeySamePartitions(RVDPartitioner.scala:141)
at is.hail.expr.ir.LoweredTableReader$$anon$2.coerce(TableIR.scala:383)
at is.hail.expr.ir.GenericTableValue.toTableStage(GenericTableValue.scala:162)
at is.hail.io.vcf.MatrixVCFReader.lower(LoadVCF.scala:1790)
at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:407)
at is.hail.expr.ir.lowering.LowerTableIR$.apply(LowerTableIR.scala:1199)
at is.hail.expr.ir.lowering.LowerToCDA$.lower(LowerToCDA.scala:69)
at is.hail.expr.ir.lowering.LowerToCDA$.apply(LowerToCDA.scala:18)
at is.hail.expr.ir.lowering.LowerToDistributedArrayPass.transform(LoweringPass.scala:77)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:27)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:381)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:417)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:638)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:46)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:275)
at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:414)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:413)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)

Hail version: 0.2.80-4ccfae1ff293
Error summary: IllegalArgumentException: requirement failed
Traceback (most recent call last):
  File "", line 2, in show
  File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/hail/matrixtable.py", line 2633, in show
    cols = self.col_key[0].take(displayed_n_cols)
  File "", line 2, in take
  File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/hail/expr/expressions/base_expression.py", line 1005, in take
    return hl.eval(e)
  File "", line 2, in eval
  File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 194, in eval
    return eval_timed(expression)[0]
  File "", line 2, in eval_timed
  File "/usr/local/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return original_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/hail/expr/expressions/expression_utils.py", line 158, in eval_timed
    (tupled_ans, timing) = Env.backend().execute(tupled_expression._ir, True)
  File "/usr/local/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 110, in execute
    raise e
  File "/usr/local/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 86, in execute
    result_tuple = self._jhc.backend().executeEncode(jir, stream_codec)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in call
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 31, in deco
    'Error summary: %s' % (deepest, full, hail.version, deepest), error_id) from None
hail.utils.java.FatalError: IllegalArgumentException: requirement failed

I have a foggy memory of people hitting this error message because of a Hail bug a while back, and the version of Hail you're using is from a while back :slight_smile:

Are you able to update to the latest build and try again?
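When checking which build is installed, note that `hl.version()` returns a string like `0.2.80-4ccfae1ff293`, and comparing such strings directly is misleading ('0.2.80' sorts after '0.2.100'). A small sketch that compares the numeric release part instead; the helper names are hypothetical, and since this thread does not say which release fixed the bug, the minimum version to compare against is left to the reader:

```python
def parse_hail_version(version):
    """Turn a Hail version string like '0.2.80-4ccfae1ff293' into (0, 2, 80)."""
    release = version.split('-', 1)[0]  # drop the git-hash suffix
    return tuple(int(part) for part in release.split('.'))


def is_older_than(version, minimum):
    """Return True if `version` is an earlier release than `minimum`."""
    # Tuples compare element by element, so (0, 2, 80) < (0, 2, 100)
    return parse_hail_version(version) < parse_hail_version(minimum)
```

In a session this would be used as `is_older_than(hl.version(), target)`, where `target` is whichever release you are upgrading to.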