Problem with GLIBCXX library when running hl.export_vcf

Hi,
At Hail office hours I created, with a lot of Daniel King's help, a Hail conda environment. When trying to run hl.export_vcf(mt, output_vcf_name) to create a VCF, I am getting the error below.

ERROR: dlopen("/tmp/libhail2911221901527767610.so"): /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)

FATAL: caught exception java.lang.UnsatisfiedLinkError: /tmp/libhail2911221901527767610.so: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)
java.lang.UnsatisfiedLinkError: /tmp/libhail2911221901527767610.so: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at is.hail.nativecode.NativeCode.<clinit>(NativeCode.java:30)
at is.hail.nativecode.NativeBase.<init>(NativeBase.scala:20)
at is.hail.annotations.Region.<init>(Region.scala:180)
at is.hail.annotations.Region$.apply(Region.scala:16)
at is.hail.annotations.Region$.scoped(Region.scala:18)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
at is.hail.backend.Backend.execute(Backend.scala:86)
at is.hail.backend.Backend.executeJSON(Backend.scala:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)

I am running this with gcc version 5.1.0, as recommended:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0/libexec/gcc/x86_64-redhat-linux/5.1.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: /tmp/redhat_6_x86_64/build/tmp/gcc-5.1.0/configure --prefix=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0 --enable-lto --with-cloog --enable-plugins --enable-languages=c,c++,objc,obj-c++,fortran --build=x86_64-redhat-linux --disable-multilib --with-gmp=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0 --with-mpfr=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0
Thread model: posix
gcc version 5.1.0 (GCC)  

Could someone help me out in getting this run?
Thanks,
Sam

version `GLIBC_2.14' not found

You’ve got the right version of libstdc++ (the previous error), but now your version of libc is out of date.

What base image are you using?

Our build requires libc version at least 2.14. To check your libc version, you can run the libc.so file directly (in your case, /lib64/libc.so.6). On my machine, which has 2.27:

# /lib/x86_64-linux-gnu/libc.so.6 --version
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.3.0.
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

In particular, the Wikipedia article on the GNU C Library has a version history table, which may help you find a GNU/Linux distribution with an appropriate libc version.

Sorry to go backwards, but that was running as the picard user, which I don’t want to upgrade or change since that is one of our production accounts.

Running as myself, I have the outdated gcc version 4.4:
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: …/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)

I am trying to update gcc to version 5 in my conda environment, but I'm having trouble doing so. @danking, do you know what I should enter at the command line in my conda environment to get an updated gcc version?

The Anaconda gcc package is only version 4.8.5 (https://anaconda.org/anaconda/gcc).

Can you remind me of your environment? Are we in a docker container? Are we allowed to change the docker image?

Recent versions of gcc should be available in whatever package manager you have available in your distribution.

Hi @danking,
This is running on a local host, in the conda environment where we installed Hail, not in a Docker container.

We are subsetting very small VCFs, so it wasn't worth creating a Docker image, because it would have been difficult to install and configure the Google Storage Hadoop connector.
Thanks,
Sam

Alright, installing gcc should simply require searching for the right package in your package manager (yum, apt-get, apk, whichever your Linux distro provides).

But none of this will work unless running your libc.so.6 with --version (in your case, /lib64/libc.so.6 --version) shows libc 2.14 or newer. You need Ubuntu 12.04, RHEL 7, Debian 8, Fedora 20, etc. (see the Wikipedia link above). You can't change libc; it's tightly coupled to your distribution.

Thanks @danking, I think I finally got the environment correct. I am now getting the below. Does that look familiar to you?

Traceback (most recent call last):
  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: HailException: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): is.hail.utils.HailException: Pathways_4_samples_standard_germline_WES_08022019.vcf.gz: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
  offending line: chr1	12807	.	C	T	66.61	VQSRTrancheSNP99.95to100.00	AC=1;AF=0...
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:20)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.utils.Context.wrapException(Context.scala:19)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1300)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:1245)
	at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:1244)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:448)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:448)
	at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:218)
	at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:218)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: is.hail.utils.HailException: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.variant.ReferenceGenome.checkLocus(ReferenceGenome.scala:249)
	at is.hail.io.vcf.VCFLine$$anonfun$parseAddVariant$2.apply(LoadVCF.scala:340)
	at is.hail.io.vcf.VCFLine$$anonfun$parseAddVariant$2.apply(LoadVCF.scala:340)
	at scala.Option.foreach(Option.scala:257)
	at is.hail.io.vcf.VCFLine.parseAddVariant(LoadVCF.scala:340)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1274)
	... 32 more

Here, you're reading from a VCF, but when calling import_vcf you're not passing the appropriate reference. import_vcf takes a reference_genome argument. By default, Hail uses GRCh37 as the reference; here it needs to be GRCh38. If the entire script is GRCh38, try passing default_reference='GRCh38' to hl.init(). It will save you from needing to write reference_genome='GRCh38' everywhere.
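A minimal sketch of both options (the input path and the force_bgz flag are placeholders, not taken from your actual script):

import hail as hl

# Option 1: set the default reference once for the whole session.
hl.init(default_reference='GRCh38')
mt = hl.import_vcf('input.vcf.gz', force_bgz=True)  # hypothetical path; force_bgz depends on how the file was compressed

# Option 2: pass the reference only to this import.
mt = hl.import_vcf('input.vcf.gz', force_bgz=True, reference_genome='GRCh38')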


Thanks @cdv! That got me a little bit further. It is now failing with the error below. Do you have any ideas? Sorry to keep going back and forth on this… I do appreciate the help.

[Stage 1:> (0 + 1) / 1]2019-08-27 17:13:05 Hail: INFO: Coerced sorted dataset
[Stage 2:> (0 + 1) / 1]Traceback (most recent call last):
  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: NumberFormatException: For input string: "nul"
Java stack trace:
org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1012)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:970)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1517)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1505)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1505)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1505)
	at is.hail.utils.richUtils.RichRDD$.writeTable$extension(RichRDD.scala:66)
	at is.hail.io.vcf.ExportVCF$.apply(ExportVCF.scala:474)
	at is.hail.expr.ir.MatrixVCFWriter.apply(MatrixWriter.scala:48)
	at is.hail.expr.ir.WrappedMatrixWriter.apply(MatrixWriter.scala:24)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:731)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:91)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$1.apply(CompileAndEvaluate.scala:33)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:24)
	at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:33)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:8)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:7)
	at is.hail.utils.package$.using(package.scala:596)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
	at is.hail.backend.Backend.execute(Backend.scala:86)
	at is.hail.backend.Backend.executeJSON(Backend.scala:92)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: is.hail.utils.HailException: Pathways_4_samples_standard_germline_WES_08022019.vcf.gz: caught java.lang.NumberFormatException: For input string: "nul"
  offending line: chr1	13273	.	G	C	79.24	VQSRTrancheSNP99.95to100.00	AC=2;AF=0...
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:20)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.utils.Context.wrapException(Context.scala:23)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1300)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:220)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:128)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
	... 10 more
Caused by: java.lang.NumberFormatException: For input string: "nul"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:285)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at is.hail.io.vcf.VCFLine.infoToDouble(LoadVCF.scala:739)
	at is.hail.io.vcf.VCFLine.parseDoubleInInfoArray(LoadVCF.scala:796)
	at is.hail.io.vcf.VCFLine.parseDoubleInfoArrayElement(LoadVCF.scala:819)
	at is.hail.io.vcf.VCFLine.parseAddInfoArrayDouble(LoadVCF.scala:875)
	at is.hail.io.vcf.VCFLine.parseAddInfoField(LoadVCF.scala:906)
	at is.hail.io.vcf.VCFLine.addField$1(LoadVCF.scala:925)
	at is.hail.io.vcf.VCFLine.parseAddInfo(LoadVCF.scala:955)
	at is.hail.io.vcf.LoadVCF$.parseLine(LoadVCF.scala:1361)
	at is.hail.io.vcf.MatrixVCFReader$$anonfun$15.apply(LoadVCF.scala:1592)
	at is.hail.io.vcf.MatrixVCFReader$$anonfun$15.apply(LoadVCF.scala:1592)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1276)
	... 29 more

Looks like there are instances of "nul" in what should be a numeric field; this is a spec violation.

You can probably fix this by using the find_replace option on import_vcf: something like find_replace=('nul', 'NA').
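That option is passed straight to hl.import_vcf. A rough sketch, with a placeholder path (the replacement value here is just the suggestion above; the follow-up posts settle on a different one):

import hail as hl

# find_replace=(regex, replacement) rewrites matching text on each VCF data line at import time.
mt = hl.import_vcf(
    'input.vcf.gz',              # hypothetical path
    reference_genome='GRCh38',
    find_replace=('nul', 'NA'),  # suggestion above; see the later posts for the final value
)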

Thanks @tpoterba.

Just tried that, and I am getting the same error as above, except it is now complaining about "NA" instead of "nul".

  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: NumberFormatException: For input string: "NA"

Isn't missing supposed to be indicated with a "." in VCF?

yes, what Dan said! Oops.
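So the working import would look roughly like this (paths are placeholders, not from the actual script):

import hail as hl

mt = hl.import_vcf(
    'input.vcf.gz',             # hypothetical path
    reference_genome='GRCh38',
    find_replace=('nul', '.'),  # '.' is the VCF missing-value marker; assumes 'nul' only appears as a bad numeric value
)
hl.export_vcf(mt, 'output.vcf.bgz')  # hypothetical output path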

That did the trick. Thank you all so much for your help!