Problem with GLIBCXX library when running hl.export_vcf

Hi,
At Hail office hours I created, with a lot of Daniel King's help, a Hail conda environment. When trying to run hl.export_vcf(mt, output_vcf_name) to create a VCF, I am getting the error below.

ERROR: dlopen("/tmp/libhail2911221901527767610.so"): /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)

FATAL: caught exception java.lang.UnsatisfiedLinkError: /tmp/libhail2911221901527767610.so: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)
java.lang.UnsatisfiedLinkError: /tmp/libhail2911221901527767610.so: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/libhail2911221901527767610.so)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at is.hail.nativecode.NativeCode.<clinit>(NativeCode.java:30)
at is.hail.nativecode.NativeBase.<init>(NativeBase.scala:20)
at is.hail.annotations.Region.<init>(Region.scala:180)
at is.hail.annotations.Region$.apply(Region.scala:16)
at is.hail.annotations.Region$.scoped(Region.scala:18)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
at is.hail.backend.Backend.execute(Backend.scala:86)
at is.hail.backend.Backend.executeJSON(Backend.scala:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)

I am running this with gcc version 5.1.0, as recommended:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0/libexec/gcc/x86_64-redhat-linux/5.1.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: /tmp/redhat_6_x86_64/build/tmp/gcc-5.1.0/configure --prefix=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0 --enable-lto --with-cloog --enable-plugins --enable-languages=c,c++,objc,obj-c++,fortran --build=x86_64-redhat-linux --disable-multilib --with-gmp=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0 --with-mpfr=/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.1.0
Thread model: posix
gcc version 5.1.0 (GCC)  

Could someone help me out in getting this run?
Thanks,
Sam

version `GLIBC_2.14' not found

You’ve got the right version of libstdc++ (the previous error), but now your version of libc is out of date.

What base image are you using?

Our build requires libc version at least 2.14. To check your libc version, you can run the libc.so file directly (in your case, /lib64/libc.so.6). On my machine, which has 2.27:

# /lib/x86_64-linux-gnu/libc.so.6 --version
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 7.3.0.
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

In particular, the Wikipedia article on the GNU C Library has a version history table, which may help you find a GNU/Linux distribution with an appropriate libc version.

Sorry to go backwards, but that was running as the picard user, which I don’t want to upgrade or change since that is one of our production accounts.

Running as myself, I have the outdated gcc version 4.4:
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: …/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)

I am trying to update gcc to version 5 in my conda environment, but I'm having trouble doing so. @danking, do you know what I should enter at the command line in my conda environment to get an updated gcc version?

The Anaconda gcc package is only version 4.8.5 (https://anaconda.org/anaconda/gcc).

Can you remind me of your environment? Are we in a docker container? Are we allowed to change the docker image?

Recent versions of gcc should be available in whatever package manager you have available in your distribution.

Hi @danking,
This is running on a local host, in the conda environment where we installed Hail, not in a Docker container.

We are subsetting very small VCFs, so it wasn't worth creating a Docker image, because it would have been difficult to install and configure the Google Storage Hadoop connector.
Thanks,
Sam

Alright, installing gcc should simply require searching for the right package in your package manager (yum, apt-get, apk, whichever your Linux distro provides).

But none of this will work unless running your libc.so.6 with --version (in your case, /lib64/libc.so.6 --version) shows libc 2.14 or newer. You need Ubuntu 12.04, RHEL 7, Debian 8, Fedora 20, etc. (see the Wikipedia link above). You can't change libc; it's tightly coupled to your distribution.

Thanks @danking, I think I finally got the environment correct. I am now getting the below. Does that look familiar to you?

Traceback (most recent call last):
  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: HailException: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): is.hail.utils.HailException: Pathways_4_samples_standard_germline_WES_08022019.vcf.gz: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
  offending line: chr1	12807	.	C	T	66.61	VQSRTrancheSNP99.95to100.00	AC=1;AF=0...
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:20)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.utils.Context.wrapException(Context.scala:19)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1300)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:1245)
	at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:1244)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:448)
	at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$32.apply(ContextRDD.scala:448)
	at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:218)
	at is.hail.sparkextras.ContextRDD$$anonfun$run$1$$anonfun$apply$8.apply(ContextRDD.scala:218)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: is.hail.utils.HailException: Invalid locus 'chr1:12807' found. Contig 'chr1' is not in the reference genome 'GRCh37'.
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.variant.ReferenceGenome.checkLocus(ReferenceGenome.scala:249)
	at is.hail.io.vcf.VCFLine$$anonfun$parseAddVariant$2.apply(LoadVCF.scala:340)
	at is.hail.io.vcf.VCFLine$$anonfun$parseAddVariant$2.apply(LoadVCF.scala:340)
	at scala.Option.foreach(Option.scala:257)
	at is.hail.io.vcf.VCFLine.parseAddVariant(LoadVCF.scala:340)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1274)
	... 32 more

Here, you're reading from a VCF, but when calling import_vcf you're not passing the appropriate reference. import_vcf takes a reference_genome argument. By default, Hail uses GRCh37 as the reference; here it needs to be GRCh38. If the entire script is GRCh38, try passing default_reference='GRCh38' to hl.init(). It will save you from needing to write reference_genome='GRCh38' everywhere.
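A minimal sketch of both options (the input path and the force_bgz flag are placeholders, not taken from your actual script):

import hail as hl

# Option 1: set the default reference once for the whole session.
hl.init(default_reference='GRCh38')
mt = hl.import_vcf('input.vcf.gz', force_bgz=True)  # hypothetical path; force_bgz depends on how the file was compressed

# Option 2: pass the reference only to this import.
mt = hl.import_vcf('input.vcf.gz', force_bgz=True, reference_genome='GRCh38')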


Thanks @cdv! That got me a little bit further. It is now failing with the error below. Do you have any ideas? Sorry to keep going back and forth on this… I do appreciate the help.

[Stage 1:> (0 + 1) / 1]2019-08-27 17:13:05 Hail: INFO: Coerced sorted dataset
[Stage 2:> (0 + 1) / 1]Traceback (most recent call last):
  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: NumberFormatException: For input string: "nul"
Java stack trace:
org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:1013)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1012)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:970)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:968)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1517)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1505)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1505)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1505)
	at is.hail.utils.richUtils.RichRDD$.writeTable$extension(RichRDD.scala:66)
	at is.hail.io.vcf.ExportVCF$.apply(ExportVCF.scala:474)
	at is.hail.expr.ir.MatrixVCFWriter.apply(MatrixWriter.scala:48)
	at is.hail.expr.ir.WrappedMatrixWriter.apply(MatrixWriter.scala:24)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:731)
	at is.hail.expr.ir.Interpret$.apply(Interpret.scala:91)
	at is.hail.expr.ir.CompileAndEvaluate$$anonfun$1.apply(CompileAndEvaluate.scala:33)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:24)
	at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:33)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
	at is.hail.backend.Backend$$anonfun$execute$1.apply(Backend.scala:86)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:8)
	at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:7)
	at is.hail.utils.package$.using(package.scala:596)
	at is.hail.annotations.Region$.scoped(Region.scala:18)
	at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:7)
	at is.hail.backend.Backend.execute(Backend.scala:86)
	at is.hail.backend.Backend.executeJSON(Backend.scala:92)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: is.hail.utils.HailException: Pathways_4_samples_standard_germline_WES_08022019.vcf.gz: caught java.lang.NumberFormatException: For input string: "nul"
  offending line: chr1	13273	.	G	C	79.24	VQSRTrancheSNP99.95to100.00	AC=2;AF=0...
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:20)
	at is.hail.utils.package$.fatal(package.scala:74)
	at is.hail.utils.Context.wrapException(Context.scala:23)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1300)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:220)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:128)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
	... 10 more
Caused by: java.lang.NumberFormatException: For input string: "nul"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:285)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at is.hail.io.vcf.VCFLine.infoToDouble(LoadVCF.scala:739)
	at is.hail.io.vcf.VCFLine.parseDoubleInInfoArray(LoadVCF.scala:796)
	at is.hail.io.vcf.VCFLine.parseDoubleInfoArrayElement(LoadVCF.scala:819)
	at is.hail.io.vcf.VCFLine.parseAddInfoArrayDouble(LoadVCF.scala:875)
	at is.hail.io.vcf.VCFLine.parseAddInfoField(LoadVCF.scala:906)
	at is.hail.io.vcf.VCFLine.addField$1(LoadVCF.scala:925)
	at is.hail.io.vcf.VCFLine.parseAddInfo(LoadVCF.scala:955)
	at is.hail.io.vcf.LoadVCF$.parseLine(LoadVCF.scala:1361)
	at is.hail.io.vcf.MatrixVCFReader$$anonfun$15.apply(LoadVCF.scala:1592)
	at is.hail.io.vcf.MatrixVCFReader$$anonfun$15.apply(LoadVCF.scala:1592)
	at is.hail.io.vcf.LoadVCF$$anonfun$parseLines$1$$anon$1.hasNext(LoadVCF.scala:1276)
	... 29 more

Looks like there are instances of "nul" in what should be a numeric field; this is a spec violation.

You can probably fix this by using the find_replace option on import_vcf: something like find_replace=('nul', 'NA').
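That option is passed straight to hl.import_vcf. A rough sketch, with a placeholder path (the replacement value here is just the suggestion above; the follow-up posts settle on a different one):

import hail as hl

# find_replace=(regex, replacement) rewrites matching text on each VCF data line at import time.
mt = hl.import_vcf(
    'input.vcf.gz',              # hypothetical path
    reference_genome='GRCh38',
    find_replace=('nul', 'NA'),  # suggestion above; see the later posts for the final value
)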

Thanks @tpoterba.

Just tried that, and I am getting the same error as above, except it is now complaining about "NA" instead of "nul".

  File "./subset_vcf.py", line 27, in <module>
    hl.export_vcf(mt, output_vcf_name)
  File "</home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/decorator.py:decorator-gen-1184>", line 2, in export_vcf
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 513, in export_vcf
    Env.backend().execute(MatrixWrite(dataset._mir, writer))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/backend.py", line 108, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/unix/samn/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: NumberFormatException: For input string: "NA"

Isn't missing supposed to be indicated with a "." in VCF?

yes, what Dan said! Oops.
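So the working import would look roughly like this (paths are placeholders, not from the actual script):

import hail as hl

mt = hl.import_vcf(
    'input.vcf.gz',             # hypothetical path
    reference_genome='GRCh38',
    find_replace=('nul', '.'),  # '.' is the VCF missing-value marker; assumes 'nul' only appears as a bad numeric value
)
hl.export_vcf(mt, 'output.vcf.bgz')  # hypothetical output path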

That did the trick. Thank you all so much for your help!