Hail unable to see sample VCF files

While running through the ‘Getting Started’ guide I am trying to import the included sample.vcf into Hail’s .vds format, but when I run the command in the Python interpreter I get this error:

[cloudera@quickstart hail]$ python2.7
Python 2.7.13 (default, Mar 28 2017, 10:26:54)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

from hail import *
hc = HailContext()
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
hc.import_vcf('src/test/resources/sample.vcf').write('sample.vds')
hail: info: SparkUI: http://192.168.40.130:4040

hail: warning: `src/test/resources/sample.vcf' refers to no files
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "", line 2, in import_vcf
File "/home/cloudera/hail/python/hail/java.py", line 110, in handle_py4j
raise FatalError(msg)
hail.java.FatalError: arguments refer to no files

and I can’t go beyond that without running into more errors. Any thoughts?

Hi Michael,
I’ve seen this kind of error before on systems with an HDFS file system installed. HDFS is the default file system in Spark, so a bare path like src/test/resources/sample.vcf will be looked up in your HDFS home directory.

If this is the case, try importing the file using file:// followed by the fully qualified path to sample.vcf.
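The scheme-based resolution can be sketched with Python’s standard urlparse (an illustration only; Spark’s actual path handling goes through Hadoop’s FileSystem API, but the scheme logic is the same):

```python
from urllib.parse import urlparse

# A bare path has no scheme, so Spark/Hadoop resolves it against
# fs.defaultFS (HDFS on this cluster); an explicit file:// scheme
# forces the local filesystem instead.
bare = urlparse('src/test/resources/sample.vcf')
local = urlparse('file:///home/cloudera/hail/src/test/resources/sample.vcf')

print(bare.scheme)   # empty string: falls back to the default filesystem
print(local.scheme)  # 'file': read from the local disk
```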

Thank you, Tim, I will try that suggestion. I hadn’t even considered that.

Michael

Well that got me a little further:

hc.import_vcf('file:///home/cloudera/hail/src/test/resources/sample.vcf').write('sample.vds')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "", line 2, in import_vcf
File "/home/cloudera/hail/python/hail/java.py", line 110, in handle_py4j
raise FatalError(msg)
hail.java.FatalError: UnsupportedClassVersionError: is/hail/io/compress/BGzipCodec : Unsupported major.minor version 52.0

Looks like you don’t have Java 8; running java -version will tell you what you’ve got.
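For context, "major.minor version 52.0" means the class file targets Java 8: the class-file major version is the Java release plus 44. That mapping can be sketched in Python (the byte layout comes from the JVM class-file format; the header bytes below are constructed for illustration, not read from the Hail jar):

```python
import struct

def classfile_java_release(header: bytes) -> int:
    """Return the Java release a .class file targets.

    A class file starts with the magic number 0xCAFEBABE, followed by a
    two-byte minor and a two-byte major version, all big-endian.
    Major version 52 corresponds to Java 8 (major - 44 = release).
    """
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major - 44

# Constructed header with major version 52, as in the error message:
header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
print(classfile_java_release(header))  # 8 -> needs a Java 8 (or newer) JVM
```

So a jar whose classes report version 52.0 will refuse to load on a Java 7 runtime, which is exactly this error.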

Ack, you are right. I thought I had changed JAVA_HOME to point to 1.8.0 before starting this. Thank you for pointing that out.

I have a similar issue, but my JAVA_HOME is set. The error is a little different:

"Hail version: 0.1-5661c35
Error summary: UnsupportedClassVersionError: is/hail/io/compress/BGzipCodec : Unsupported major.minor version 52.0"

"echo $JAVA_HOME
/opt/jdk1.8.0_131
echo $JRE_HOME
/opt/jdk1.8.0_131/jre"

The error mentions this: "UnsupportedClassVersionError: is/hail/io/compress/BGzipCodec : Unsupported major.minor version 52.0"

Is it possible that your jar wasn’t compiled with Java 8?

I can compile it again to test that theory.

Compiled it again, making sure Java was set to version 8. Still have the same issue.

Hail version: 0.1-5661c35
Error summary: UnsupportedClassVersionError: is/hail/io/compress/BGzipCodec : Unsupported major.minor version 52.0

Hi @klong,

Please see this Stack Overflow post about setting the JDK version for a Spark cluster. Setting JAVA_HOME only modifies the JDK on the driver node (the node on which you’ve set JAVA_HOME). You must also set the appropriate JDK for all of the worker nodes. This can be done with a spark2-submit --conf parameter:

--conf spark.executorEnv.JAVA_HOME=/path/on/executor/to/jdk1.8

Awesome, I am past this hurdle now. If I remove $SPARK_CLASSPATH from my env, what do I set these equal to?

--conf spark.executor.extraClassPath=/path/to/Spark???
--conf spark.driver.extraClassPath=???

Both should have the path to the Hail jar, I think. That should have been the old value of $SPARK_CLASSPATH.
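To make the mapping concrete, here is a small hypothetical helper (the function name is mine, not part of Spark or Hail; the jar path is a placeholder) that turns the old $SPARK_CLASSPATH value into the two equivalent --conf flags:

```python
def classpath_confs(hail_jar_path: str) -> list:
    """Build the spark-submit --conf flags that replace SPARK_CLASSPATH.

    Both the driver and the executors need the Hail jar on their
    classpath, so the same path goes into both properties.
    """
    return [
        f"--conf spark.driver.extraClassPath={hail_jar_path}",
        f"--conf spark.executor.extraClassPath={hail_jar_path}",
    ]

# Placeholder path: substitute the location of your built Hail jar.
print(classpath_confs("/path/to/hail-all-spark.jar"))
```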

Thank you for the quick answer.