Running Hail on AWS


#1

Hi, I’m a Spark newbie trying to get started with Hail on AWS/EMR. I’ve followed many of the suggestions on this forum, and I can build Hail for Spark 2.2.0 on EMR 5.10.0. When I run pyspark from an ssh connection to the master node, I get an error when calling hl.init():

pyspark --jars hail-all-spark.jar \
--py-files hail-python.zip \
--conf spark.driver.extraClassPath=./hail-all-spark.jar \
--conf spark.executor.extraClassPath=./hail-all-spark.jar \
--conf spark.sql.files.openCostInBytes=1099511627776 \
--conf spark.sql.files.maxPartitionBytes=1099511627776 \
--conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator

...
>>> hl.init()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-58>", line 2, in init
  File "/home/hadoop/hail-python.zip/hail/typecheck/check.py", line 546, in wrapper
  File "/home/hadoop/hail-python.zip/hail/context.py", line 177, in init
  File "<decorator-gen-56>", line 2, in __init__
  File "/home/hadoop/hail-python.zip/hail/typecheck/check.py", line 546, in wrapper
  File "/home/hadoop/hail-python.zip/hail/context.py", line 65, in __init__
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:236)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)
        at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2472)
        at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2468)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2468)
        at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2557)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:85)
        at is.hail.HailContext$.configureAndCreateSparkContext(HailContext.scala:102)
        at is.hail.HailContext$.apply(HailContext.scala:225)
        at is.hail.HailContext.apply(HailContext.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)

I’m guessing this means I’m somehow trying to create two Spark contexts, or perhaps already have one running somewhere else (Zeppelin)? Do you have any suggestions? Thanks


#2

Try hl.init(sc)


#3

Do that if pyspark gives you a spark context. The rules aren’t super clear around that though
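
To make that rule of thumb concrete, here is a minimal sketch. It assumes the pyspark shell has already bound a SparkContext to the name `sc`, and that `hl.init` accepts that context as its first argument, as in the suggestion above. The helper name `init_hail` and its `namespace` parameter are made up for illustration and are not part of Hail.

```python
def init_hail(hl, namespace):
    """Initialise Hail, reusing an existing SparkContext when one is present.

    hl        -- the imported hail module (import hail as hl)
    namespace -- a dict of names to search for `sc`, e.g. globals()
    """
    sc = namespace.get("sc")
    if sc is not None:
        # The pyspark shell already created a context; asking Hail to build
        # a second one is what raises "Only one SparkContext may be running".
        return hl.init(sc)
    # Plain python/ipython session: let Hail create the context itself.
    return hl.init()
```

In a pyspark shell this would be called as `init_hail(hl, globals())`.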


#4

That helps. I got a new error (below for reference), but I was able to get around it with an updated pyspark command (also below).

New error message

>>> import hail as hl
>>> import hail.expr.aggregators as agg
>>> hl.init(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-58>", line 2, in init
  File "/home/hadoop/hail-python.zip/hail/typecheck/check.py", line 546, in wrapper
  File "/home/hadoop/hail-python.zip/hail/context.py", line 177, in init
  File "<decorator-gen-56>", line 2, in __init__
  File "/home/hadoop/hail-python.zip/hail/typecheck/check.py", line 546, in wrapper
  File "/home/hadoop/hail-python.zip/hail/context.py", line 65, in __init__
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: is.hail.utils.HailException: Found problems with SparkContext configuration:
  Invalid configuration property spark.serializer: required org.apache.spark.serializer.KryoSerializer.  Found: empty parameter.
        at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
        at is.hail.utils.package$.fatal(package.scala:26)
        at is.hail.HailContext$.checkSparkConfiguration(HailContext.scala:122)
        at is.hail.HailContext$.apply(HailContext.scala:227)
        at is.hail.HailContext.apply(HailContext.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)

By changing the pyspark command to include the spark.serializer property:

pyspark --jars hail-all-spark.jar \
--py-files hail-python.zip \
--conf spark.driver.extraClassPath=./hail-all-spark.jar \
--conf spark.executor.extraClassPath=./hail-all-spark.jar \
--conf spark.sql.files.openCostInBytes=1099511627776 \
--conf spark.sql.files.maxPartitionBytes=1099511627776 \
--conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer

I can now get the hail logo:

>>> import hail as hl
>>> import hail.expr.aggregators as agg
>>> hl.init(sc)
Running on Apache Spark version 2.2.0
SparkUI available at http://ip-172-31-20-18.us-west-2.compute.internal:4041
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version devel-0f2f908dccaa
NOTE: This is a beta version. Interfaces may change
  during the beta period. We recommend pulling
  the latest changes weekly.

Thanks!
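
For anyone who wants to check their settings before calling hl.init(sc), here is a small sketch of the check that failed above. The required `spark.serializer` value comes from the error message; the registrator value is copied from my pyspark command, and treating it as required is my assumption. The function name `hail_conf_problems` is made up for illustration.

```python
# Values taken from the pyspark command and the error message in this post.
EXPECTED_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "is.hail.kryo.HailKryoRegistrator",
}


def hail_conf_problems(conf_pairs):
    """Return a list of readable problems for the given Spark settings.

    conf_pairs -- an iterable of (key, value) pairs
    """
    conf = dict(conf_pairs)
    problems = []
    for key, expected in EXPECTED_CONF.items():
        found = conf.get(key)
        if found != expected:
            problems.append(
                "%s: required %s, found %s"
                % (key, expected, found or "empty parameter")
            )
    return problems


# My first pyspark command set the registrator but not the serializer.
first_attempt = [("spark.kryo.registrator", "is.hail.kryo.HailKryoRegistrator")]
for problem in hail_conf_problems(first_attempt):
    print(problem)
```

Inside a pyspark shell the pairs could be taken from `sc.getConf().getAll()`.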


#5

Thanks for posting the error even though you figured it out! It will certainly help others in the future.


#7

Hey, I am using the same configuration: EMR 5.10.0, Hail 0.1 and Spark 2.2.0. I have set all the path variables. Importing Hail gives no errors, but I cannot initialize it. Any help appreciated. Thanks!

[ec2-user@ip-172-31-51-77 ~]$ echo $SPARK_HOME
/usr/lib/spark/
[ec2-user@ip-172-31-51-77 ~]$ echo $HAIL_HOME
/home/ec2-user/hail/
[ec2-user@ip-172-31-51-77 ~]$ echo $PYTHONPATH
:/home/ec2-user/hail-python.zip:/python:/python:/python/lib/py4j-0.10.4-src.zip:/home/ec2-user/hail-python.zip:/usr/lib/spark//python:/usr/lib/spark//python/lib/py4j-0.10.4-src.zip
[ec2-user@ip-172-31-51-77 ~]$ echo $SPARK_CLASSPATH

[ec2-user@ip-172-31-51-77 ~]$ export SPARK_CLASSPATH=$HAIL_HOME/build/libs/hail-all-spark.jar
[ec2-user@ip-172-31-51-77 ~]$ export SPARK_CLASSPATH=/home/ec2-user/hail-all-spark.jar
[ec2-user@ip-172-31-51-77 ~]$ cd $SPARK_CLASSPATH
-bash: cd: /home/ec2-user/hail-all-spark.jar: Not a directory
[ec2-user@ip-172-31-51-77 ~]$ echo $SPARK_HOME
/usr/lib/spark/
[ec2-user@ip-172-31-51-77 ~]$ echo $HAIL_HOME
/home/ec2-user/hail/
[ec2-user@ip-172-31-51-77 ~]$ echo $PYTHONPATH
:/home/ec2-user/hail-python.zip:/python:/python:/python/lib/py4j-0.10.4-src.zip:/home/ec2-user/hail-python.zip:/usr/lib/spark//python:/usr/lib/spark//python/lib/py4j-0.10.4-src.zip
[ec2-user@ip-172-31-51-77 ~]$ echo $SPARK_CLASSPATH
/home/ec2-user/hail-all-spark.jar
[ec2-user@ip-172-31-51-77 ~]$ python
Python 2.7.13 (default, Jan 31 2018, 00:17:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import hail as hl
>>> hl.init()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'init'
>>> hl.init(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'init'
>>>


#8

Hi Nara,
It looks like you’re using Python 2, which Hail 0.2 doesn’t support.

Is it possible you’re using 0.1? If so, upgrading to 0.2 should fix the issue.
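
Here is a quick way to tell which situation you’re in. It is a rough sketch that assumes, based on this thread, that the 0.2 module exposes `init` while the 0.1 module does not, and that 0.2 needs Python 3. The function name `diagnose_hail` is made up for illustration.

```python
import sys


def diagnose_hail(hail_module, version_info=sys.version_info):
    """Return a list of hints about why hl.init might be missing."""
    hints = []
    if version_info[0] < 3:
        hints.append("running Python %d; Hail 0.2 needs Python 3" % version_info[0])
    if not hasattr(hail_module, "init"):
        hints.append("this hail module has no init(); it is probably 0.1")
    return hints or ["nothing obviously wrong"]
```

After `import hail as hl`, running `print(diagnose_hail(hl))` in the same interpreter shows the hints.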


#9

Hi Tim,

Thanks for getting back to me. I tried to install Hail 0.2, but I get this error whenever I try.


#10

Can you update your gcc? It must be a really old version.


#11

If it helps, here are the steps I used to build Hail 0.2 on AWS:

sudo yum update -y
sudo yum install g++ cmake git -y
sudo mkdir -p /etc/alternatives/jre
sudo ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.37.amzn1.x86_64/include /etc/alternatives/jre/include
#sudo ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-8.b13.39.39.amzn1.x86_64/include /etc/alternatives/jre/include
git clone https://github.com/broadinstitute/hail.git
cd hail/hail/
./gradlew -Dspark.version=2.2.0 shadowJar archiveZip

I haven’t recompiled Hail in a while, but I think I had Python installed with the appropriate modules (from Hail’s environment.yml file) prior to building.


#12

Hi Tim and Keith,

Thanks for getting back to me. I somehow got Hail 0.2 to work on Python 3, but I cannot initialize it in PySpark and cannot run HailContext in python3. Please take a look at the attached screenshot and let me know if anything needs to change. Thanks.


#13

I will post the script to run Hail 0.2 on AWS, once I am done configuring it correctly.


#14

HailContext() is an 0.1 phenomenon - take a look at the tutorial in the 0.2 docs here:

https://hail.is/docs/0.2/tutorials/01-genome-wide-association-study.html


#15

Hey, I somehow got Hail 0.2 working on Spark 2.2.0. Now I want to import VCF files, convert them to a different format (ht or mt), and load the dataset into Elasticsearch.

I am also trying to use this script: https://github.com/macarthur-lab/hail-elasticsearch-pipelines/tree/master/hail_scripts/v02 . I get the same error even when running convert_vcf_to_hail.py


#16

This is a Python use issue - SEQ1875... should be inside a string.

https://www.w3schools.com/python/python_strings.asp


#17

Thanks for the quick response. I tried both putting the name in a string variable and passing the file name directly, but no luck. Can you please help me import the file and write it in mt or ht format, so I can export it to Elasticsearch? Thanks.

In [3]: a="SEQ187500194.vcf.gz"

In [4]: l.import_vcf('a')
2018-12-21 16:33:50 Hail: WARN: `a' refers to no files
---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<ipython-input-4-2884bd002d6d> in <module>
----> 1 l.import_vcf('a')

<decorator-gen-1093> in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    558     def wrapper(__original_func, *args, **kwargs):
    559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 560         return __original_func(*args_, **kwargs_)
    561 
    562     return wrapper

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/methods/impex.py in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)
   1886                              reference_genome, contig_recoding, array_elements_required,
   1887                              skip_invalid_loci, force_bgz, force, _partitions)
-> 1888     return MatrixTable(MatrixRead(reader, drop_cols=drop_samples))
   1889 
   1890 @typecheck(path=sequenceof(str),

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/matrixtable.py in __init__(self, mir)
    551         self._mir = mir
    552         self._jmt = Env.hail().variant.MatrixTable(
--> 553             Env.hc()._jhc, Env.hc()._backend._to_java_ir(self._mir))
    554 
    555         self._globals = None

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/backend/backend.py in _to_java_ir(self, ir)
     28             code = r(ir)
     29             # FIXME parse should be static
---> 30             ir._jir = ir.parse(code, ir_map=r.jirs)
     31         return ir._jir
     32 

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/ir/base_ir.py in parse(self, code, ref_map, ir_map)
     94 
     95     def parse(self, code, ref_map={}, ir_map={}):
---> 96         return Env.hail().expr.ir.IRParser.parse_matrix_ir(code, ref_map, ir_map)

/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/utils/java.py in deco(*args, **kwargs)
    212             raise FatalError('%s\n\nJava stack trace:\n%s\n'
    213                              'Hail version: %s\n'
--> 214                              'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    215         except pyspark.sql.utils.CapturedException as e:
    216             raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: HailException: arguments refer to no files

Java stack trace:
org.json4s.package$MappingException: unknown error
	at org.json4s.Extraction$.extract(Extraction.scala:46)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

is.hail.utils.HailException: arguments refer to no files
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
	at is.hail.utils.package$.fatal(package.scala:26)
	at is.hail.io.vcf.LoadVCF$.globAllVCFs(LoadVCF.scala:651)
	at is.hail.io.vcf.MatrixVCFReader.<init>(LoadVCF.scala:981)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)





Hail version: 0.2.6-6c5e6a3d5047
Error summary: HailException: arguments refer to no files

In [5]: l.import_vcf('a.strip()')
2018-12-21 16:35:47 Hail: WARN: `a.strip()' refers to no files

#18

These are correct usages:

In [3]: a="SEQ187500194.vcf.gz"

In [4]: l.import_vcf(a)

Or

In [5]: hl.import_vcf('SEQ187500194.vcf.gz')

Please see the docs for import_vcf regarding gzipping, though - you should either rename this file .bgz if it is truly block-gzipped, or bgzip it manually before importing to Hail:

https://hail.is/docs/0.2/methods/impex.html#hail.methods.import_vcf
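
If you’re not sure whether a .gz file is really block-gzipped, here is a rough sketch of a check. It relies on the BGZF layout as I understand it from the SAM/BAM specification: a gzip member with the FEXTRA flag set and an extra subfield tagged "BC" with a 2-byte payload. The function names are made up for illustration, and the check only looks at the first block, so treat a positive result as a hint and not as proof.

```python
import struct


def header_is_bgzf(header):
    """Return True if these leading bytes look like a BGZF block header."""
    if len(header) < 12:
        return False
    id1, id2, method, flags = header[0], header[1], header[2], header[3]
    # gzip magic bytes, deflate method, and the FEXTRA flag (bit 2) set
    if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False


def file_is_bgzf(path):
    """Read just enough of a local file to apply header_is_bgzf."""
    with open(path, "rb") as f:
        fixed = f.read(12)
        if len(fixed) < 12:
            return False
        (xlen,) = struct.unpack_from("<H", fixed, 10)
        return header_is_bgzf(fixed + f.read(xlen))
```

`file_is_bgzf` opens the path with plain Python file I/O, so it only applies to a copy of the file on the local disk.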


#19

Hey @nara! It looks like you’re getting a bit confused by the python variable syntax.

In python x = "abc" means that on subsequent lines, x now contains the string "abc". You can get that string out of the variable by simply writing x.

You’re specifically looking for this:

a="SEQ187500194.vcf.gz"
l.import_vcf(a)

#20

Hey Dan and Tim,

Thanks for responding. I have followed the instructions but am still experiencing the same issue. The VCF file is in place; its name even autocompletes when I hit tab.

In [4]: mv SEQ187500194.vcf.gz SEQ187500194.vcf.bgz

In [5]: a="SEQ187500194.vcf.bgz"

In [6]: l.import_vcf(a)
2018-12-21 17:14:45 Hail: WARN: `SEQ187500194.vcf.bgz' refers to no files
---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<ipython-input-6-ad3ce697bf42> in <module>
----> 1 l.import_vcf(a)

<decorator-gen-1093> in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    558     def wrapper(__original_func, *args, **kwargs):
    559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 560         return __original_func(*args_, **kwargs_)
    561 
    562     return wrapper

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/methods/impex.py in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)
   1886                              reference_genome, contig_recoding, array_elements_required,
   1887                              skip_invalid_loci, force_bgz, force, _partitions)
-> 1888     return MatrixTable(MatrixRead(reader, drop_cols=drop_samples))
   1889 
   1890 @typecheck(path=sequenceof(str),

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/matrixtable.py in __init__(self, mir)
    551         self._mir = mir
    552         self._jmt = Env.hail().variant.MatrixTable(
--> 553             Env.hc()._jhc, Env.hc()._backend._to_java_ir(self._mir))
    554 
    555         self._globals = None

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/backend/backend.py in _to_java_ir(self, ir)
     28             code = r(ir)
     29             # FIXME parse should be static
---> 30             ir._jir = ir.parse(code, ir_map=r.jirs)
     31         return ir._jir
     32 

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/ir/base_ir.py in parse(self, code, ref_map, ir_map)
     94 
     95     def parse(self, code, ref_map={}, ir_map={}):
---> 96         return Env.hail().expr.ir.IRParser.parse_matrix_ir(code, ref_map, ir_map)

/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/home/hadoop/hail/hail/build/distributions/hail-python.zip/hail/utils/java.py in deco(*args, **kwargs)
    212             raise FatalError('%s\n\nJava stack trace:\n%s\n'
    213                              'Hail version: %s\n'
--> 214                              'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    215         except pyspark.sql.utils.CapturedException as e:
    216             raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: HailException: arguments refer to no files

Java stack trace:
org.json4s.package$MappingException: unknown error
	at org.json4s.Extraction$.extract(Extraction.scala:46)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

is.hail.utils.HailException: arguments refer to no files
	at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
	at is.hail.utils.package$.fatal(package.scala:26)
	at is.hail.io.vcf.LoadVCF$.globAllVCFs(LoadVCF.scala:651)
	at is.hail.io.vcf.MatrixVCFReader.<init>(LoadVCF.scala:981)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:967)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:908)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1030)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1046)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1045)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

Hail version: 0.2.6-6c5e6a3d5047
Error summary: HailException: arguments refer to no files
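
A note for anyone landing here with the same `arguments refer to no files` error: the trace shows it being raised from `LoadVCF.globAllVCFs`, which suggests the path or glob handed to the VCF import matched nothing on the filesystem Hail searched. On a Spark cluster, a path with no scheme is usually resolved against the cluster's default filesystem (often HDFS on EMR), not the master node's local disk. That is an assumption about this setup, not something the post confirms. Below is a minimal sketch for checking what a path refers to before importing. The helper names `split_scheme` and `check_local_glob` are hypothetical, the example paths are placeholders, and it only checks local files; it does not query S3 or HDFS.

```python
import glob


def split_scheme(path):
    """Split 'scheme://rest' into (scheme, rest); scheme is '' when absent."""
    if "://" in path:
        scheme, rest = path.split("://", 1)
        return scheme, rest
    return "", path


def check_local_glob(path):
    """Return (scheme, matches) for a path or glob pattern.

    matches is a sorted list of local files for scheme-less or file:// paths,
    and None for any other scheme (s3://, hdfs://, ...), which this sketch
    does not attempt to check.
    """
    scheme, rest = split_scheme(path)
    if scheme in ("", "file"):
        return scheme, sorted(glob.glob(rest))
    return scheme, None


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for p in ["/tmp/*.vcf", "file:///tmp/*.vcf", "s3://some-bucket/sample.vcf"]:
        scheme, matches = check_local_glob(p)
        if matches is None:
            print(f"{p}: scheme '{scheme}', not checked locally")
        else:
            print(f"{p}: {len(matches)} local file(s) matched")
```

If a scheme-less path matches files locally but the import still fails, the file may exist only on the master node's disk; copying it to the cluster's storage, or writing the scheme out explicitly, is the usual thing to try next.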

#21

What system are you running Hail on?