Continuing my adventures from earlier, I am now successfully submitting Spark jobs from my Docker container to Dataproc. Now I'm attempting to run Hail commands in PySpark. I'm not sure whether the problem I'm seeing is a result of my unusual setup.
I'm trying to load a local (well, Docker-mounted) VDS file that I copied from a previous successful Hail run, but I'm getting a FileNotFoundException.
>>> from hail import *
>>> hc = HailContext(sc)
hail: info: SparkUI: http://10.128.0.6:4040
>>> count = hc.read('file:/home/joel/work/1kg.vds').count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-153>", line 2, in read
  File "/hail/python/hail/java.py", line 104, in handle_py4j
    raise FatalError(msg)
hail.java.FatalError: FileNotFoundException: File file:/home/joel/work/1kg.vds/rdd.parquet/part-r-00000-4b3d3bac-2697-423b-8283-07af77b46a72.snappy.parquet does not exist
However, I do see that file on disk. Do you know why Hail might not be able to find it? Thanks.
Also, do I have the syntax right for local files? I also tried

hc.read('/home/joel/work/1kg.vds').count()

but instead got a different error:

hail.java.FatalError: arguments refer to no files
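For reference, here is my understanding of how the two path forms I tried differ (this is a sketch of the URI semantics, not anything Hail-specific): the `file:` scheme explicitly names the local filesystem, while a bare path is resolved against Spark's default filesystem, which on a Dataproc cluster is typically not the driver container's local disk.

```python
from urllib.parse import urlparse

# The 'file:' form is an explicit local-filesystem URI.
uri = "file:/home/joel/work/1kg.vds"
parsed = urlparse(uri)
print(parsed.scheme)  # 'file' -> local filesystem requested explicitly
print(parsed.path)    # '/home/joel/work/1kg.vds'

# A bare path has no scheme, so it falls back to whatever the
# default filesystem is configured to be (e.g. HDFS/GCS on a cluster).
bare = urlparse("/home/joel/work/1kg.vds")
print(bare.scheme)    # '' -> no scheme; default filesystem applies
```

If that reading is right, it might explain why the bare path produced "arguments refer to no files" — the path was looked up somewhere other than my container's filesystem.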