Can't find Hail export file location

Hello,

I’ve used the All of Us NIH Jupyter notebook to generate some VAT tables. When I run the code to export a table to the directory the Jupyter notebook lives in, the job appears to complete, but I don’t see the file there. Any tips? Here’s the code I’m using to export it; the printed output even names that directory with the new file in it, which suggests the file should be there.

I used vat.export('directory/file_name.tsv.gz') to export it to that directory once I had put the VAT table together.


When using Apache Spark (the library on which Hail depends), a filename with no scheme, like /foo/bar/filename.tsv.gz, is interpreted as an HDFS path. You should basically never use HDFS. Instead, export your file to GCS:

vat.export('gs://yourbucket/foo/bar/filename.tsv.bgz')

(Note the .bgz extension: Hail writes a block-gzipped file, which is still valid gzip but can be written and read in parallel.)

and then download it to wherever you like. For example, you can use Hail to copy it to wherever your notebook is:

hl.hadoop_copy('gs://yourbucket/foo/bar/filename.tsv.bgz',
               'file:///foo/bar/filename.tsv.bgz')

Or use gsutil:

gsutil cp gs://... /foo/bar...
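
For example, with placeholder paths (substitute your actual bucket and destination):

gsutil cp gs://yourbucket/foo/bar/filename.tsv.bgz /home/jupyter/filename.tsv.bgz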

Okay, gotcha, thank you. I tried that and got a 403 error saying I don’t have access to the bucket we imported the data from, which makes sense since it’s protected NIH data. How can I export the data to a bucket I do have access to?

Scratch that: there’s a provided workspace bucket I can export files to from the Jupyter notebook environment. Running that now, so I hope it works. Sorry for the back and forth, but thanks for your help!
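
In case it helps anyone later, here is a minimal sketch of that workflow. It assumes the workbench exposes the workspace bucket in a WORKSPACE_BUCKET environment variable, that vat is the Hail table from above, and that the object paths are placeholders:

import os
import hail as hl

# Assumption: the workbench sets WORKSPACE_BUCKET to a gs:// bucket you own.
bucket = os.environ['WORKSPACE_BUCKET']

# Export to the workspace bucket (.bgz so Hail writes block-gzipped output).
vat.export(f'{bucket}/vat/file_name.tsv.bgz')

# Then copy it into the notebook's local filesystem if needed.
hl.hadoop_copy(f'{bucket}/vat/file_name.tsv.bgz',
               'file:///home/jupyter/file_name.tsv.bgz')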
