Can't find Hail export file location

Hello,

I’ve used the All of Us NIH Jupyter notebook to generate some VAT tables. When I run the code to export a table to the directory the Jupyter notebook lives in, the job appears to complete, but I don’t see the file there. Any tips? Here’s the code I’m using to export it; the printed output even names that directory with the new file in it, which suggests the file should be there.

I used vat.export('directory/file_name.tsv.gz') to export it to that directory once I had put the VAT table together.


When using Apache Spark (the library on which Hail depends), a filename with no scheme, like /foo/bar/filename.tsv.gz, is interpreted as an HDFS path. You should basically never use HDFS. Instead, export your file to GCS:

vat.export('gs://yourbucket/foo/bar/filename.tsv.bgz')

(Note the .bgz extension: Hail writes a block-gzipped file, which is still valid gzip but can be written and read in parallel.)

and then download it to wherever you like. For example, you can use Hail to copy it to wherever your notebook is:

hl.hadoop_copy('gs://yourbucket/foo/bar/filename.tsv.bgz',
               'file:///foo/bar/filename.tsv.bgz')

Or use gsutil:

gsutil cp gs://... /foo/bar...
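
For example, with placeholder paths (substitute your actual bucket and destination):

gsutil cp gs://yourbucket/foo/bar/filename.tsv.bgz /home/jupyter/filename.tsv.bgz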

Okay, gotcha, thank you. I tried that and got a 403 error saying I don’t have access to the bucket we imported the data from, which makes sense since it’s protected NIH data. How can I export the data to a bucket I do have access to?

Scratch that: there’s a provided workspace bucket I can export files to from the Jupyter notebook environment. Running that now, so I hope it works. Sorry for the back and forth, but thanks for your help!
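
In case it helps anyone later, here is a minimal sketch of that workflow. It assumes the workbench exposes the workspace bucket in a WORKSPACE_BUCKET environment variable, that vat is the Hail table from above, and that the object paths are placeholders:

import os
import hail as hl

# Assumption: the workbench sets WORKSPACE_BUCKET to a gs:// bucket you own.
bucket = os.environ['WORKSPACE_BUCKET']

# Export to the workspace bucket (.bgz so Hail writes block-gzipped output).
vat.export(f'{bucket}/vat/file_name.tsv.bgz')

# Then copy it into the notebook's local filesystem if needed.
hl.hadoop_copy(f'{bucket}/vat/file_name.tsv.bgz',
               'file:///home/jupyter/file_name.tsv.bgz')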
