I’ve been using the All of Us NIH Jupyter notebook to generate some VAT tables. When I run the code to export one to the directory the Jupyter notebook is in, the job appears to complete and there’s a print-out naming that directory with the new file in it, which suggests the file should be there, but I don’t see it. Any tips? Here’s the code I’m using to export it.
I used vat.export('directory/file_name.tsv.gz') to export it to that directory once I had put the VAT table together.
When using Apache Spark (the library on which Hail depends), a filename with no scheme, like /foo/bar/filename.tsv.gz, refers to a path in HDFS rather than the local filesystem, which is why the file never shows up in your notebook’s directory. You should basically never use HDFS. Instead, export your file to GCS:
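A minimal sketch of what that could look like in the All of Us Researcher Workbench, which exposes your workspace’s GCS bucket in the `WORKSPACE_BUCKET` environment variable; the `vat_export/` prefix and file name are placeholders, and `vat` stands for your assembled Hail table:

```python
import os

# All of Us notebooks set WORKSPACE_BUCKET to a gs:// URI for your workspace
# bucket; fall back to a placeholder here so the sketch is self-contained.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://your-workspace-bucket")

# Build a full gs:// path so the export goes to GCS instead of HDFS.
out_path = f"{bucket}/vat_export/file_name.tsv.gz"

# vat.export(out_path)  # writes the table directly to the GCS bucket
```

Because the path carries an explicit gs:// scheme, Spark writes to GCS rather than resolving the path against its default (HDFS) filesystem.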
Okay, gotcha, thank you. I tried that, and I got a 403 error saying I don’t have access to the bucket we imported the data from, which makes sense since it’s protected NIH data. How can I export the data to a bucket I do have access to?
Scratch that: there’s a provided workspace bucket for exporting files to from the Jupyter notebook environment. Running that now, so I hope it works. Sorry for the back and forth, and thanks for your help!
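For anyone finding this thread later: once the export lands in the workspace bucket, something like the following pulls it back into the notebook’s local directory. This is a sketch assuming the `WORKSPACE_BUCKET` environment variable set by the All of Us workbench; the `vat_export/` prefix and file name are placeholders.

```shell
# WORKSPACE_BUCKET is the gs:// URI of your workspace bucket in All of Us;
# the placeholder default keeps this sketch self-contained.
SRC="${WORKSPACE_BUCKET:-gs://your-workspace-bucket}/vat_export/file_name.tsv.gz"
echo "copying $SRC"
# gsutil cp "$SRC" .   # copy the exported file into the notebook's working directory
```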