Importing a Hail Table from a remote location

Hi!

I’m wondering whether it is possible to import and work with a Hail Table (ht) that is hosted remotely using a local installation of Hail. Is this possible, or does it require working entirely in the cloud or downloading the table?

Thanks!

Where is the Hail Table hosted? If it’s in Google Cloud Storage, I think installing the Google Cloud Storage Hadoop connector should be enough to work with remote data locally.

https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage


Hi John,

Thanks for the suggestion. It is on Google Cloud. I’m looking at the overview in the link, but I’m not sure how this would interface with Hail’s read_table function. Will read_table then recognize the gs:// prefix where it would normally expect a local path, and look there? (Sorry, I’m totally new to Hail.)

It’s definitely not obvious, but the Cloud Storage connector installs some libraries that Hail knows how to find. However, the connector alone is insufficient; you also need to set up Google account keys.
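In case it helps to see roughly what that key setup amounts to, here is a minimal sketch that passes the relevant GCS connector properties through Spark when initializing Hail (the script mentioned below automates this for you). It assumes your Hail version’s hl.init accepts a spark_conf dict and that you have a service-account JSON key; the keyfile path is a placeholder.

import hail as hl

# Minimal sketch: point the GCS connector at a service-account key.
# The keyfile path below is a placeholder for your own key file.
hl.init(
    spark_conf={
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": "/path/to/service-account-key.json",
    }
)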

Ben Weisburd graciously contributed a Python script that both installs the Google Hadoop connector and configures the Google account keys for you: http://broad.io/install-gcs-connector

You can either download that Python file yourself and run it with python, or you can just run:

curl -sSL https://broad.io/install-gcs-connector | python
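And to answer your earlier question: once the connector and keys are in place, hl.read_table should recognize a gs:// path just like a local one. A quick sketch, where the bucket path is a placeholder for your own table:

import hail as hl

hl.init()

# Read the Hail Table straight from Google Cloud Storage;
# the gs:// path is a placeholder for your own bucket.
ht = hl.read_table("gs://my-bucket/path/to/my_table.ht")

ht.describe()   # inspect the schema
ht.show(5)      # pull a few rows down locally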

Awesome. Thanks so much for this!