No FileSystem for scheme "gs" - gnomad

Hi there,

I’m struggling with Hail and the gnomAD pipeline for ancestry estimation.

When I run the following code:
with hl.hadoop_open(gnomad_v3_onnx_rf, "rb") as f:
    v3_onx_fit = onnx.load(f)

I run into an error in a Python Jupyter notebook:
Hail version: 0.2.132-678e1f52b999
Error summary: UnsupportedFileSystemException: No FileSystem for scheme "gs"

I downloaded the GCS connector jar (gcs-connector-hadoop2-latest.jar), put it into a folder, and edited the Spark jar path, but it doesn't help:

jar_spark_path = "/sc/arion/projects/gapslab/team_folders/sanjeev/projects/TARCC/ancestry_estimation/pip_python3.12"
os.environ["SPARK_CLASSPATH"] = jar_spark_path

Can you help?

You should put the jar in $SPARK_HOME/jars/, probably where pyspark is installed.
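If pyspark was installed with pip, a quick sketch like the following should print the jars directory to copy the connector into (this is just one way to locate it):

import os
import pyspark

# The pip-installed pyspark package keeps Spark's bundled jars in a "jars" subfolder;
# the GCS connector jar should be placed alongside them.
print(os.path.join(os.path.dirname(pyspark.__file__), "jars"))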

I installed pyspark using pip and put the GCS connector jar in its jars folder, yet I still get the same error.

It is incredibly hard to make this work.

That version of the GCS connector seems to be incompatible with the latest version of Hail.

To install the GCS connector properly into your environment we recommend using a script that we provide. More information is here. The short of it is, in your environment, run:

curl -sSL https://broad.io/install-gcs-connector | python3

I restarted my kernel, and now I run into a connection timeout error:

connection_time_out.txt (9.7 KB)

You’re getting an error trying to communicate with a metadata server (which is almost certainly not running) that would otherwise provide the connector library with your Google Cloud credentials.

We configure the GCS connector library to use APPLICATION_DEFAULT authentication when it is installed via the install-gcs-connector script.
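Under the hood this corresponds to the connector's fs.gs.auth.type Hadoop property; if you ever need to set it yourself, a rough sketch (the property name assumes GCS connector 2.2 or newer) would be:

import hail as hl

# Pass the Hadoop property through Spark so the connector uses
# application-default credentials rather than a GCE metadata server.
hl.init(spark_conf={"spark.hadoop.fs.gs.auth.type": "APPLICATION_DEFAULT"})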

You need to set up said credentials. More information is in the Google Cloud docs.

It may be as simple as running gcloud auth application-default login in your environment. You may need to speak with your system administrator to get the best practices for authenticating to GCP in your environment.
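Once credentials are in place, a quick sanity check from Python is something like the sketch below (the bucket is just an example of a public gs:// path; any gs:// path you can read will do):

import hail as hl

hl.init()
# If the connector and credentials are configured, listing a gs:// path should
# succeed instead of raising "No FileSystem for scheme gs" or timing out.
print(hl.hadoop_ls("gs://gcp-public-data--gnomad/")[:3])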