No FileSystem for scheme "gs"

Hi all,
I know this topic has been raised several times already, but I am still facing the issue in the title. I am interested in reading files from the gnomAD project.
Hail version:

 pip show hail
Name: hail
Version: 0.2.130
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: 
Location: /home/aih/antonio.nappi/miniconda3/envs/pytorch113/lib/python3.10/site-packages
Requires: aiodns, aiohttp, avro, azure-identity, azure-mgmt-storage, azure-storage-blob, bokeh, boto3, botocore, decorator, Deprecated, dill, frozenlist, google-auth, google-auth-oauthlib, humanize, janus, jproperties, nest-asyncio, numpy, orjson, pandas, parsimonious, plotly, pyspark, python-json-logger, pyyaml, requests, rich, scipy, sortedcontainers, tabulate, typer, uvloop

The command that I am executing:

import hail as hl
mt = hl.read_matrix_table(
    'gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht'
)
mt = mt.head(100_000)
sites = mt.collect()

I get the following error:

Hail version: 0.2.130-bea04d9c79b5
Error summary: UnsupportedFileSystemException: No FileSystem for scheme "gs"

I have installed Hail into a Singularity container; maybe this is the problem?

Hello @Newbie, have you installed the GCS connector? That dependency is necessary for Spark (and by extension Hail in your setup) to read from GCS, so you'll need to install it in your Singularity container to get your script working.
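Installing the jar on its own sometimes isn't enough, though: the Spark session that Hail starts also needs to know about it. Below is a minimal sketch of how you could wire that up when initializing Hail (the jar path is just an example of wherever you've put the connector, and these particular Spark settings are a suggestion rather than anything Hail requires verbatim):

import hail as hl

# Example path to the GCS connector jar inside the container -- adjust to your layout.
gcs_jar = '/opt/hadoop/share/hadoop/common/gcs-connector-hadoop3-latest.jar'

hl.init(
    spark_conf={
        # Put the connector on the driver and executor classpaths of the
        # Spark that Hail actually uses (the pip-installed pyspark).
        'spark.jars': gcs_jar,
        'spark.driver.extraClassPath': gcs_jar,
        'spark.executor.extraClassPath': gcs_jar,
        # Register the "gs" scheme with Hadoop.
        'spark.hadoop.fs.gs.impl': 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem',
        'spark.hadoop.fs.AbstractFileSystem.gs.impl': 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS',
    }
)

Note that pip-installed Hail ships with its own pyspark, so a separate SPARK_HOME or Hadoop configuration directory is not necessarily picked up; copying the connector jar into that pyspark's jars/ directory is another option. Depending on the connector version you may also need Google credentials (e.g. application default credentials) configured, even for public data.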

Hi @danielgoldstein, thanks for your answer :slight_smile: Unfortunately, I don't think that is the problem, since I have (or at least should have) installed the connector in the Singularity container. I am posting the definition file for the container below.

Bootstrap: docker
From: continuumio/miniconda3

%environment
    # Set DEBIAN_FRONTEND to noninteractive to avoid prompts
    export DEBIAN_FRONTEND=noninteractive
    # Activate Conda environment
    source activate gb

%post
    export DEBIAN_FRONTEND=noninteractive

    # Install dependencies required by PostgreSQL
    apt-get update && apt-get install -y --no-install-recommends \
        wget ca-certificates \
        libreadline-dev zlib1g-dev \
        libssl-dev build-essential \
        libxml2-dev libxslt1-dev \
        libffi-dev liblzma-dev \
        locales libpq-dev \
        openjdk-11-jdk curl tar vim nano \
        g++ \
        python3 python3-pip \
        libopenblas-dev liblapack-dev \
        liblz4-dev \
        python3-pip \
        git \
        build-essential \
        maven

    # Install necessary Python packages
    pip install numpy scipy

    # Set Java
    echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> /etc/profile
    echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
    # Download and extract Hadoop
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
    tar -xzvf hadoop-3.3.6.tar.gz -C /opt
    mv /opt/hadoop-3.3.6 /opt/hadoop
    # Download the Google Cloud Storage connector
    wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar -P /opt/hadoop/share/hadoop/common

    # Clean up
    rm hadoop-3.3.6.tar.gz

    # Create Hadoop configuration directory
    mkdir -p /opt/hadoop/etc/hadoop

    # Add GCS connector configuration to core-site.xml
    echo '<configuration>
            <property>
                <name>fs.gs.impl</name>
                <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
            </property>
            <property>
                <name>fs.AbstractFileSystem.gs.impl</name>
                <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
            </property>
            <property>
                <name>google.cloud.auth.service.account.enable</name>
                <value>false</value>
            </property>
            <property>
                <name>google.cloud.auth.application.default</name>
                <value>true</value>
            </property>
          </configuration>' > /opt/hadoop/etc/hadoop/core-site.xml
    # Download and extract Spark 3.3.4
    wget https://archive.apache.org/dist/spark/spark-3.3.4/spark-3.3.4-bin-hadoop3.tgz
    tar -xzvf spark-3.3.4-bin-hadoop3.tgz -C /opt
    mv /opt/spark-3.3.4-bin-hadoop3 /opt/spark
    rm spark-3.3.4-bin-hadoop3.tgz
    # Install Google Cloud SDK
    wget https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-367.0.0-linux-x86_64.tar.gz
    tar -xf google-cloud-sdk-367.0.0-linux-x86_64.tar.gz
    ./google-cloud-sdk/install.sh -q
    ./google-cloud-sdk/bin/gcloud components update -q
    # Clean up
    rm google-cloud-sdk-367.0.0-linux-x86_64.tar.gz

    # Initialize gcloud (you may need to do this manually after installation or use a service account key)
    echo 'source /google-cloud-sdk/path.bash.inc' >> ~/.bashrc
    # Build and install Hail from source
    git clone https://github.com/hail-is/hail.git
    cd hail/hail
    make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.18 SPARK_VERSION=3.3.2
    # Install PostgreSQL and its dependencies
    apt-get install -y --no-install-recommends \
        postgresql postgresql-contrib

    # Ensure all packages are configured properly
    dpkg --configure -a

    # Remove existing PostgreSQL data directory if it exists
    if [ -d /var/lib/postgresql/13/main ]; then
        rm -rf /var/lib/postgresql/13/main
    fi

    # Create PostgreSQL data directory and set ownership to postgres
    mkdir -p /var/lib/postgresql/13/main
    chown -R postgres:postgres /var/lib/postgresql

    # Initialize PostgreSQL database cluster as postgres user
    su postgres -c "/usr/lib/postgresql/13/bin/initdb -D /var/lib/postgresql/13/main"

    # Start PostgreSQL service to set password and keep it running
    su postgres -c "/usr/lib/postgresql/13/bin/pg_ctl -D /var/lib/postgresql/13/main -l /var/log/postgresql/logfile start"
    su postgres -c "psql -c \"ALTER USER postgres PASSWORD 'password';\""

    # Setting up Conda environment
    echo "Setting up Conda environment"
    conda update -n base -c defaults conda
    conda env create -f /requirements.yml

    # Clean up Conda cache to free up space
    conda clean --all --yes
    rm -rf /opt/conda/pkgs/*

    echo "Conda setup complete."
%environment
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    export PATH=$JAVA_HOME/bin:$PATH
    export PATH=/usr/local/bin:$PATH
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    export HADOOP_HOME=/opt/hadoop
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    export CLASSPATH=$HADOOP_HOME/share/hadoop/common/gcs-connector-hadoop3-latest.jar:$CLASSPATH
    export SPARK_CLASSPATH=$HADOOP_HOME/share/hadoop/common/gcs-connector-hadoop3-latest.jar
    export SPARK_HOME=/opt/spark
    export HAIL_HOME=/opt/hail

%files
    /lustre/groups/casale/code/users/antonio.nappi/genebass_results/requirements.yml /requirements.yml
%runscript
    # Start PostgreSQL service as postgres user and activate Conda environment
    su postgres -c "/usr/lib/postgresql/13/bin/pg_ctl -D /var/lib/postgresql/13/main -l /var/log/postgresql/logfile start"
    source activate gb
    echo "Starting container with PostgreSQL and Conda environment"
    exec "$@"