Error summary: UnsupportedFileSystemException: No FileSystem for scheme "s3"

Hi all,

I am trying to access the Pan UKBB data in Hail format. I have followed their steps, but I get an error when Hail tries to access the S3 data. I have googled the error, but I have not been able to find a solution. I asked the Pan UKBB team, and they redirected me to this site.
I would like to access the data from my local machine. Until now, I have been able to access files from this S3 bucket with boto3, so the issue is likely related to the Hail configuration.
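
For reference, this is roughly how I read from the bucket with boto3; the unsigned-request configuration is the relevant part, and the prefix and key count below are just examples:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client: the Pan UKBB bucket is public, so no credentials are needed.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
response = s3.list_objects_v2(Bucket='pan-ukb-us-east-1', Prefix='ld_release/', MaxKeys=5)
for obj in response.get('Contents', []):
    print(obj['Key'])

This is the Hail code that fails: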

>>> from ukbb_pan_ancestry import *
>>> import hail as hl
>>> ht_idx = hl.read_table('s3://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht')
Initializing Hail with default parameters...                                                         
2021-06-22 08:59:05 WARN  Utils:69 - Your hostname, ws112610 resolves to a loopback address: 127.0.1.1; using 172.22.3.213 instead (on interface enp0s25)
2021-06-22 08:59:05 WARN  Utils:69 - Set SPARK_LOCAL_IP if you need to bind to another address
2021-06-22 08:59:06 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".                                                                 
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2021-06-22 08:59:06 WARN  Hail:43 - This Hail JAR was compiled for Spark 3.1.1, running with Spark 3.1.2.
  Compatibility is not guaranteed.
Running on Apache Spark version 3.1.2
SparkUI available at http://ws112610.cm.upf.edu:4040
Welcome to                                       
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.69-6d2bd28a8849
LOGGING: writing to /home/SHARED/PROJECTS/Obesity_analysis/hail-20210622-0859-0.2.69-6d2bd28a8849.log
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-1429>", line 2, in read_table
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/methods/impex.py", line 2457, in read_table
    for rg_config in Env.backend().load_references_from_dataset(path):
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 326, in load_references_from_dataset
    return json.loads(Env.hail().variant.ReferenceGenome.fromHailDataset(self.fs._jfs, path))
  File "/home/carlos/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 30, in deco
    raise FatalError('%s\n\nJava stack trace:\n%s\n'
hail.utils.java.FatalError: UnsupportedFileSystemException: No FileSystem for scheme "s3"

Java stack trace:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at is.hail.io.fs.HadoopFS.fileStatus(HadoopFS.scala:164)
        at is.hail.io.fs.FS.isDir(FS.scala:175)
        at is.hail.io.fs.FS.isDir$(FS.scala:173)
        at is.hail.io.fs.HadoopFS.isDir(HadoopFS.scala:70)
        at is.hail.expr.ir.RelationalSpec$.readMetadata(AbstractMatrixTableSpec.scala:30)
        at is.hail.expr.ir.RelationalSpec$.readReferences(AbstractMatrixTableSpec.scala:68)
        at is.hail.variant.ReferenceGenome$.fromHailDataset(ReferenceGenome.scala:596)
        at is.hail.variant.ReferenceGenome.fromHailDataset(ReferenceGenome.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)



Hail version: 0.2.69-6d2bd28a8849
Error summary: UnsupportedFileSystemException: No FileSystem for scheme "s3"

My python version: Python 3.8.5.

Best,

Carlos Ruiz

Hey Carlos!

You’ll need to install the S3 connector for Hadoop/Spark. I wrote a script that does this; after running it, you can access `s3a://` URLs through Apache Spark: https://gist.github.com/danking/f8387f5681b03edc5babdf36e14140bc

You can run it like this:

curl https://gist.githubusercontent.com/danking/f8387f5681b03edc5babdf36e14140bc/raw/23d43a2cc673d80adcc8f2a1daee6ab252d6f667/install-s3-connector.sh | bash
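
If you'd rather not have a script edit your spark-defaults.conf, something like the following should be roughly equivalent at session start. This is a sketch: the hadoop-aws version is an assumption, and it must match the Hadoop your Spark was built against (3.2.0 for stock Spark 3.1.x).

import hail as hl

# A sketch: pull in the S3A connector when the session starts instead of
# editing spark-defaults.conf. The hadoop-aws version is an assumption; it
# must match the Hadoop version bundled with your Spark installation.
hl.init(spark_conf={'spark.jars.packages': 'org.apache.hadoop:hadoop-aws:3.2.0'})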

Thank you for your help! Now I am able to connect to the S3 bucket, but I am still not able to access the data.

The bucket I am trying to access is a public bucket. If I run:

aws s3 ls s3://pan-ukb-us-east-1/ --no-sign-request

I can successfully list the directory. However, Python is asking me for some AWS configuration. I created a dummy AWS config file, but it is not working; I am still getting an error:

>>> ht_idx = hl.read_table('s3a://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht')
Traceback (most recent call last):                                                                   
  File "<stdin>", line 1, in <module>                                                                
  File "<decorator-gen-1429>", line 2, in read_table                                                                                                                                                       
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)                                                        
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/methods/impex.py", line 2457, in read_table
    for rg_config in Env.backend().load_references_from_dataset(path):                         
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 326, in load_references_from_dataset
    return json.loads(Env.hail().variant.ReferenceGenome.fromHailDataset(self.fs._jfs, path))
  File "/home/carlos/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__ 
    return_value = get_return_value(                                                                 
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 30, in deco
    raise FatalError('%s\n\nJava stack trace:\n%s\n'                                         
hail.utils.java.FatalError: AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: B6B484W5S4YYFKW6; S3 Extended Request ID: x8+lGQjxzYpHyNwUyahr2rQ9KZ6I9No2Xqwgrl3tmfm6jfIooSk8+URJSV9e42koEu0btG0Co/g=)
                                                  
Java stack trace:                               
java.nio.file.AccessDeniedException: s3a://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht: getFileStatus on s3a://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: B6B484W5S4YYFKW6; S3 Extended Request ID: x8+lGQjxzYpHyNwUyahr2rQ9KZ6I9No2Xqwgrl3tmfm6jfIooSk8+URJSV9e42koEu0btG0Co/g=), S3 Extended Request ID: x8+lGQjxzYpHyNwUyahr2rQ9KZ6I9No2Xqwgrl3tmfm6jfIooSk8+URJSV9e42koEu0btG0Co/g=:403 Forbidden
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1640)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1271)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1249)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1246)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2183)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
        at is.hail.io.fs.HadoopFS.fileStatus(HadoopFS.scala:164)
        at is.hail.io.fs.FS.isDir(FS.scala:175)
        at is.hail.io.fs.FS.isDir$(FS.scala:173)
        at is.hail.io.fs.HadoopFS.isDir(HadoopFS.scala:70)
        at is.hail.expr.ir.RelationalSpec$.readMetadata(AbstractMatrixTableSpec.scala:30)
        at is.hail.expr.ir.RelationalSpec$.readReferences(AbstractMatrixTableSpec.scala:68)
        at is.hail.variant.ReferenceGenome$.fromHailDataset(ReferenceGenome.scala:596)
        at is.hail.variant.ReferenceGenome.fromHailDataset(ReferenceGenome.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)




Hail version: 0.2.69-6d2bd28a8849
Error summary: AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: B6B484W5S4YYFKW6; S3 Extended Request ID: x8+lGQjxzYpHyNwUyahr2rQ9KZ6I9No2Xqwgrl3tmfm6jfIooSk8+URJSV9e42koEu0btG0Co/g=)

How can I solve this?

Best,

Hmm. My first question is: does the same error occur if you use the s3: protocol in read_table?

No. If I use the s3: protocol, I get the same UnsupportedFileSystemException as in my first post:

>>> ht_idx = hl.read_table('s3://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht')                                                                                                                  
Initializing Hail with default parameters...                                                                                                                                                               
2021-06-23 08:59:30 WARN  Utils:69 - Your hostname, ws112610 resolves to a loopback address: 127.0.1.1; using 172.22.3.213 instead (on interface enp0s25)                                                  
2021-06-23 08:59:30 WARN  Utils:69 - Set SPARK_LOCAL_IP if you need to bind to another address                                                                                                             
2021-06-23 08:59:31 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable                                                      
Setting default log level to "WARN".                                                                                                                                                                       
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).                                                                                                               
2021-06-23 08:59:31 WARN  Hail:43 - This Hail JAR was compiled for Spark 3.1.1, running with Spark 3.1.2.                                                                                                  
  Compatibility is not guaranteed.                                                                                                                                                                         
Running on Apache Spark version 3.1.2                                                                                                                                                                      
SparkUI available at http://ws112610.cm.upf.edu:4040                                                                                                                                                       
Welcome to                                                                                                                                                                                                 
     __  __     <>__                                                                                                                                                                                       
    / /_/ /__  __/ /                                                                                                                                                                                       
   / __  / _ `/ / /                                                                                                                                                                                        
  /_/ /_/\_,_/_/_/   version 0.2.69-6d2bd28a8849                                                                                                                                                           
LOGGING: writing to /home/carlos/hail-20210623-0859-0.2.69-6d2bd28a8849.log                                                                                                                                
Traceback (most recent call last):                                                                                                                                                                         
  File "<stdin>", line 1, in <module>                                                                                                                                                                      
  File "<decorator-gen-1429>", line 2, in read_table                                                                                                                                                       
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper                                                                                                     
    return __original_func(*args_, **kwargs_)                                                                                                                                                              
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/methods/impex.py", line 2457, in read_table                                                                                                   
    for rg_config in Env.backend().load_references_from_dataset(path):                                                                                                                                     
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 326, in load_references_from_dataset                                                                          
    return json.loads(Env.hail().variant.ReferenceGenome.fromHailDataset(self.fs._jfs, path))                                                                                                              
  File "/home/carlos/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__                                                                                                      
    return_value = get_return_value(                                                                                                                                                                       
  File "/home/carlos/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 30, in deco                                                                                                    
    raise FatalError('%s\n\nJava stack trace:\n%s\n'                                                                                                                                                       
hail.utils.java.FatalError: UnsupportedFileSystemException: No FileSystem for scheme "s3"     

Best,


Hmm. So, that command works for me and I’m not sure why. I’m logged into an AWS account, but it shouldn’t have privileged access to that bucket.

This is what my spark-defaults.conf looks like:

(base) # cat /Users/dking/miniconda3/lib/python3.7/site-packages/pyspark/conf/spark-defaults.conf
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.google.cloud.auth.service.account.json.keyfile /Users/dking/.config/gcloud/application_default_credentials.json
spark.hadoop.fs.gs.requester.pays.mode AUTO
spark.hadoop.fs.gs.requester.pays.project.id broad-ctsa
spark.hadoop.fs.gs.project.id broad-ctsa
### START: DO NOT EDIT, MANAGED BY: install-s3-connector.sh
spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider,com.amazonaws.auth.profile.ProfileCredentialsProvider,org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
### END: DO NOT EDIT, MANAGED BY: install-s3-connector.sh

Maybe we’re configuring the wrong spark-defaults.conf? What’s the output of these commands:

which python3
which python
echo $PYTHONPATH
find_spark_home.py
echo $SPARK_HOME

You might try removing everything from spark.hadoop.fs.s3a.aws.credentials.provider except org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider, so that Spark only ever makes unsigned requests; see the sketch below.
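
If editing spark-defaults.conf is inconvenient, I believe the same override can be passed when Hail starts. A minimal sketch:

import hail as hl

# Minimal sketch: allow only anonymous (unsigned) S3 requests, so no local
# credentials are picked up and a rejected signed request cannot cause a 403.
hl.init(spark_conf={
    'spark.hadoop.fs.s3a.aws.credentials.provider':
        'org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider',
})

ht = hl.read_table('s3a://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht')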

Hi,

I have tried two things:

  • Logging in to AWS.
  • Removing all but org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider from the Spark config.

Both options solve the problem, and I can now access the S3 bucket with Hail. I consider the issue resolved; if I run into another problem, I will let you know.
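
For future readers: with the anonymous credentials provider configured, a read along these lines now succeeds on my machine (the describe() call is just a sanity check I added):

ht_idx = hl.read_table('s3a://pan-ukb-us-east-1/ld_release/UKBB.EUR.ldadj.variant.ht')
ht_idx.describe()  # prints the table schema, confirming the metadata is readable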

Thank you very much for your help.

Best,


@danking I am getting this same error

org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

This is from Java Spark connecting to the Hive metastore. Is there a Hive connector similar to this for Java?

I have never used the Hive metastore, but I recommend installing the S3 connector. That should at least resolve the UnsupportedFileSystemException.