S3 access from Dataproc

Hi,

I’m trying to access S3 data from Hail on a Dataproc cluster, but I’m running into the following issue.
Logs:
Py4JJavaError: An error occurred while calling o63.csv.
: org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on bt-transient-bucket-rbnc: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: The requested metadata is not found at http://ip/latest/meta-data/iam/security-credentials/: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: The requested metadata is not found at http://ip/4/latest/meta-data/iam/security-credentials/

Can you please provide documentation or any details on how to access S3 data from a Hail Dataproc cluster?

Thanks.

You almost certainly do not want to do this. You’ll pay by the byte to read data from Amazon.

If you really want to do this, the answer doesn’t have anything to do with Hail. You need to figure out how to make Dataproc read from S3. We’ve never done this, but you might start with this Stack Overflow post.
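For what it’s worth, the Spark-side configuration people usually describe for this looks roughly like the sketch below. It is untested on our end: the property names are the standard Hadoop S3A ones, the key values and file path are placeholders, and it assumes the hadoop-aws connector jars are already available on the cluster.

```python
# Untested sketch: supply static AWS credentials to the S3A connector through
# Spark's Hadoop configuration after Hail starts. Property names are the
# standard Hadoop S3A ones; the key values and file path are placeholders.
import hail as hl

hl.init()  # attaches to the Spark context Dataproc provides

hadoop_conf = hl.spark_context()._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
hadoop_conf.set("fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")

# With credentials in place, s3a:// paths should resolve, e.g.
ht = hl.import_table("s3a://bt-transient-bucket-rbnc/<your-file>.tsv")
```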

I did provide the access key and secret key in the Hadoop configuration, and I’m able to access the S3 files from a normal Dataproc cluster. But when I provide the same settings to the Hail Dataproc cluster, I run into the issue above.
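To clarify what I mean by “the same settings”, the configuration I’m passing is along these lines (a paraphrased sketch, with the real key values and path redacted):

```python
# Paraphrased sketch of my setup (real values redacted). The credentials are
# passed to Hadoop via Spark's "spark.hadoop.*" passthrough, and the CSV is
# read from the bucket shown in the error above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
    .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")
    .getOrCreate()
)

# Works on a plain Dataproc cluster, but fails with the error above on the
# Hail Dataproc cluster.
df = spark.read.csv("s3a://bt-transient-bucket-rbnc/<path>.csv", header=True)
```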

Thanks.

Can you share the code you used to configure the Dataproc cluster?