Permission Error

I seem to be getting an access permission error, even though I do have access to the Google bucket and the project. For example, this command lists the contents without any problem:

hufengzhou$ gsutil ls gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/

However, submitting the job with

hailctl dataproc submit export1 CCDGF3MT2VCFEXOME.py

fails with the following output:

Submitting to cluster 'export1'…
gcloud command:
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.",
    "reason" : "forbidden"
  } ],
  "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object."
}

Java stack trace:
java.io.IOException: Error accessing gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/reference_data
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:2140)

  File "/Users/hufengzhou/opt/anaconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/hufengzhou/opt/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/submit.py", line 88, in main
    gcloud.run(cmd)
  File "/Users/hufengzhou/opt/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/Users/hufengzhou/opt/anaconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'jobs', 'submit', 'pyspark', 'CCDGF3MT2VCFEXOME.py', '--cluster=export1', '--files=', '--py-files=/var/folders/kt/5d9qryss0bg9s43mmwyrm2_80000gn/T/pyscripts_1d5j0xic.zip', '--properties=']' returned non-zero exit status 1.

Permissions on the cloud are unfortunately a lot more complicated than on institutional HPC clusters. The core issue here is that Dataproc clusters don't use the permissions of your Google account. Instead, they use the Compute Engine service account of your billing project. That service account is listed in the error: XXXX-compute@developer.gserviceaccount.com, where XXXX is the project number. There are two solutions here:

  1. Ask whoever owns the project where the data lives to grant the Storage Object Admin role (roles/storage.objectAdmin) to the compute service account above. This is the preferred solution because it incurs no additional storage cost.

  2. Use your own Google account, via gsutil, to copy the data into a bucket owned by your project (or any bucket where you can grant this service account access). This is the worse option because you pay to store two copies of the data.
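Both options can be sketched in Python with the google-cloud-storage client library. This is a minimal sketch, not Hail's or Terra's tooling. It assumes the library is installed and that the code runs with suitable credentials. Option 1 needs credentials that can edit the bucket's IAM policy. Option 2 needs your own account, such as application default credentials on the machine where gsutil ls already works, not the cluster. All helper names (service_account_from_error, add_member_binding, grant_object_admin, destination_name, copy_prefix) are hypothetical. The gsutil equivalents are gsutil iam ch serviceAccount:EMAIL:objectAdmin gs://BUCKET for option 1 and gsutil -m cp -r for option 2.

```python
"""Sketch of the two fixes for a 403 from a Dataproc cluster's service account.

Assumptions: google-cloud-storage is installed, and the caller's credentials
can edit the bucket IAM policy (option 1) or read the source objects (option 2).
"""
import re

# Default Compute Engine service accounts have the form
# PROJECT_NUMBER-compute@developer.gserviceaccount.com
_SA_PATTERN = re.compile(r"\b\d+-compute@developer\.gserviceaccount\.com\b")


def service_account_from_error(message: str) -> str:
    """Return the default compute service account named in a 403 message."""
    match = _SA_PATTERN.search(message)
    if match is None:
        raise ValueError("no default compute service account found in message")
    return match.group(0)


def add_member_binding(policy, role: str, member: str) -> None:
    """Append a role binding to an IAM policy object that exposes .bindings."""
    policy.bindings.append({"role": role, "members": {member}})


def grant_object_admin(bucket_name: str, service_account: str) -> None:
    """Option 1: must be run by someone allowed to edit IAM on the data's bucket."""
    # Imported here so the pure helpers above work without the library installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    add_member_binding(
        policy, "roles/storage.objectAdmin", f"serviceAccount:{service_account}"
    )
    bucket.set_iam_policy(policy)


def destination_name(name: str, src_prefix: str, dst_prefix: str) -> str:
    """Map an object name under src_prefix to the same relative path under dst_prefix."""
    if not name.startswith(src_prefix):
        raise ValueError(f"{name!r} is not under {src_prefix!r}")
    return dst_prefix + name[len(src_prefix):]


def copy_prefix(src_bucket: str, src_prefix: str, dst_bucket: str, dst_prefix: str) -> None:
    """Option 2: copy every object under src_prefix, using your own credentials."""
    from google.cloud import storage

    client = storage.Client()
    dst = client.bucket(dst_bucket)
    for blob in client.list_blobs(src_bucket, prefix=src_prefix):
        target = dst.blob(destination_name(blob.name, src_prefix, dst_prefix))
        # rewrite() copies server-side; large objects may need several calls,
        # signalled by a non-None continuation token.
        token, _, _ = target.rewrite(blob)
        while token is not None:
            token, _, _ = target.rewrite(blob, token=token)
```

A .vds is a directory of many objects, so the copy walks every object under the prefix rather than copying a single file. After option 2, point the Hail script at the new gs:// path in your own bucket.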