Permission Error

Permissions on the cloud are unfortunately a lot more complicated than on institutional HPC clusters. The core issue here is that Dataproc clusters don't run with your Google account's permissions. Instead, they run as the Compute Engine default service account of your billing project. That service account is listed in the error: XXXX-compute@developer.gserviceaccount.com. There are two solutions here:

  1. Ask whoever owns the project where the data lives to grant the Storage Object Admin role to the compute service account named above. This is the preferred solution because it incurs no additional cost.

  2. Use gsutil with your own Google account to copy the data into a bucket owned by your project (or any bucket where you can grant this service account access). This is the less attractive solution because it means paying to store two copies of the data.
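A minimal Python sketch of both options follows. It assumes the gsutil CLI is installed and authenticated. It pulls the service account address out of the error text and builds the gsutil command for each solution. The error wording, bucket names, paths, and helper function names are all hypothetical placeholders. `gsutil iam ch` with the `objectAdmin` shorthand grants roles/storage.objectAdmin, and `gsutil -m cp -r` copies a prefix recursively and in parallel.

```python
import re
import shlex
from typing import List, Optional

# Default Compute Engine service accounts look like
# <project-number>-compute@developer.gserviceaccount.com
COMPUTE_SA_PATTERN = re.compile(r"\b\d+-compute@developer\.gserviceaccount\.com\b")


def find_compute_service_account(error_text: str) -> Optional[str]:
    """Return the first default compute service account in error_text, or None."""
    match = COMPUTE_SA_PATTERN.search(error_text)
    return match.group(0) if match else None


def grant_object_admin_cmd(service_account: str, bucket: str) -> List[str]:
    """Solution 1: command the data owner runs to give the service account
    Storage Object Admin (roles/storage.objectAdmin) on their bucket."""
    return ["gsutil", "iam", "ch", f"serviceAccount:{service_account}:objectAdmin", f"gs://{bucket}"]


def copy_data_cmd(source_url: str, destination_url: str) -> List[str]:
    """Solution 2: command you run with your own Google credentials to copy the
    data into a bucket your project controls (-m parallelizes, -r recurses)."""
    return ["gsutil", "-m", "cp", "-r", source_url, destination_url]


if __name__ == "__main__":
    # Illustrative error text only; real error messages may be worded differently.
    error = ("403 Forbidden: 123456789012-compute@developer.gserviceaccount.com does not "
             "have storage.objects.get access to the Google Cloud Storage object.")
    sa = find_compute_service_account(error)
    if sa is None:
        raise SystemExit("no compute service account found in the error text")
    # Hypothetical bucket and path names; substitute your own.
    print(shlex.join(grant_object_admin_cmd(sa, "their-data-bucket")))
    print(shlex.join(copy_data_cmd("gs://their-data-bucket/dataset", "gs://my-project-bucket/dataset")))
```

The sketch only prints the commands so you can review them first. To execute one, pass the list to subprocess.run(cmd, check=True). The solution 1 command has to be run by someone with IAM rights on the data owner's bucket, not by you.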