Hailctl dataproc failed to start

Dear Hail Team:
I keep getting this error about the region when I try to start a Dataproc cluster for Hail.

> hailctl dataproc start exome --autoscaling-policy=max-50


> ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: 'us-east1' violates constraint 'constraints/gcp.resourceLocations'.

> Traceback (most recent call last):

I have already set the region correctly to us-east4, but it is still pointing to us-east1. I need some help.

$ gcloud config set compute/region us-east4
Updated property [compute/region].

Your project has restrictions on where VMs can be provisioned. You should talk to whoever manages the project to learn where you can deploy clusters.
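If you or the project admin can view org policies, something like this shows which locations the constraint allows (a sketch; <your-project-id> is a placeholder, and you need permission to read the project's org policies):

$ gcloud resource-manager org-policies describe gcp.resourceLocations --project=<your-project-id> --effective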

I’m not sure compute/region controls Dataproc. Try passing --region to hailctl.
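For example, reusing your original command (a sketch; substitute whichever region your project actually allows):

$ hailctl dataproc start exome --region us-east4 --autoscaling-policy=max-50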

Thanks, it works: adding --region to the end of the hailctl command line fixed it. Thanks very much.

Very strange: using --region with hailctl successfully created the Dataproc cluster, but when I use this cluster to submit my job, I get the following errors.

> Hail version: 0.2.95-513139587f57
> Error summary: GoogleJsonResponseException: 403 Forbidden
> GET https://storage.googleapis.com/storage/v1/b/fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/o/ccdg_exome_203k.vds%2Freference_data?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
> {
>   "code" : 403,
>   "errors" : [ {
>     "domain" : "global",
>     "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.",
>     "reason" : "forbidden"
>   } ],
>   "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object."
> }
> ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [489069d09b2d4b179760cd8357d8eec4] failed with error:
> Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
> https://console.cloud.google.com/dataproc/jobs/489069d09b2d4b179760cd8357d8eec4?project=gsp-ccdg-f3&region=us-east4
> gcloud dataproc jobs wait '489069d09b2d4b179760cd8357d8eec4' --region 'us-east4' --project 'gsp-ccdg-f3'
> https://console.cloud.google.com/storage/browser/dataproc-staging-us-east4-124652868343-8jchtcom/google-cloud-dataproc-metainfo/2e269f3a-efe3-4bc5-b75f-6d39629fe1d4/jobs/489069d09b2d4b179760cd8357d8eec4/
> gs://dataproc-staging-us-east4-124652868343-8jchtcom/google-cloud-dataproc-metainfo/2e269f3a-efe3-4bc5-b75f-6d39629fe1d4/jobs/489069d09b2d4b179760cd8357d8eec4/driveroutput

Same issue as here: Permission Error - #2 by tpoterba

This project was actually created by me and is owned by my account. I just had the compute service account and my Google Cloud account added to the access list of an AnVIL bucket that we previously had issues with.

Looks like the compute service account doesn’t have read access to that object in gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/reference_data
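One quick way to reproduce this outside of Hail (a sketch: it assumes the cluster is still the one named exome, that its master VM follows the usual <cluster-name>-m naming, that the zone below is where the master actually landed, and that the bucket is not requester-pays) is to SSH into the master, which authenticates as that compute service account, and try listing the path:

$ gcloud compute ssh exome-m --zone=us-east4-c
$ gsutil ls gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/reference_data/

A 403 there would confirm the service account itself is missing the permission, independent of Hail.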

I have already set the region to us-east4 and the zone to us-east4-c, but I still cannot create the Dataproc cluster.

hailctl dataproc: Creating a cluster with workers of machine type n1-standard-8.
  Allocating 14592 MB of memory per executor (4 cores),
  with at least 8755 MB for Hail off-heap values and 5837 MB for the JVM.  Using a maximum Hail memory reservation of 3648 MB per core.
gcloud dataproc clusters create exome10 \
    --image-version=2.0.29-debian10 \
    --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648 \
    --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.95/init_notebook.py \
    --metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.95/hail-0.2.95-py3-none-any.whl|||PKGS=aiohttp==3.8.1|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|avro>=1.10,<1.12|azure-identity==1.6.0|azure-storage-blob==12.11.0|bokeh>1.3,<2.0|boto3>=1.17,<2.0|botocore>=1.20,<2.0|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|google-auth==1.27.0|google-cloud-storage==1.25.*|humanize==1.0.0|hurry.filesize==0.9|janus>=0.6,<1.1|Jinja2==3.0.3|nest_asyncio==1.5.4|numpy<2|orjson==3.6.4|pandas>=1.3.0,<1.5.0|parsimonious<0.9|plotly>=5.5.0,<5.6|PyJWT|python-json-logger==2.0.2|requests==2.25.1|scipy>1.2,<1.8|sortedcontainers==2.4.0|tabulate==0.8.9|tqdm==4.*|uvloop==0.16.0; sys_platform != 'win32' \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --num-master-local-ssds=0 \
    --num-secondary-workers=0 \
    --num-worker-local-ssds=0 \
    --num-workers=2 \
    --secondary-worker-boot-disk-size=40GB \
    --worker-boot-disk-size=40GB \
    --worker-machine-type=n1-standard-8 \
    --initialization-action-timeout=20m \
    --labels=creator=hufengzhou_g_harvard_edu \
    --autoscaling-policy=max-50
Starting cluster 'exome10'...
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Multiple validation errors:
 - 'us-east1' violates constraint 'constraints/gcp.resourceLocations'.
 - Zone 'gsp-ccdg-f3/us-east4-c' resides in unsupported region 'https://www.googleapis.com/compute/v1/projects/gsp-ccdg-f3/regions/us-east4'. Supported regions: [us-east1]
Traceback (most recent call last):
  File "/n/home05/zhouhufeng/anaconda3/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/__main__.py", line 107, in main
    cli.main(args)
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/cli.py", line 123, in main
    asyncio.get_event_loop().run_until_complete(
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/start.py", line 429, in main
    gcloud.run(cmd[1:])
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'clusters', 'create', 'exome10', '--image-version=2.0.29-debian10', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.95/init_notebook.py', "--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.95/hail-0.2.95-py3-none-any.whl|||PKGS=aiohttp==3.8.1|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|avro>=1.10,<1.12|azure-identity==1.6.0|azure-storage-blob==12.11.0|bokeh>1.3,<2.0|boto3>=1.17,<2.0|botocore>=1.20,<2.0|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|google-auth==1.27.0|google-cloud-storage==1.25.*|humanize==1.0.0|hurry.filesize==0.9|janus>=0.6,<1.1|Jinja2==3.0.3|nest_asyncio==1.5.4|numpy<2|orjson==3.6.4|pandas>=1.3.0,<1.5.0|parsimonious<0.9|plotly>=5.5.0,<5.6|PyJWT|python-json-logger==2.0.2|requests==2.25.1|scipy>1.2,<1.8|sortedcontainers==2.4.0|tabulate==0.8.9|tqdm==4.*|uvloop==0.16.0; sys_platform != 'win32'", '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=40GB', '--worker-boot-disk-size=40GB', '--worker-machine-type=n1-standard-8', '--initialization-action-timeout=20m', '--labels=creator=hufengzhou_g_harvard_edu', '--autoscaling-policy=max-50']' returned non-zero exit status 1.

Hmm. Can you paste the hailctl command line you used? Something seems wrong.

I think there are a few places where the region can be specified, and something is conflicting with your zone specification. Can you share the output of these?

gcloud config get-value dataproc/region
gcloud config get-value dataproc/zone
gcloud config get-value compute/region
gcloud config get-value compute/zone

I think hailctl dataproc start ... --region us-east4 --zone us-east4-c should work.

$ gcloud config set dataproc/zone us-east4-c
ERROR: (gcloud.config.set) Section [dataproc] has no property [zone].
gcloud config get-value dataproc/zone
ERROR: (gcloud.config.get-value) Section [dataproc] has no property [zone].

gcloud config get-value compute/zone
us-east4-c
gcloud config get-value compute/region
us-east4
$ gcloud config get-value dataproc/region
us-east4

Here are my hailctl commands:

hailctl dataproc start exome10 --autoscaling-policy=max-50
hailctl dataproc submit exome10 CCDGF3MT2VCFEXOME.py

Now I can start the Hail Dataproc cluster, but I still have the following permission issue, even though the AnVIL administrator Kate has informed me that this service account has been added to the bucket's access permissions.

Hail version: 0.2.95-513139587f57
Error summary: GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/o/ccdg_exome_203k.vds%2Freference_data?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.",
    "reason" : "forbidden"
  } ],
  "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object."
}
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [2eb7b747cf5a4312ac2517c2550bf80e] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/2eb7b747cf5a4312ac2517c2550bf80e?project=gsp-ccdg-f3&region=us-east4
gcloud dataproc jobs wait '2eb7b747cf5a4312ac2517c2550bf80e' --region 'us-east4' --project 'gsp-ccdg-f3'

Can you ask Kate which Google Cloud Storage IAM role was granted to 124652868343-compute@developer.gserviceaccount.com? The possible GCS IAM roles are listed here.

I believe roles/storage.objectViewer is sufficient for Hail’s purposes. There are a couple of misleadingly named roles that are not sufficient:

  • roles/storage.legacyBucketReader
  • roles/storage.legacyBucketOwner

Can you also confirm that the aforementioned service account was granted access to this bucket:

fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf
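If whoever administers the bucket can view its IAM policy, gsutil is one way to check the current bindings and, if needed, grant read access (a sketch assuming direct bucket-level IAM; AnVIL/Terra workspace buckets are often managed through Terra groups instead, in which case the change has to happen on Kate's side):

$ gsutil iam get gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf
$ gsutil iam ch serviceAccount:124652868343-compute@developer.gserviceaccount.com:objectViewer gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf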

Thanks very much; I am contacting Kate for help on this.
Really appreciate your great help.
We are using Hail to export the QCed VCF from the VDS files, but have unexpectedly encountered many technical issues.


Thanks very much for pointing out the issue.
Kate has forwarded the issue to Terra Support to help us troubleshoot. I will keep you posted when the issue is resolved.
