Hailctl dataproc failed to start

Dear Hail Team:
I keep getting this error about the region when I try to start a Dataproc cluster for Hail.

> hailctl dataproc start exome --autoscaling-policy=max-50


> ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: 'us-east1' violates constraint 'constraints/gcp.resourceLocations'.

> Traceback (most recent call last):

I have already set the region correctly to us-east4, but it is still pointing to us-east1. I need some help.

$ gcloud config set compute/region us-east4
Updated property [compute/region].

Your project has restrictions on where VMs can be provisioned. You should talk to whoever manages the project to learn where you can deploy clusters.
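If you or the project admin can view org policies, something like this shows which locations the constraint allows (a sketch; <your-project-id> is a placeholder, and you need permission to read the project's org policies):

$ gcloud resource-manager org-policies describe gcp.resourceLocations --project=<your-project-id> --effective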

I’m not sure compute/region controls Dataproc. Try passing --region to hailctl.
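For example, reusing your original command (a sketch; substitute whichever region your project actually allows):

$ hailctl dataproc start exome --region us-east4 --autoscaling-policy=max-50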

Thanks, it works: adding --region to the end of the hailctl command line fixed it. Thanks very much.

Very strange: using --region with hailctl successfully created the Dataproc cluster, but when I use this cluster to submit my job, I get the following errors.

> Hail version: 0.2.95-513139587f57
> Error summary: GoogleJsonResponseException: 403 Forbidden
> GET https://storage.googleapis.com/storage/v1/b/fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/o/ccdg_exome_203k.vds%2Freference_data?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
> {
>   "code" : 403,
>   "errors" : [ {
>     "domain" : "global",
>     "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.",
>     "reason" : "forbidden"
>   } ],
>   "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object."
> }
> ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [489069d09b2d4b179760cd8357d8eec4] failed with error:
> Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
> https://console.cloud.google.com/dataproc/jobs/489069d09b2d4b179760cd8357d8eec4?project=gsp-ccdg-f3&region=us-east4
> gcloud dataproc jobs wait '489069d09b2d4b179760cd8357d8eec4' --region 'us-east4' --project 'gsp-ccdg-f3'
> https://console.cloud.google.com/storage/browser/dataproc-staging-us-east4-124652868343-8jchtcom/google-cloud-dataproc-metainfo/2e269f3a-efe3-4bc5-b75f-6d39629fe1d4/jobs/489069d09b2d4b179760cd8357d8eec4/
> gs://dataproc-staging-us-east4-124652868343-8jchtcom/google-cloud-dataproc-metainfo/2e269f3a-efe3-4bc5-b75f-6d39629fe1d4/jobs/489069d09b2d4b179760cd8357d8eec4/driveroutput

Same issue as here: Permission Error - #2 by tpoterba

This project was actually created by me and is owned by my account. I just had the compute service account and my Google Cloud account added to the access list of an AnVIL bucket that we previously had issues with.

Looks like the compute service account doesn’t have read access to that object in gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/reference_data
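One quick way to reproduce this outside of Hail (a sketch: it assumes the cluster is still the one named exome, that its master VM follows the usual <cluster-name>-m naming, that the zone below is where the master actually landed, and that the bucket is not requester-pays) is to SSH into the master, which authenticates as that compute service account, and try listing the path:

$ gcloud compute ssh exome-m --zone=us-east4-c
$ gsutil ls gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/reference_data/

A 403 there would confirm the service account itself is missing the permission, independent of Hail.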

I have already set the region to us-east4 and the zone to us-east4-c, but I still cannot create the Dataproc cluster.

hailctl dataproc: Creating a cluster with workers of machine type n1-standard-8.
  Allocating 14592 MB of memory per executor (4 cores),
  with at least 8755 MB for Hail off-heap values and 5837 MB for the JVM.  Using a maximum Hail memory reservation of 3648 MB per core.
gcloud dataproc clusters create exome10 \
    --image-version=2.0.29-debian10 \
    --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648 \
    --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.95/init_notebook.py \
    --metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.95/hail-0.2.95-py3-none-any.whl|||PKGS=aiohttp==3.8.1|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|avro>=1.10,<1.12|azure-identity==1.6.0|azure-storage-blob==12.11.0|bokeh>1.3,<2.0|boto3>=1.17,<2.0|botocore>=1.20,<2.0|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|google-auth==1.27.0|google-cloud-storage==1.25.*|humanize==1.0.0|hurry.filesize==0.9|janus>=0.6,<1.1|Jinja2==3.0.3|nest_asyncio==1.5.4|numpy<2|orjson==3.6.4|pandas>=1.3.0,<1.5.0|parsimonious<0.9|plotly>=5.5.0,<5.6|PyJWT|python-json-logger==2.0.2|requests==2.25.1|scipy>1.2,<1.8|sortedcontainers==2.4.0|tabulate==0.8.9|tqdm==4.*|uvloop==0.16.0; sys_platform != 'win32' \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --num-master-local-ssds=0 \
    --num-secondary-workers=0 \
    --num-worker-local-ssds=0 \
    --num-workers=2 \
    --secondary-worker-boot-disk-size=40GB \
    --worker-boot-disk-size=40GB \
    --worker-machine-type=n1-standard-8 \
    --initialization-action-timeout=20m \
    --labels=creator=hufengzhou_g_harvard_edu \
    --autoscaling-policy=max-50
Starting cluster 'exome10'...
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Multiple validation errors:
 - 'us-east1' violates constraint 'constraints/gcp.resourceLocations'.
 - Zone 'gsp-ccdg-f3/us-east4-c' resides in unsupported region 'https://www.googleapis.com/compute/v1/projects/gsp-ccdg-f3/regions/us-east4'. Supported regions: [us-east1]
Traceback (most recent call last):
  File "/n/home05/zhouhufeng/anaconda3/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/__main__.py", line 107, in main
    cli.main(args)
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/cli.py", line 123, in main
    asyncio.get_event_loop().run_until_complete(
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/start.py", line 429, in main
    gcloud.run(cmd[1:])
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/n/home05/zhouhufeng/anaconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'clusters', 'create', 'exome10', '--image-version=2.0.29-debian10', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.95/init_notebook.py', "--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.95/hail-0.2.95-py3-none-any.whl|||PKGS=aiohttp==3.8.1|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|avro>=1.10,<1.12|azure-identity==1.6.0|azure-storage-blob==12.11.0|bokeh>1.3,<2.0|boto3>=1.17,<2.0|botocore>=1.20,<2.0|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|google-auth==1.27.0|google-cloud-storage==1.25.*|humanize==1.0.0|hurry.filesize==0.9|janus>=0.6,<1.1|Jinja2==3.0.3|nest_asyncio==1.5.4|numpy<2|orjson==3.6.4|pandas>=1.3.0,<1.5.0|parsimonious<0.9|plotly>=5.5.0,<5.6|PyJWT|python-json-logger==2.0.2|requests==2.25.1|scipy>1.2,<1.8|sortedcontainers==2.4.0|tabulate==0.8.9|tqdm==4.*|uvloop==0.16.0; sys_platform != 'win32'", '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=40GB', '--worker-boot-disk-size=40GB', '--worker-machine-type=n1-standard-8', '--initialization-action-timeout=20m', '--labels=creator=hufengzhou_g_harvard_edu', '--autoscaling-policy=max-50']' returned non-zero exit status 1.

Hmm. Can you paste the hailctl command line you used? Something seems wrong.

I think there are a few places where the region can be specified, and something is conflicting with your zone specification. Can you share the output of these?

gcloud config get-value dataproc/region
gcloud config get-value dataproc/zone
gcloud config get-value compute/region
gcloud config get-value compute/zone

I think hailctl dataproc start ... --region us-east4 --zone us-east4-c should work.

$ gcloud config set dataproc/zone us-east4-c
ERROR: (gcloud.config.set) Section [dataproc] has no property [zone].
gcloud config get-value dataproc/zone
ERROR: (gcloud.config.get-value) Section [dataproc] has no property [zone].

gcloud config get-value compute/zone
us-east4-c
gcloud config get-value compute/region
us-east4
$ gcloud config get-value dataproc/region
us-east4

Here are my hailctl commands:

hailctl dataproc start exome10 --autoscaling-policy=max-50
hailctl dataproc submit exome10 CCDGF3MT2VCFEXOME.py

Now I can start the Hail Dataproc cluster, but I still have the following permission issue, even though the AnVIL administrator Kate has informed me that this service account has been added to the bucket's access permissions.

Hail version: 0.2.95-513139587f57
Error summary: GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/o/ccdg_exome_203k.vds%2Freference_data?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.",
    "reason" : "forbidden"
  } ],
  "message" : "124652868343-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object."
}
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [2eb7b747cf5a4312ac2517c2550bf80e] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/2eb7b747cf5a4312ac2517c2550bf80e?project=gsp-ccdg-f3&region=us-east4
gcloud dataproc jobs wait '2eb7b747cf5a4312ac2517c2550bf80e' --region 'us-east4' --project 'gsp-ccdg-f3'

Can you ask Kate which Google Cloud Storage IAM role was granted to 124652868343-compute@developer.gserviceaccount.com? The possible GCS IAM roles are listed here.

I believe roles/storage.objectViewer is sufficient for Hail’s purposes. There are a couple of misleadingly named roles that are not sufficient:

  • roles/storage.legacyBucketReader
  • roles/storage.legacyBucketOwner

Can you also confirm that the aforementioned service account was granted access to this bucket:

fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf
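If whoever administers the bucket can view its IAM policy, gsutil is one way to check the current bindings and, if needed, grant read access (a sketch assuming direct bucket-level IAM; AnVIL/Terra workspace buckets are often managed through Terra groups instead, in which case the change has to happen on Kate's side):

$ gsutil iam get gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf
$ gsutil iam ch serviceAccount:124652868343-compute@developer.gserviceaccount.com:objectViewer gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf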

Thanks very much; I am contacting Kate for help on this.
Really appreciate your great help.
We are using Hail to export the QCed VCF from the VDS files, but have unexpectedly encountered many technical issues.


Thanks very much for pointing out the issue.
Kate has forwarded the issue to Terra Support to help us troubleshoot. I will keep you posted when the issue is resolved.
