Assuming this is possible, of course.
I'm trying to build a cluster, but I'm limited to 8 CPUs (free trial).
I don't want any preemptible workers (--num-preemptible-workers),
and I understand that the minimum number of workers is 2. (BTW, does the master count as one of the workers?)
So why, when I try something like:
hailctl dataproc start vep-hail --vep GRCh37 --region europe-west2 --master-machine-type=n1-highmem-4
do I get this:
Pulling VEP data from bucket in uk.
gcloud dataproc clusters create vep-hail \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=20g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.39/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.39/vep-GRCh37.sh \
--metadata=^|||^VEP_REPLICATE=uk|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.39/hail-0.2.39-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1 \
--master-machine-type=n1-highmem-4 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-preemptible-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--preemptible-worker-boot-disk-size=200GB \
--worker-boot-disk-size=200GB \
--worker-machine-type=n1-highmem-8 \
--region=europe-west2 \
--initialization-action-timeout=20m \
--labels=creator=alanwilter_gmail_com
Starting cluster 'vep-hail'...
WARNING: The `--num-preemptible-workers` flag is deprecated. Use the `--num-secondary-workers` flag instead.
WARNING: The `--preemptible-worker-boot-disk-size` flag is deprecated. Use the `--secondary-worker-boot-disk-size` flag instead.
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Multiple validation errors:
- Insufficient 'CPUS' quota. Requested 20.0, available 8.0.
- Insufficient 'CPUS_ALL_REGIONS' quota. Requested 20.0, available 12.0.
- This request exceeds CPU quota. Some things to try: request fewer workers (a minimum of 2 is required), use smaller master and/or worker machine types (such as n1-standard-2).
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 108, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 346, in main
sp.check_call(cmd)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'clusters', 'create', 'vep-hail', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=20g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.39/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.39/vep-GRCh37.sh', '--metadata=^|||^VEP_REPLICATE=uk|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.39/hail-0.2.39-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1', '--master-machine-type=n1-highmem-4', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-preemptible-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--preemptible-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--region=europe-west2', '--initialization-action-timeout=20m', '--labels=creator=alanwilter_gmail_com']' returned non-zero exit status 1.
According to the GCP docs, n1-highmem-4 and n1-standard-4 both have 4 vCPUs, so I expected a cluster of 2 workers with 4 vCPUs each, i.e. 8 vCPUs in total, but the hailctl dataproc ...
command is asking for 20!
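For reference, the 20 seems to add up from the generated gcloud command itself, which includes a worker machine type I never set (it seems to default to n1-highmem-8):
1 master  x n1-highmem-4 = 4 vCPUs
2 workers x n1-highmem-8 = 16 vCPUs
total                    = 20 vCPUs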
Any help here please? Many thanks in advance.
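In case it clarifies what I'm after: going by the error's suggestion of n1-standard-2, I'm guessing something like the command below would fit inside the 8-vCPU quota (6 vCPUs in total: 2 for the master plus 2x2 for the workers). I'm assuming here that hailctl accepts --worker-machine-type and --num-workers the same way it accepts --master-machine-type, and I don't know whether VEP will run happily on machines that small:
hailctl dataproc start vep-hail --vep GRCh37 --region europe-west2 \
    --master-machine-type=n1-standard-2 \
    --worker-machine-type=n1-standard-2 \
    --num-workers=2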