Hail cluster creation error

I am getting the following error while creating a cluster on Google Cloud after running this command:

hailctl dataproc --beta start hailpy --vep GRCh37 --optional-components=ANACONDA,JUPYTER --enable-component-gateway --bucket bucketname --project projectname --region us-central1


Your active configuration is: [cloudshell-28616]
gcloud beta dataproc clusters create \
    hailpy \
    --image-version=1.4-debian9 \
    --properties=spark:spark.driver.maxResultSize=0,spark:spark.task.maxFailures=20,spark:spark.kryoserializer.buffer.max=1g,spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M,hdfs:dfs.replication=1,dataproc:dataproc.logging.stackdriver.enable=false,dataproc:dataproc.monitoring.stackdriver.enable=false,spark:spark.driver.memory=41g \
    --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.27/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.27/vep-GRCh37.sh \
    --metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.27/hail-0.2.27-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3 \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --num-master-local-ssds=0 \
    --num-preemptible-workers=0 \
    --num-worker-local-ssds=0 \
    --num-workers=2 \
    --preemptible-worker-boot-disk-size=200GB \
    --worker-boot-disk-size=200GB \
    --worker-machine-type=n1-highmem-8 \
    --zone=us-central1-b \
    --initialization-action-timeout=20m \
    --project=..... \
    --bucket=.. \
    --labels=creator=... \
    --optional-components=ANACONDA,JUPYTER \
    --enable-component-gateway \
    --region \
    us-central1
Starting cluster 'hailpy'...
Waiting on operation [projects/.../regions/us-central1/operations/fb08d024-7087-3e83-9101-3640e376aa9b].
WARNING: For PD-Standard without local SSDs, we strongly recommend provisioning 1TB or larger to ensure consistently high I/O performance. See https://cloud.google.com/compute/docs/disks/performance for information on disk I/O performance.
Waiting for cluster creation operation...done.
**ERROR**: (gcloud.beta.dataproc.clusters.create) Operation [projects/cncdanalyses/regions/us-central1/operations/fb08d024-7087-3e83-9101-3640e376aa9b] failed: Multiple Errors:
 - Timeout waiting for instance hailpy-m to report in.
 - Timeout waiting for instance hailpy-w-0 to report in.
 - Timeout waiting for instance hailpy-w-1 to report in..
Traceback (most recent call last):
  File "/home/zahidhaseeb46/env/bin/hailctl", line 8, in <module>
    sys.exit(main())
File "/home/.../env/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 94, in main
    cli.main(args)
  File "/home/.../env/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 107, in main
    jmp[args.module].main(args, pass_through_args)
  File "/home/.../env/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 200, in main
    sp.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'hailpy', '--image-version=1.4-debian9', '--properties=spark:spark.driver.maxResultSize=0,spark:spark.task.maxFailures=20,spark:spark.kryoserializer.buffer.max=1g,spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M,hdfs:dfs.replication=1,dataproc:dataproc.logging.stackdriver.enable=false,dataproc:dataproc.monitoring.stackdriver.enable=false,spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.27/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.27/vep-GRCh37.sh', '--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.27/hail-0.2.27-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-preemptible-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--preemptible-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--zone=us-central1-b', '--initialization-action-timeout=20m', '--project=...', '--bucket=...', '--labels=creator=...', '--optional-components=ANACONDA,JUPYTER', '--enable-component-gateway', '--region', 'us-central1']' returned non-zero exit status 1.

I think this is a Google Cloud failure. Can you try again today?

OK, I will try it again, but I have been trying for 3 to 4 days now with the same result.

I tried it again, but it failed with the same error:

hailctl dataproc --beta start hailpy --vep GRCh37 --optional-components=ANACONDA,JUPYTER --enable-component-gateway --bucket … --project … --region us-central1

ERROR: (gcloud.beta.dataproc.clusters.create) Operation [projects/…/regions/us-central1/operations/f6631ead-39b6-34cd-82a6-dbb802adff1e] failed: Multiple Errors:
 - Timeout waiting for instance hailpy-m to report in.
 - Timeout waiting for instance hailpy-w-0 to report in.
 - Timeout waiting for instance hailpy-w-1 to report in..
Traceback (most recent call last):
  File "/home/…/env/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/home/…/env/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 94, in main
    cli.main(args)
  File "/home/…/env/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 107, in main
    jmp[args.module].main(args, pass_through_args)
  File "/home/…/env/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 200, in main
    sp.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'hailpy', '--image-version=1.4-debian9', '--properties=spark:spark.driver.maxResultSize=0,spark:spark.task.maxFailures=20,spark:spark.kryoserializer.buffer.max=1g,spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M,hdfs:dfs.replication=1,dataproc:dataproc.logging.stackdriver.enable=false,dataproc:dataproc.monitoring.stackdriver.enable=false,spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.27/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.27/vep-GRCh37.sh', '--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.27/hail-0.2.27-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-preemptible-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--preemptible-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--zone=us-central1-b', '--initialization-action-timeout=20m', '--project=…', '--bucket=…', '--labels=creator=…_gmail_com', '--optional-components=ANACONDA,JUPYTER', '--enable-component-gateway', '--region', 'us-central1']' returned non-zero exit status 1.

I created a plain cluster without any other parameters, using hailctl dataproc start develop, and it was successful. But creating a cluster with
hailctl dataproc --beta start hailpy --vep GRCh37 --optional-components=ANACONDA,JUPYTER --enable-component-gateway --bucket … --project … --region us-central1
gives the error.

I think I mostly understand what's going on. We set an initialization action timeout of 20m. We know VEP takes ~10-15m to install, and I think adding --optional-components=ANACONDA,JUPYTER pushes the initialization over the 20m limit.
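If the timeout really is the culprit, one experiment (a sketch, not something I have verified) is to ask for a longer --initialization-action-timeout. The pass_through_args in the traceback above suggest hailctl forwards flags it does not recognize to the generated gcloud command, though I am not certain which value gcloud uses if it ends up seeing the flag twice; bucket and project names below are placeholders:

# Hedged sketch: request a longer init-action timeout than the 20m default.
# Assumes hailctl forwards this unrecognized flag to gcloud and that the later
# occurrence of --initialization-action-timeout takes effect.
hailctl dataproc start hailpy \
    --vep GRCh37 \
    --bucket mybucket --project myproject --region us-central1 \
    --initialization-action-timeout=40m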

Why are you installing Anaconda and Jupyter using gcloud Dataproc optional components? We install Miniconda and Jupyter notebooks in our initialization scripts (you can connect to a Jupyter notebook instance using hailctl dataproc connect develop notebook once the cluster is created).
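For example, the simpler flow would look roughly like this (cluster, bucket, and project names are placeholders):

# Start the cluster without --optional-components; Hail's init scripts already
# install Miniconda and Jupyter.
hailctl dataproc start hailpy --vep GRCh37 --bucket mybucket --project myproject --region us-central1

# Once the cluster is up, open the Jupyter notebook it serves.
hailctl dataproc connect hailpy notebook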

hailctl dataproc start haseeb-hail --vep GRCh37

I have now used this command but am still getting the same error.

Does it work without VEP? I know you need VEP, but it might be useful to know for debugging purposes.
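That is, something like this (the cluster name is just an example), to see whether the VEP init script is what runs past the timeout:

# Debugging step only: same defaults, but no VEP initialization.
hailctl dataproc start haseeb-hail-novep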

@Haseeb1 can you attach the Dataproc initialization logs to this thread?
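For reference, a rough sketch of how to pull them, assuming the default Dataproc staging-bucket layout and that the instances got far enough to upload anything; STAGING_BUCKET and CLUSTER_UUID are placeholders, and the cluster/node names follow whichever failed run you are looking at (hailpy / hailpy-m in the output above):

# Find the staging bucket and cluster UUID, if the cluster record still exists.
gcloud dataproc clusters describe hailpy --region us-central1 \
    --format='value(config.configBucket,clusterUuid)'

# Init-action output is normally copied here for each node; grab the master's copy.
gsutil ls gs://STAGING_BUCKET/google-cloud-dataproc-metainfo/CLUSTER_UUID/hailpy-m/
gsutil cp gs://STAGING_BUCKET/google-cloud-dataproc-metainfo/CLUSTER_UUID/hailpy-m/dataproc-initialization-script-0_output .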

No, it does not work without VEP either.

Sorry @danking, I can't work out where the logging files are.