I am unable to start a new Hail dataproc cluster

I have installed the required packages in my Linux (Ubuntu) conda environment (which I named hailtest). I ran everything in the Linux terminal with this conda environment activated.

# packages in environment at /home/millie/anaconda3/envs/hailtest:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiohttp                   3.8.3            py37h5eee18b_0  
aiohttp-session           2.7.0                      py_0    conda-forge
aiosignal                 1.2.0              pyhd3eb1b0_0  
async-timeout             4.0.2            py37h06a4308_0  
asyncinit                 0.2.4              pyhd8ed1ab_0    conda-forge
asynctest                 0.13.0                     py_0  
attrs                     22.1.0           py37h06a4308_0  
blas                      1.0                    openblas  
blinker                   1.4              py37h06a4308_0  
bokeh                     1.2.0                    py37_0  
bottleneck                1.3.5            py37h7deecbd_0  
brotlipy                  0.7.0           py37h27cfd23_1003  
c-ares                    1.19.1               h5eee18b_0  
ca-certificates           2024.12.31           h06a4308_0  
cachetools                4.2.2              pyhd3eb1b0_0  
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py37h5eee18b_3  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
click                     8.0.4            py37h06a4308_0  
cryptography              38.0.2           py37h5994e8b_1    conda-forge
decorator                 4.4.2              pyhd3eb1b0_0  
deprecated                1.2.13           py37h06a4308_0  
dill                      0.3.6            py37h06a4308_0  
fftw                      3.3.9                h5eee18b_2  
flit-core                 3.6.0              pyhd3eb1b0_0  
freetype                  2.11.0               h70c0345_0  
frozenlist                1.3.3            py37h5eee18b_0  
fsspec                    2023.1.0           pyhd8ed1ab_0    conda-forge
gcsfs                     2023.1.0           pyhd8ed1ab_0    conda-forge
giflib                    5.2.2                h5eee18b_0  
google-api-core           2.10.1           py37h06a4308_0  
google-auth               2.6.0              pyhd3eb1b0_0  
google-auth-oauthlib      0.5.2            py37h06a4308_0  
google-cloud-core         2.3.2            py37h06a4308_0  
google-cloud-sdk          406.0.0          py37h89c1867_0    conda-forge
google-cloud-storage      2.6.0            py37h06a4308_0  
google-crc32c             1.5.0            py37h5eee18b_0  
google-resumable-media    2.4.0            py37h06a4308_0  
googleapis-common-protos  1.56.4           py37h06a4308_0  
grpcio                    1.46.1           py37h0327239_0    conda-forge
hail                      0.2.61           py37h9a982cc_1    bioconda
humanize                  3.10.0             pyhd3eb1b0_0  
hurry.filesize            0.9                pyh8c360ce_0    conda-forge
idna                      3.4              py37h06a4308_0  
importlib-metadata        4.11.3           py37h06a4308_0  
jinja2                    3.1.2            py37h06a4308_0  
jpeg                      9e                   h5eee18b_3  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.40                 h12ee557_0  
libcrc32c                 1.1.2                h6a678d5_0  
libffi                    3.4.4                h6a678d5_1  
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   14.2.0               h77fa898_1    conda-forge
libnsl                    2.0.0                h5eee18b_0  
libopenblas               0.3.21               h043d6bf_0  
libpng                    1.6.37               hbc83047_0  
libprotobuf               3.20.1               h6239696_0    conda-forge
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.2.0                h85742a9_0  
libwebp                   1.2.4                h11a3e52_1  
libwebp-base              1.2.4                h5eee18b_1  
libzlib                   1.2.11            h166bdaf_1014    conda-forge
lz4-c                     1.9.4                h6a678d5_1  
markupsafe                2.1.1            py37h7f8727e_0  
multidict                 6.0.2            py37h5eee18b_0  
ncurses                   6.4                  h6a678d5_0  
nest-asyncio              1.5.6            py37h06a4308_0  
numexpr                   2.8.4            py37hd2a5715_0  
numpy                     1.21.5           py37hf838250_3  
numpy-base                1.21.5           py37h1e6e340_3  
oauthlib                  3.2.1            py37h06a4308_0  
openjdk                   8.0.412              hd590300_1    conda-forge
openssl                   3.4.0                h7b32b05_1    conda-forge
packaging                 22.0             py37h06a4308_0  
pandas                    1.3.5            py37h8c16a72_0  
parsimonious              0.10.0             pyhd8ed1ab_0    conda-forge
pillow                    9.0.1            py37h22f2fdc_0  
pip                       22.3.1           py37h06a4308_0  
protobuf                  3.20.1           py37h295c915_0  
py4j                      0.10.7                   py37_0  
pyasn1                    0.4.8              pyhd3eb1b0_0  
pyasn1-modules            0.2.8                      py_0  
pycparser                 2.21               pyhd3eb1b0_0  
pyjwt                     2.4.0            py37h06a4308_0  
pyopenssl                 23.0.0           py37h06a4308_0  
pysocks                   1.7.1                    py37_1  
pyspark                   2.4.1                      py_0  
python                    3.7.12          hf930737_100_cpython    conda-forge
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-json-logger        0.1.11             pyhd3eb1b0_0  
python_abi                3.7                     4_cp37m    conda-forge
pytz                      2022.7           py37h06a4308_0  
pyyaml                    6.0              py37h5eee18b_1  
readline                  8.2                  h5eee18b_0  
regex                     2022.7.9         py37h5eee18b_0  
requests                  2.28.1           py37h06a4308_0  
requests-oauthlib         1.3.0                      py_0  
rsa                       4.7.2              pyhd3eb1b0_1  
scipy                     1.7.3            py37hf838250_2  
setuptools                65.6.3           py37h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.38.2               hc218d9a_0  
tabulate                  0.8.3                    py37_0  
tk                        8.6.11               h1ccaba5_0  
tornado                   6.2              py37h5eee18b_0  
tqdm                      4.42.1                     py_0  
typing-extensions         4.4.0            py37h06a4308_0  
typing_extensions         4.4.0            py37h06a4308_0  
urllib3                   1.26.14          py37h06a4308_0  
wheel                     0.38.4           py37h06a4308_0  
wrapt                     1.14.1           py37h5eee18b_0  
xz                        5.4.6                h5eee18b_1  
yaml                      0.2.5                h7b6447c_0  
yarl                      1.8.1            py37h5eee18b_0  
zipp                      3.11.0           py37h06a4308_0  
zlib                      1.2.11            h166bdaf_1014    conda-forge
zstd                      1.4.9                haebb681_0  

I first tried to start a Hail Dataproc cluster named 'hailtest':

(hailtest) millie@millie-System:~$ hailctl dataproc start hailtest
Traceback (most recent call last):
  File "/home/millie/anaconda3/envs/hailtest/bin/hailctl", line 10, in <module>
    sys.exit(main())
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 100, in main
    cli.main(args)
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
    jmp[args.module].main(args, pass_through_args)
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 274, in main
    raise RuntimeError("Could not determine dataproc region. Use --region argument to hailctl, or use `gcloud config set dataproc/region <my-region>` to set a default.")
RuntimeError: Could not determine dataproc region. Use --region argument to hailctl, or use `gcloud config set dataproc/region <my-region>` to set a default.

The error says that the region is not specified, so I searched for Google Dataproc regions and chose asia-southeast1, since I am in Singapore.

(hailtest) millie@millie-System:~$ hailctl dataproc start hailtest --region='asia-southeast1'

gcloud dataproc clusters create hailtest \
    --image-version=1.4-debian9 \
    --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
    --initialization-actions=gs://hail-common/hailctl/dataproc/root-dev/0.2.61-3c86d3ba497a/init_notebook.py \
    --metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/root-dev/0.2.61-3c86d3ba497a/hail-0.2.61-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* \
    --master-machine-type=n1-highmem-8 \
    --master-boot-disk-size=100GB \
    --num-master-local-ssds=0 \
    --num-secondary-workers=0 \
    --num-worker-local-ssds=0 \
    --num-workers=2 \
    --secondary-worker-boot-disk-size=40GB \
    --worker-boot-disk-size=40GB \
    --worker-machine-type=n1-standard-8 \
    --region=asia-southeast1 \
    --initialization-action-timeout=20m
Starting cluster 'hailtest'...
ERROR: (gcloud.dataproc.clusters.create) Error parsing [cluster].
The [cluster] resource is not properly specified.
Failed to find attribute [project]. The attribute can be set in the following ways: 
- provide the argument `--project` on the command line
- set the property `core/project`
Traceback (most recent call last):
  File "/home/millie/anaconda3/envs/hailtest/bin/hailctl", line 10, in <module>
    sys.exit(main())
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 100, in main
    cli.main(args)
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
    jmp[args.module].main(args, pass_through_args)
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 369, in main
    gcloud.run(cmd[1:])
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
    return subprocess.check_call(["gcloud"] + command)
  File "/home/millie/anaconda3/envs/hailtest/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'dataproc', 'clusters', 'create', 'hailtest', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/root-dev/0.2.61-3c86d3ba497a/init_notebook.py', '--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/root-dev/0.2.61-3c86d3ba497a/hail-0.2.61-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=40GB', '--worker-boot-disk-size=40GB', '--worker-machine-type=n1-standard-8', '--region=asia-southeast1', '--initialization-action-timeout=20m']' returned non-zero exit status 1.

This error appears and I am still unable to create a new Hail Dataproc cluster.

My Python version is 3.7.12.

My Java version is:
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Zulu 8.78.0.19-CA-linux64) (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Zulu 8.78.0.19-CA-linux64) (build 25.412-b08, mixed mode)

What do you think is the problem?

Thanks
Millie

Hi @milamkk,
The second error says

Failed to find attribute [project]. The attribute can be set in the following ways: 
- provide the argument `--project` on the command line
- set the property `core/project`

Did you try providing the name of your Google Cloud project via the --project argument on the command line?
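
For example, something along these lines should work (my-gcp-project below is only a placeholder for your actual project ID). Flags that hailctl does not recognise are passed straight through to gcloud, so you can give the project on the hailctl command line:

hailctl dataproc start hailtest --project=my-gcp-project --region=asia-southeast1

Alternatively, set the project (and the region) once as gcloud defaults, so that later hailctl/gcloud invocations pick them up automatically:

gcloud config set project my-gcp-project
gcloud config set dataproc/region asia-southeast1
hailctl dataproc start hailtest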