Cannot start hailctl dataproc cluster: "unable to determine Google Cloud SDK version"

Hi all,

I have read the related topics here and online but could not find an answered problem similar to mine, so here goes. I want to read gnomAD data using hailctl dataproc clusters, and I’ve read the relevant how-tos. However, when I try to start a cluster on Windows (from an Anaconda prompt) I get:

(base) C:\Users\eliza>hailctl dataproc start genentutorial
Warning: unable to determine Google Cloud SDK version
Traceback (most recent call last):
  File "c:\users\eliza\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\eliza\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\eliza\anaconda3\Scripts\hailctl.exe\__main__.py", line 7, in <module>
  File "c:\users\eliza\anaconda3\lib\site-packages\hailtop\hailctl\__main__.py", line 107, in main
    cli.main(args)
  File "c:\users\eliza\anaconda3\lib\site-packages\hailtop\hailctl\dataproc\cli.py", line 123, in main
    asyncio.get_event_loop().run_until_complete(
  File "c:\users\eliza\anaconda3\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "c:\users\eliza\anaconda3\lib\site-packages\hailtop\hailctl\dataproc\start.py", line 287, in main
    project_region = gcloud.get_config("dataproc/region")
  File "c:\users\eliza\anaconda3\lib\site-packages\hailtop\hailctl\dataproc\gcloud.py", line 15, in get_config
    return subprocess.check_output(["gcloud", "config", "get-value", setting], stderr=subprocess.DEVNULL).decode().strip()
  File "c:\users\eliza\anaconda3\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "c:\users\eliza\anaconda3\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "c:\users\eliza\anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "c:\users\eliza\anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

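I have a feeling this is related to the "unable to determine Google Cloud SDK version" warning, but I am unsure how to fix it. The traceback shows the failure happening when hailctl tries to launch gcloud as a subprocess.
Running gcloud directly works fine: I can start gcloud dataproc clusters and access and manipulate them, and I have set all the relevant environment variables and paths. The problem only appears when going through hailctl.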
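
In case it helps narrow things down, here is a minimal diagnostic sketch (my own, not part of Hail). It checks how the name gcloud resolves on PATH and whether a bare subprocess call like the one in the traceback can launch it. The function name diagnose_gcloud is made up for illustration. My guess, which I have not confirmed, is that on Windows gcloud is installed as a gcloud.cmd script, which a shell finds but a bare subprocess call without shell=True may not.

```python
import shutil
import subprocess


def diagnose_gcloud(name="gcloud"):
    """Report how `name` resolves on PATH and whether a bare subprocess call can launch it.

    `diagnose_gcloud` is a hypothetical helper written for this post.
    """
    # shutil.which honours PATHEXT on Windows, so it can locate e.g. gcloud.cmd.
    resolved = shutil.which(name)
    result = {"resolved_path": resolved, "bare_call_ok": False, "error": None}
    try:
        # Same style of call as in the traceback: a bare command name, no shell.
        subprocess.check_output([name, "--version"], stderr=subprocess.DEVNULL)
        result["bare_call_ok"] = True
    except (OSError, subprocess.CalledProcessError) as exc:
        # FileNotFoundError is a subclass of OSError.
        result["error"] = repr(exc)
    return result


if __name__ == "__main__":
    print(diagnose_gcloud())
```

If resolved_path points at a .cmd file while bare_call_ok is False, that would suggest gcloud itself is installed correctly and the issue is in how it is being launched.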

Does anyone have experience with this?

Thanks in advance! :slight_smile: