I’m sorry you are having trouble following the “Using Hail with Jupyter Notebooks on Google Cloud” post! That post is really out of date. My colleague Tim has added a warning to that post.
I managed to install Hail with Dataproc on Google Cloud, but I cannot get a working Jupyter notebook installed alongside it.
Here are my attempts in detail:
A) Install Hail with gs://hail-common/hail-init.sh: it works. I can submit jobs to the cluster, so everything is good.
B) Install Hail with gs://hail-common/init_notebook.py
The cluster is created and I can submit a Python job, but I cannot access the Jupyter notebook (on port 8123).
C) Install Hail with gs://hail-common/cloudtools/init_notebook1.py: nothing works. The cluster is not created; here is the error:
ERROR: (gcloud.dataproc.clusters.create) Operation [] failed: initialization action failed. Failed action ‘gs://hail-common/cloudtools/init_notebook1.py’, see output in: gs://dataprocm/dataproc-initialization-script-0_output.
So I have several questions:
I believe there are several versions of the script gs://hail-common/cloudtools/init_notebook1.py (init_notebook2, init_notebook3, …). Why? Which one should I use?
I find it surprising that attempt B works. Maybe everything is installed correctly and I just cannot access it? If so, why write a new script?
It is really important for me to have access to a Jupyter notebook with Hail. Please help me.
The new-style init_notebook scripts require a conda installation script to run before them. This is an example gcloud invocation from cloudtools:
I have access to the cluster on port 8088 but not to the notebook.
And also:
gcloud dataproc jobs submit pyspark --cluster=t1 --project=avl-hail-ines gs://ines-python/start.py
where the file start.py is from cloudtools, but with the change that @danking suggested (r'/path/to/chrome.exe').
If I connect to the master node and run jupyter notebook, I get this error:
[C 13:14:22.736 NotebookApp] Bad config encountered during initialization:
[C 13:14:22.736 NotebookApp] The ‘contents_manager_class’ trait of <notebook.notebookapp.NotebookApp object at 0x7fa44cf39978> instance must be a type, but ‘jgscm.GoogleStorageContentManager’ could not be imported
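The message above only says that jgscm could not be imported, not why. A minimal sketch for surfacing the underlying error is shown below, assuming it is run on the master node with the same Python interpreter that Jupyter uses. The helper name check_import is made up for this illustration and is not part of cloudtools or jgscm.

```python
import importlib
import traceback


def check_import(module_name):
    """Try to import a module and return (ok, details).

    details holds the full traceback when the import fails, which usually
    names the dependency that is actually broken or missing.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # ImportError, or an error raised by the module's own imports
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    # "jgscm" is the module named in the NotebookApp error above.
    ok, details = check_import("jgscm")
    print("jgscm import OK" if ok else details)
```

If the traceback points at a Google Cloud package rather than at jgscm itself, that would be consistent with a dependency problem rather than a missing jgscm install.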
I’m sorry you encountered this issue! There was a recent breaking change to Google Cloud’s Python library that we were unaware of. We are releasing a fix for this now. To avoid this situation in the future, we have pinned a specific version of Google Cloud’s Python library and will upgrade only after we have verified that the new version works.
This is the correct way to set up an SSH tunnel. However, instead of http://t1-m:8123, can you try http://localhost:8123? Since you are tunneling to t1-m (where the Jupyter notebook server is running) through the SOCKS proxy, the notebook server can be reached as localhost.
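For reference, here is a rough sketch of the two commands involved in the tunnel setup: one opens a SOCKS proxy to the master node, the other starts Chrome pointed at that proxy. It is an illustration, not the actual cloudtools connect.py. The function names, the SOCKS port 10000, and the zone us-central1-b are assumptions; t1-m, port 8123 and the chromium path are taken from this thread. The script only prints the commands so they can be checked before running them.

```python
import tempfile


def tunnel_command(master, zone, socks_port=10000):
    """Build a gcloud command that opens a SOCKS proxy to the master node."""
    return [
        "gcloud", "compute", "ssh", master,
        "--zone={}".format(zone),
        "--ssh-flag=-D {}".format(socks_port),  # dynamic (SOCKS) port forwarding
        "--ssh-flag=-N",                        # tunnel only, no remote command
    ]


def browser_command(chrome_path, socks_port=10000, notebook_port=8123,
                    profile_dir=None):
    """Build a Chrome command that routes traffic through the SOCKS proxy."""
    if profile_dir is None:
        # A fresh, writable profile directory avoids reusing the normal
        # Chrome profile and avoids depending on a fixed temp path.
        profile_dir = tempfile.mkdtemp(prefix="hail-notebook-")
    return [
        chrome_path,
        "http://localhost:{}".format(notebook_port),
        "--proxy-server=socks5://localhost:{}".format(socks_port),
        # Chrome skips the proxy for localhost by default; this rule
        # removes that implicit bypass so localhost goes through the tunnel.
        "--proxy-bypass-list=<-loopback>",
        "--user-data-dir={}".format(profile_dir),
    ]


if __name__ == "__main__":
    # Placeholder values: adjust the zone and browser path for your setup.
    print(" ".join(tunnel_command("t1-m", "us-central1-b")))
    print(" ".join(browser_command("/usr/bin/chromium-browser")))
```

The tunnel command has to stay running while the browser is open. Each list can be passed to subprocess.Popen once the values look right.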
Hi, I had similar issues setting up Hail on Google Cloud yesterday using the Windows Subsystem for Linux. I think the problem was that Chrome on Windows could not write to /temp. I managed to get around this by installing chromium-browser, updating connect.py to point to it (/usr/bin/chromium-browser), and then displaying it through an Xming server. Hope this helps.