Hi,
I am launching a Dataproc cluster using hailctl and I am not able to connect via SSH, due to a firewall restriction. Is there a way to pass an SSH key when launching Dataproc via hailctl?
Hey @jerome-f!
`hailctl dataproc start` will pass any unrecognized arguments along to `gcloud dataproc clusters create`.
That said, if you’re having firewall issues, I don’t think an SSH key will fix the problem.
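For example, a minimal sketch of forwarding extra flags (the cluster name, network, and tag below are placeholders, not anything from this thread):

    # Flags hailctl does not itself recognize are forwarded to the underlying
    # `gcloud dataproc clusters create` call. All names here are hypothetical.
    $ hailctl dataproc start my-cluster \
        --network=my-vpc \
        --tags=allow-ssh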
Thanks Danking,
I am not able to connect to or launch the notebook even when I am on the same network, or connecting from a VM on that network. I don't know how to specify that the cluster should inherit the same firewall rules as the VM instance.
    Existing host keys found in /home/XXX/.ssh/google_compute_known_hosts
    ssh: connect to host xx.xxx.xx.xxx port 22: Connection timed out
    Recommendation: To check for possible causes of SSH connectivity issues and get
@jerome-f, I think you need to specify `--network` to place the cluster in the network which has the desired firewall rules. The `gcloud` man page describes that flag and a few related ones:
    --tags=TAG,[TAG,...]
       Specifies a list of tags to apply to the instance. These tags allow
       network firewall rules and routes to be applied to specified VM
       instances. See gcloud compute firewall-rules create(1) for more
       details.

       To read more about configuring network tags, read this guide:
       https://cloud.google.com/vpc/docs/add-remove-network-tags

       To list instances with their respective status and tags, run:

         $ gcloud compute instances list \
             --format='table(name,status,tags.list())'

       To list instances tagged with a specific tag, tag1, run:

         $ gcloud compute instances list --filter='tags:tag1'

    At most one of these may be specified:

    --network=NETWORK
       The Compute Engine network that the VM instances of the cluster
       will be part of. This is mutually exclusive with --subnet. If
       neither is specified, this defaults to the "default" network.

    --subnet=SUBNET
       Specifies the subnet that the cluster will be part of. This is
       mutually exclusive with --network.
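To tie those flags together, here is a hedged sketch of a firewall rule that opens SSH only to instances carrying a given network tag, so a cluster started with a matching `--tags` value picks it up (the rule name, network, source range, and tag are all placeholders):

    # Hypothetical rule: allow SSH from a trusted range to instances tagged
    # "allow-ssh" on the "my-vpc" network. Adjust names for your environment.
    $ gcloud compute firewall-rules create allow-ssh-dataproc \
        --network=my-vpc \
        --allow=tcp:22 \
        --source-ranges=203.0.113.0/24 \
        --target-tags=allow-ssh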
Hi Danking,
The Dataproc instance is on the network behind the firewall, and I could SSH in using my SSH key file. I might have to launch JupyterLab from within the cluster; I am trying to use `hailctl dataproc connect notebook`.
Ah, @jerome-f, I think I understand what you're trying to do. Unfortunately, `hailctl dataproc connect` does not pass through unrecognized arguments to `gcloud dataproc`. I've marked this as a feature request. In the meantime, you can use `--dry-run` to see the command we use, and modify that command as necessary. In particular, I think you just need to add `--ssh-key-file=path/to/your/key`:
    (base) # hailctl dataproc connect --dry-run dk notebook
    gcloud command:
    compute ssh dking@dk-m --zone=us-central1-b \
        '--ssh-flag=-D 10000' \
        '--ssh-flag=-N' \
        '--ssh-flag=-f' \
        '--ssh-flag=-n'
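Assuming that same dry-run output, the hand-modified invocation might look like this (the key path is a placeholder, as above):

    # Sketch: the dry-run command above, re-run by hand with the key file added.
    $ gcloud compute ssh dking@dk-m --zone=us-central1-b \
        --ssh-key-file=path/to/your/key \
        '--ssh-flag=-D 10000' \
        '--ssh-flag=-N' \
        '--ssh-flag=-f' \
        '--ssh-flag=-n'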
Here's a PR to add the pass-through functionality you need: [hailctl] allow passthrough arguments to dataproc connect by danking · Pull Request #11710 · hail-is/hail · GitHub
Thanks for the reply Danking.
@danking I am not sure if this would be useful for everyone, but the current behavior of `dataproc create` is to launch a Debian boot disk. My org does not like users creating custom images and requires us to pick one of the pre-provisioned hardened images (for data security etc.). `hailctl dataproc start` could support specifying an image with `--image=` (it works now as a pass-through, but `--image-version` is also set, so it throws an error).
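For illustration, a hypothetical invocation of the kind described (the cluster and image names are placeholders); `gcloud` treats a custom `--image` as mutually exclusive with `--image-version`, which is why the pass-through currently errors out:

    # Hypothetical: fails today because hailctl always sets --image-version,
    # which gcloud rejects in combination with a custom --image.
    $ hailctl dataproc start my-cluster \
        --image=projects/my-org/global/images/hardened-debian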