Passing SSH key

Hi,

I am launching a Dataproc cluster using hailctl and I am not able to connect via SSH, due to a firewall restriction. Is there a way to pass an SSH key when launching Dataproc via hailctl?

Hey @jerome-f!

`hailctl dataproc start` will pass any unrecognized arguments along to `gcloud dataproc clusters create`.
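For example, assuming `--tags` is not one of hailctl’s own options, it gets forwarded untouched (the cluster name and tag value here are placeholders):

    hailctl dataproc start my-cluster --tags=allow-ssh

hailctl turns this into a `gcloud dataproc clusters create` invocation that includes `--tags=allow-ssh`.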

That said, if you’re having firewall issues, I don’t think an SSH key will fix the problem.

Thanks Danking,
So I am not able to connect to or launch the notebook even though I am on the same network, or from a VM in the same network. I don't know how to specify that the cluster should inherit the same firewall rules as the VM instance.

    Existing host keys found in /home/XXX/.ssh/google_compute_known_hosts
    ssh: connect to host xx.xxx.xx.xxx port 22: Connection timed out
    Recommendation: To check for possible causes of SSH connectivity issues and get

@jerome-f, I think you need to specify `--network` to place the cluster in the network that has the desired firewall rules?
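As a quick check, you can list the firewall rules attached to a network to see what traffic it actually allows (the network name here is a placeholder):

    $ gcloud compute firewall-rules list --filter='network:my-vpc-network'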

The gcloud man page describes that flag and a few related ones:

       --tags=TAG,[TAG,...]
          Specifies a list of tags to apply to the instance. These tags allow
          network firewall rules and routes to be applied to specified VM
          instances. See gcloud compute firewall-rules create(1) for more
          details.

          To read more about configuring network tags, read this guide:
          https://cloud.google.com/vpc/docs/add-remove-network-tags

          To list instances with their respective status and tags, run:

              $ gcloud compute instances list \
                  --format='table(name,status,tags.list())'

          To list instances tagged with a specific tag, tag1, run:

              $ gcloud compute instances list --filter='tags:tag1'

       At most one of these may be specified:

         --network=NETWORK
            The Compute Engine network that the VM instances of the cluster
            will be part of. This is mutually exclusive with --subnet. If
            neither is specified, this defaults to the "default" network.

         --subnet=SUBNET
            Specifies the subnet that the cluster will be part of. This is
            mutually exclusive with --network.
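Since `hailctl dataproc start` forwards unrecognized flags, placing the cluster on the same network as your VM (and tagging it so the right rules apply) might look roughly like this; the cluster name, network, and tag are placeholders:

    hailctl dataproc start my-cluster \
        --network=my-vpc-network \
        --tags=allow-internal-ssh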

Hi Danking,

The Dataproc instance is on the network behind the firewall. I could SSH into it using my SSH key file. I might have to launch Jupyter Lab from within the cluster; I am trying to use `hailctl dataproc connect notebook`.

Ah, @jerome-f, I think I understand what you’re trying to do. Unfortunately, `hailctl dataproc connect` does not pass through unrecognized arguments to `gcloud compute ssh`. I’ve marked this as a feature request. In the meantime, you can use `--dry-run` to see the command we use and modify it as necessary. In particular, I think you just need to add `--ssh-key-file=path/to/your/key`.

(base) # hailctl dataproc connect --dry-run dk notebook
gcloud command:
compute ssh dking@dk-m --zone=us-central1-b \
    '--ssh-flag=-D 10000' \
    '--ssh-flag=-N' \
    '--ssh-flag=-f' \
    '--ssh-flag=-n'
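Taking that output and adding the key flag, the modified command would look roughly like this (the key path is a placeholder):

    gcloud compute ssh dking@dk-m --zone=us-central1-b \
        --ssh-key-file=path/to/your/key \
        '--ssh-flag=-D 10000' \
        '--ssh-flag=-N' \
        '--ssh-flag=-f' \
        '--ssh-flag=-n'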

Here’s a PR to add the pass-through functionality you need: [hailctl] allow passthrough arguments to dataproc connect by danking · Pull Request #11710 · hail-is/hail · GitHub

Thanks for the reply Danking.

@danking I am not sure if this would be useful for everyone, but the current behavior of Dataproc cluster creation is to launch a Debian boot disk. My org does not allow users to create custom images and requires us to pick one of the pre-provisioned hardened images (for data security etc.). `hailctl dataproc start` could support the user specifying an image with `--image=` (it works now as a pass-through, but `--image-version` is also set, so it throws an error).
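For example, I would like to be able to run something like this (the image path is just a hypothetical example of an org-provided hardened image):

    # fails today: hailctl also passes --image-version, and gcloud
    # rejects specifying both --image and --image-version together
    hailctl dataproc start my-cluster \
        --image=projects/my-org/global/images/hardened-dataproc-image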
