I’ve used the `hailctl dataproc start` command with the `--no-address` and `--subnet` flags to try to create a Dataproc cluster that uses only internal IP addresses, in order to avoid hitting the in-use IP address quota on GCP.
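
For reference, this is roughly the command I ran (the cluster name, project, and subnet below are placeholders):

```bash
hailctl dataproc start my-cluster \
    --subnet=projects/my-project/regions/us-central1/subnetworks/my-subnet \
    --no-address
```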
I believe cluster creation failed because the nodes couldn’t reach the internet to install the required packages, since they only have private IP addresses. Does anybody have any ideas on how to get around this?
I’ve never heard of anyone bumping into that quota, and we have made some pretty huge clusters. The easiest solution might be to ask Google to raise your quota, but I don’t know the details of your organization.
You could probably get away with downloading the dependencies into Google Storage and then writing a new init script that installs them from there (a rough sketch is below), but that won’t be a pleasant experience, and it won’t be fun to maintain.
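
Something like this untested sketch, where the bucket path is a placeholder and you’d have to mirror the wheels into the bucket yourself beforehand:

```bash
#!/bin/bash
# Hypothetical init action: install Python dependencies from a GCS mirror
# instead of PyPI. Mirror the wheels beforehand with something like:
#   pip download hail -d wheels/ && gsutil -m cp wheels/* gs://my-bucket/wheels/
set -ex

WHEEL_DIR=/tmp/wheels
mkdir -p "${WHEEL_DIR}"

# Pull the pre-downloaded wheels from the bucket. This should work without
# external IPs, assuming Private Google Access is enabled on your subnet.
gsutil -m cp 'gs://my-bucket/wheels/*' "${WHEEL_DIR}/"

# --no-index keeps pip from ever reaching out to PyPI.
pip install --no-index --find-links="${WHEEL_DIR}" hail
```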
It’s also worth noting that only the driver node of the cluster needs to install anything from the internet, so if you can configure things so that only the leader node has a public IP, that will probably work.
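
I haven’t tried this myself, but one possible way to do that (a rough sketch, with placeholder names) is to create the cluster with internal IPs only and then attach an external IP to just the master VM, which Dataproc names `<cluster-name>-m`. One caveat to verify: initialization actions run during cluster creation, so you may need to re-run the package installation on the master after the address is attached.

```bash
# Create the cluster without external IPs (names are placeholders).
hailctl dataproc start my-cluster --subnet=my-subnet --no-address

# Attach an external IP to the master node only; Dataproc names it
# "<cluster-name>-m".
gcloud compute instances add-access-config my-cluster-m \
    --zone=us-central1-a \
    --access-config-name="external-nat"
```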
Yeah, for some reason it seems to be our organization’s most common resource limit. We asked Google to raise our quota, but they denied the request and suggested we look into using internal IP addresses instead. The suggestion to configure only the driver node with a public IP address sounds like a good place to start. Thank you for your help!