GKE supports resizing an underlying compute cluster on the fly:
https://cloud.google.com/container-engine/docs/resize-cluster
google_container_cluster's initial_node_count describes the initial node count; if you change it, the cluster is destroyed and re-created.
The docs describe how to access the underlying managed instance group, gke-example-b937f2ba-group.
How can we do this in Terraform?
@sparkprime do you know if this is a limitation of Google's API or just of the Terraform driver? I'm willing to contribute a patch if it's the latter.
@c4milo This is a limitation of the Terraform driver, I believe. Check this out: https://cloud.google.com/container-engine/docs/resize-cluster
Now run the following command to resize the instance group named gke-example-b937f2ba-group:
$ gcloud compute instance-groups managed resize gke-example-b937f2ba-group --zone us-central1-f --size 4
Updated [https://www.googleapis.com/compute/v1/projects/container-engine-docs/zones/us-central1-f/instanceGroupManagers/gke-example-b937f2ba-group].
---
baseInstanceName: gke-example-b937f2ba-node
creationTimestamp: '2015-07-28T11:34:03.119-07:00'
currentActions:
  abandoning: 0
  creating: 1
  deleting: 0
  none: 3
  recreating: 0
  refreshing: 0
  restarting: 0
fingerprint: 42WmSpB8rSM=
id: '755751291075589620'
instanceGroup: gke-example-b937f2ba-group
instanceTemplate: gke-example-b937f2ba-1-0-1
kind: compute#instanceGroupManager
name: gke-example-b937f2ba-group
selfLink: https://www.googleapis.com/compute/v1/projects/container-engine-docs/zones/us-central1-f/instanceGroupManagers/gke-example-b937f2ba-group
targetSize: 4
zone: us-central1-f
You can poll the resize operation until the group is stable:
$ gcloud compute instance-groups managed wait-until-stable --zone us-central1-f gke-example-b937f2ba-group
Waiting for group to become stable, current operations: creating: 1
...
Group is stable
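wait-until-stable keeps polling until the group has no pending work. The sketch below approximates that check under an assumption: the group is treated as stable once every currentActions counter except `none` is zero (not verified to be exactly what gcloud checks). `is_stable` and `wait_until_stable` are hypothetical helpers, and `fetch` stands in for whatever call returns the group's currentActions, e.g. an instanceGroupManagers get request.

```python
import time
from typing import Callable, Mapping


def is_stable(current_actions: Mapping[str, int]) -> bool:
    """Assumed rule: stable when no instance has a pending action.

    "none" counts instances with nothing to do, so it is skipped.
    """
    return all(count == 0 for action, count in current_actions.items() if action != "none")


def wait_until_stable(
    fetch: Callable[[], Mapping[str, int]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> None:
    """Poll `fetch` (hypothetical: returns currentActions) until stable."""
    deadline = time.monotonic() + timeout_s
    while not is_stable(fetch()):
        if time.monotonic() >= deadline:
            raise TimeoutError("managed instance group did not become stable")
        time.sleep(interval_s)


# Counters taken from the resize output above (one node still being created).
during_resize = {"abandoning": 0, "creating": 1, "deleting": 0, "none": 3,
                 "recreating": 0, "refreshing": 0, "restarting": 0}
after_resize = dict(during_resize, creating=0, none=4)

print(is_stable(during_resize), is_stable(after_resize))  # False True
```

Injecting `fetch` keeps the polling logic independent of how the group is actually read.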
@c4milo Did you want to patch this?
Never mind. This actually isn't possible with the current Google API, and node pools aren't updatable either.
Also, gcloud is not open source, so we can't even dig our way to the undocumented endpoint. The open-source client libraries for, e.g., Go or Node.js do not support Google Container Engine. Maybe that's because the API is still incomplete.
Now the question becomes whether Terraform is even a viable solution if you anticipate resizing your cluster in the near future. If I understand correctly, scaling it via gcloud would confuse Terraform, since it would not know about the extra nodes: they are not stored in the state file. Is this correct? (ping @catsby)
@onbjerg, right, I looked into the API documentation and couldn't find any way to do that.
In my opinion, if it's doable from gcloud, it should be doable from the API as well. I'm not sure whether the Go bindings have a wrapper for that call, but they should also be kept in sync.
I'm not up to date with GKE and k8s, so I'm calling in @aronchick for that.
Context:
https://www.terraform.io/docs/providers/google/r/container_cluster.html
The user deploys the cluster using a config like that, then changes initial_node_count, and should be able to update the cluster without destroying it.
1) Is there an API that would allow Terraform to implement that?
2) Is changing initial_node_count the correct way to do this?
Terraform is very good at managing collections of loosely coupled infrastructure components, including understanding the relationships between them, especially with respect to lifecycles. So if the best thing to do from the GKE side is to assume a separate pool of instances managed by something else and just tell GKE to use it, that is perfectly acceptable from the Terraform side.
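For context on the behaviour reported at the top of the thread: when a change to an attribute can't be applied in place, Terraform plans a destroy-and-create ("replace") instead of an update. A rough illustration, not Terraform's actual code: `FORCE_NEW` is a hypothetical set that includes initial_node_count (matching the reported behaviour), while `node_count` is a hypothetical attribute that a resize-capable API would let the provider update in place.

```python
from typing import Any, Mapping

# Hypothetical: attributes the provider cannot change without re-creating.
FORCE_NEW = frozenset({"name", "zone", "initial_node_count"})


def plan_action(old: Mapping[str, Any], new: Mapping[str, Any]) -> str:
    """Classify a config change as 'no-op', 'update' or 'replace'."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    if not changed:
        return "no-op"
    if changed & FORCE_NEW:
        return "replace"  # destroy the cluster, then create it again
    return "update"       # apply in place, e.g. via a resize call


current = {"name": "example", "zone": "us-central1-f",
           "initial_node_count": 3, "node_count": 3}

print(plan_action(current, dict(current, initial_node_count=4)))  # replace
print(plan_action(current, dict(current, node_count=5)))          # update
```

The point of question 1) is whether the API allows a count to live outside the force-new set at all.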
@sparkprime I've dug through practically every Google search result and all the Google Cloud documentation I could find, but nothing indicates that a cluster can be resized through the API. At least, it isn't documented.
Hi --
Google Container Engine creates a Managed Instance Group of nodes, which you can then scale without destroying the cluster. The relevant APIs are here:
And here's a Stack Overflow question describing exactly what you're looking to do via the command line (from last year, but still relevant):
FYI @onbjerg gcloud is a very thin wrapper over the Google Cloud API: nearly every command in the Google Cloud documentation has "Console", "GCloud" and "API" tabs above the example, so to switch between the forms you just click the tab.
Please let me know if I can answer any other questions!
So one approach is to bring a subset of MIG operations into the GKE resource in Terraform. But not all of them (e.g. you shouldn't be able to delete the MIG out of band, right?)
Another approach is to adopt ("grandfather") the MIG as a Terraform-managed MIG, but this is generally a painful user experience because it involves modifying Terraform's state with a text editor. Terraform expects to have created the things that it manages.
Is it possible to create your own MIG and then tell GKE to just use it? Or does GKE have to create the MIG itself?
This is typically referred to as "Bring Your Own Node" and is, unfortunately, not yet supported (though it's pretty straightforward to do if you want to).
Though not ideal, I'd probably recommend the former: have a very small subset of MIG operations that you can execute on a GKE cluster, and keep the fact that it's a MIG opaque to the end user. To be clear, the MIG is just an implementation detail; we don't expect people to have to care about it at all in the future.
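One way to read this recommendation: the Terraform resource exposes a node count and nothing else, and translates changes into resizes of whatever MIGs back the cluster. This is a hypothetical sketch; `ClusterNodes`, `set_node_count` and the injected `resize_group` callable are illustrative names, and giving every group the same count is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

ResizeFn = Callable[[str, int], None]


@dataclass(frozen=True)
class ClusterNodes:
    """Hypothetical Terraform-side view of a GKE cluster's nodes.

    The MIGs stay an implementation detail: callers can only ask for a
    node count, and there is deliberately no delete or re-template method.
    """
    instance_group_urls: Tuple[str, ...]
    resize_group: ResizeFn  # injected call, e.g. instanceGroupManagers.resize

    def set_node_count(self, count: int) -> None:
        if count < 0:
            raise ValueError("node count must be non-negative")
        for url in self.instance_group_urls:
            self.resize_group(url, count)


MIG_URL = ("https://www.googleapis.com/compute/v1/projects/container-engine-docs"
           "/zones/us-central1-f/instanceGroupManagers/gke-example-b937f2ba-group")
recorded = []
nodes = ClusterNodes(
    instance_group_urls=(MIG_URL,),
    resize_group=lambda url, size: recorded.append((url, size)),  # stand-in for the API
)
nodes.set_node_count(4)
print(recorded)
```

Keeping the surface to a single operation is what prevents the out-of-band deletes mentioned earlier.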
Ok then @onbjerg you can get the MIG from instanceGroupUrls in the cluster resource and resize it using https://cloud.google.com/compute/docs/reference/latest/instanceGroupManagers/resize
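A minimal sketch of that suggestion, assuming each entry in instanceGroupUrls is a zonal Compute Engine URL of the form `.../projects/P/zones/Z/instanceGroupManagers/NAME` (the helper also tolerates an `instanceGroups` segment, which is a guess). It only builds the `POST .../instanceGroupManagers/NAME/resize?size=N` request; authenticating and sending it is left out. `resize_request` is a hypothetical name.

```python
from typing import Tuple
from urllib.parse import urlencode, urlparse

COMPUTE_V1 = "https://www.googleapis.com/compute/v1"


def resize_request(instance_group_url: str, size: int) -> Tuple[str, str]:
    """Return (HTTP method, URL) for an instanceGroupManagers.resize call."""
    parts = urlparse(instance_group_url).path.strip("/").split("/")
    try:
        i = parts.index("projects")
        project, zones, zone, collection, name = parts[i + 1 : i + 6]
    except ValueError:  # no "projects" segment, or too few path segments
        raise ValueError(f"unexpected instance group URL: {instance_group_url}") from None
    if zones != "zones" or collection not in ("instanceGroupManagers", "instanceGroups"):
        raise ValueError(f"not a zonal instance group URL: {instance_group_url}")
    if size < 0:
        raise ValueError("size must be non-negative")
    url = (f"{COMPUTE_V1}/projects/{project}/zones/{zone}"
           f"/instanceGroupManagers/{name}/resize?{urlencode({'size': size})}")
    return "POST", url


method, url = resize_request(
    "https://www.googleapis.com/compute/v1/projects/container-engine-docs"
    "/zones/us-central1-f/instanceGroupManagers/gke-example-b937f2ba-group",
    4,
)
print(method, url)
```

This is the REST counterpart of the gcloud resize command quoted earlier in the thread.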
@aronchick: would you mind providing examples of what you mean? Thanks!
Hi @roobert - an example of what? A MIG? Or how to bring your own node?
@aronchick, examples of both would be great if you don't mind, thanks!
Sorry, I'm still not clear. Did you just mean what is a managed instance group? (https://cloud.google.com/compute/docs/instance-groups/) What are you trying to do?
@aronchick: I'm slightly confused about two things:
1) how to use google_container_cluster to create a cluster which spans multiple zones
2) how to adjust the number of nodes in a cluster once the cluster has been deployed
If you could provide any kind of example of how to do either of these things I'd be grateful, I think the two techniques you mentioned earlier are related to the latter, but I can't quite work out how to apply them.
Thanks again,
1) To create a cluster that spans zones, first you create a cluster, and then you add node pools. (https://cloudplatform.googleblog.com/2016/05/introducing-Google-Container-Engine-GKE-node-pools.html)
2) To adjust the size of the pool (even if there's only one pool), you just resize your MIG (all node pools create a MIG):
https://cloud.google.com/compute/docs/instance-groups/#resize_managed_group
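If a pool spans several zones, there is presumably one MIG per zone, so resizing the pool means resizing each of them. A hypothetical sketch that turns a pool's instance group URLs into one resize step per zonal group; it assumes the URL layout `.../zones/ZONE/instanceGroupManagers/NAME` and that the requested count applies per zone. The project and group names in the example are made up.

```python
from typing import List, NamedTuple
from urllib.parse import urlparse


class ResizeStep(NamedTuple):
    zone: str
    group: str
    size: int


def plan_pool_resize(instance_group_urls: List[str], nodes_per_zone: int) -> List[ResizeStep]:
    """One resize per zonal MIG, sorted by zone for a stable order."""
    steps = []
    for url in instance_group_urls:
        parts = urlparse(url).path.strip("/").split("/")
        zone = parts[parts.index("zones") + 1]  # ValueError if there is no zone segment
        steps.append(ResizeStep(zone=zone, group=parts[-1], size=nodes_per_zone))
    return sorted(steps)


pool_urls = [
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-b"
    "/instanceGroupManagers/gke-demo-default-pool-bbbb-grp",
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a"
    "/instanceGroupManagers/gke-demo-default-pool-aaaa-grp",
]
for step in plan_pool_resize(pool_urls, nodes_per_zone=2):
    print(step)
```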
@aronchick: would it be possible to provide an example of what you mean using terraform resources? Thanks!
@roobert Hi -- I'm sorry, what are you looking for?
This is a managed instance group terraform resource: https://www.terraform.io/docs/providers/google/r/compute_instance_group_manager.html
I'm not sure there's a nodepool managed instance group resource.
@aronchick: OK, so it's not possible to use Terraform in conjunction with GKE for this yet; that's what I was wondering. I'll look into creating a node pool resource, thanks.
@roobert how did you get on? I have this exact issue: I want to create a multi-zone node pool setup in GKE with Terraform.
@Stono: unfortunately I gave up on using Terraform to achieve this. I'm not sure whether support is better now, or whether the underlying components have changed in a way that would make this easier, but at the time, besides having difficulties doing this, I wasn't confident I could recover from Terraform state issues.
If you're using GCP, you may want to look at Google Cloud Deployment Manager, Google's own declarative offering. In the end, we wrote a wrapper around gcloud that translates YAML into gcloud commands. gcloud seems to be the best API for GCP, since it's kept the most up to date, and it seemed like the only way to get the flexibility we wanted.
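The wrapper itself wasn't shared in the thread. Purely to illustrate the idea, here is a hypothetical sketch that takes a declarative spec (the kind of mapping a YAML loader such as PyYAML's `yaml.safe_load` would return; it isn't used here, so the example has no dependencies) and emits the `gcloud compute instance-groups managed resize` command quoted earlier, but only for groups whose current size differs. `resize_commands` and the spec layout are assumptions.

```python
import shlex
from typing import Any, List, Mapping


def resize_commands(
    desired: Mapping[str, Mapping[str, Any]],
    current_sizes: Mapping[str, int],
) -> List[List[str]]:
    """Emit one gcloud resize argv per group whose size has to change."""
    commands = []
    for group, spec in sorted(desired.items()):
        size = int(spec["size"])
        if current_sizes.get(group) == size:
            continue  # already at the desired size: nothing to run
        commands.append([
            "gcloud", "compute", "instance-groups", "managed", "resize",
            group, "--zone", str(spec["zone"]), "--size", str(size),
        ])
    return commands


# Spec as it might look after loading a (hypothetical) YAML file.
spec = {"gke-example-b937f2ba-group": {"zone": "us-central1-f", "size": 4}}
for cmd in resize_commands(spec, current_sizes={"gke-example-b937f2ba-group": 3}):
    print(shlex.join(cmd))  # or run it: subprocess.run(cmd, check=True)
```

Diffing against the current sizes first keeps repeated runs idempotent, which is most of what a declarative tool adds over raw gcloud.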
I haven't tried this yet, but if support for node pool autoscaling were added to terraform would that solve the problem?
https://cloud.google.com/container-engine/reference/rest/v1/projects.zones.clusters.nodePools#NodePool.NodePoolAutoscaling
Then you could hypothetically set the minimum and maximum node count to the value you want.
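A sketch of that idea, building a NodePoolAutoscaling object (fields `enabled`, `minNodeCount` and `maxNodeCount`, per the linked reference) with the minimum and maximum pinned to the same value. How the object is sent to the API is not shown, `autoscaling_config` and `pinned_size` are hypothetical helpers, and whether the autoscaler actively moves a pool that is currently outside the range is not something this sketch checks.

```python
def autoscaling_config(min_nodes: int, max_nodes: int) -> dict:
    """Build a NodePoolAutoscaling-shaped dict after basic validation."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("need 0 <= min_nodes <= max_nodes")
    return {"enabled": True, "minNodeCount": min_nodes, "maxNodeCount": max_nodes}


def pinned_size(node_count: int) -> dict:
    """min == max, so the autoscaler's allowed range is a single size."""
    return autoscaling_config(node_count, node_count)


print(pinned_size(4))  # {'enabled': True, 'minNodeCount': 4, 'maxNodeCount': 4}
```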
hi @danawillow, although auto-scaling would be a great addition, I don't think it provides the same level of control discussed in this thread.
Hey @roobert, you mentioned y'all ended up building a wrapper around gcloud to translate YAML to gcloud commands. Could you by any chance share it? Just asking. Thanks.
I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.