If you create a google_container_cluster resource, then attach an extra google_container_node_pool resource to that cluster, and then run terraform plan again with _no changes to the configuration_, Terraform wants to recreate the cluster resource: it now sees two node pools on the cluster, whereas our definition of the cluster only specifies one, because the second pool was added after the cluster was created.
The issue seems to be located here, where the code loops through all node pools regardless of whether they were created by this resource:
Terraform version: v0.10.5-dev
Terraform should be able to distinguish whether a node pool was created by, and is thus managed by, a google_container_cluster resource, or whether it was added using a separate google_container_node_pool resource.
Terraform mistakes a node pool created with a google_container_node_pool resource for one belonging to the original cluster specification, and so believes that the current state differs from the configuration.
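For context, node pools can be declared in two ways: inline on the cluster via its node_pool block, or as standalone google_container_node_pool resources. Here is a minimal sketch of the inline style for contrast with the reproduction below (the resource name inline_example is illustrative, not from the original report):

resource "google_container_cluster" "inline_example" {
  name = "inline-example"
  zone = "us-central1-a"

  # Declared inline, so this pool is part of the cluster's own spec
  # and is expected to appear in the cluster's node_pool list.
  node_pool {
    name               = "my-node-pool"
    initial_node_count = 3
  }
}

The reproduction below uses the standalone style instead: the cluster config declares no extra pools, and the pool is attached by a separate resource.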
Configure a cluster resource and a node pool resource that adds a pool to the cluster:
resource "google_container_cluster" "primary" {
name = "marcellus-wallace"
zone = "us-central1-a"
initial_node_count = 3
}
resource "google_container_node_pool" "np" {
name = "my-node-pool"
zone = "us-central1-a"
cluster = "${google_container_cluster.primary.name}"
initial_node_count = 3
}
Then run these commands:
terraform apply
terraform plan <- will want to force new cluster resource

Here's a super quick and dirty patch that produces the correct behaviour in our specific case. This is not an actual suggestion: I don't know what hidden assumptions this hack makes, as I'm not familiar with the code base; I just wanted to find where in the code it wasn't doing the correct thing in our case.
diff --git a/google/resource_container_cluster.go b/google/resource_container_cluster.go
index aa9328c..e690546 100644
--- a/google/resource_container_cluster.go
+++ b/google/resource_container_cluster.go
@@ -629,7 +629,8 @@ func resourceContainerClusterUpdate(d *schema.ResourceData, meta interface{}) er
 		d.SetPartial("enable_legacy_abac")
 	}
 
-	if n, ok := d.GetOk("node_pool.#"); ok {
+	if _, ok := d.GetOk("node_pool.#"); ok {
+		n, _ := d.GetChange("node_pool.#")
 		for i := 0; i < n.(int); i++ {
 			if d.HasChange(fmt.Sprintf("node_pool.%d.node_count", i)) {
 				newSize := int64(d.Get(fmt.Sprintf("node_pool.%d.node_count", i)).(int))
@@ -741,7 +742,9 @@ func getInstanceGroupUrlsFromManagerUrls(config *Config, igmUrls []string) ([]st
 func flattenClusterNodePools(d *schema.ResourceData, config *Config, c []*container.NodePool) ([]map[string]interface{}, error) {
 	nodePools := make([]map[string]interface{}, 0, len(c))
 
-	for i, np := range c {
+	n := d.Get("node_pool.#").(int)
+	for i := 0; i < n && i < len(c); i++ {
+		np := c[i]
 		// Node pools don't expose the current node count in their API, so read the
 		// instance groups instead. They should all have the same size, but in case a resize
 		// failed or something else strange happened, we'll just use the average size.
Are you close to doing triage on this? We're in pain over this issue right now.
Oh no, I don't want you to be in pain! I don't have a good answer for this- see https://github.com/terraform-providers/terraform-provider-google/issues/285#issuecomment-328235357 for why this is pretty tricky. As a stopgap, can you use lifecycle.ignore_changes? This week is part HashiConf part vacation, so I probably won't get a chance to look at the patch until next week.
Thanks @danawillow, this somewhat improved the situation (avoiding accidental cluster recreation).
I added this to the google_container_cluster resource:
lifecycle {
  ignore_changes = ["node_pool.#", "node_pool.1.initial_node_count"]
}
@danawillow My apologies for not finding the issue you linked before creating this one. I've read the comments and understand that it's a complex issue, especially since it's unclear how it should work. But thanks a lot for the workaround, it helps us move on for now until a proper solution is found.
This issue is resolved: if no node pools are defined inline in the google_container_cluster config, Terraform won't manage the node_pool field on the cluster.
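In other words, a setup like the original reproduction should now work without workarounds. A minimal sketch of the resulting pattern (reusing the resource names from the reproduction above; treat it as illustrative rather than the exact fix):

resource "google_container_cluster" "primary" {
  name               = "marcellus-wallace"
  zone               = "us-central1-a"
  initial_node_count = 3

  # No inline node_pool blocks here, so Terraform leaves externally
  # attached pools alone and no longer plans to recreate the cluster.
}

resource "google_container_node_pool" "np" {
  name               = "my-node-pool"
  zone               = "us-central1-a"
  cluster            = "${google_container_cluster.primary.name}"
  initial_node_count = 3
}

With this in place, the lifecycle.ignore_changes workaround above should no longer be needed.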
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!