Terraform-provider-google: Adding google_container_node_pool to a google_container_cluster will force new cluster resource in next plan

Created on 12 Sep 2017  ·  7 Comments  ·  Source: hashicorp/terraform-provider-google

If you create a google_container_cluster resource, then add an extra google_container_node_pool resource to that cluster, and then run terraform plan again with _no changes to the configuration_, Terraform will want to recreate the container cluster resource: it now sees two node pools in the cluster, whereas our definition of the cluster only specifies one, since the second one was added after the cluster was created.

The issue seems to be located here, where it loops through all node pools regardless of whether they were created by this resource:

https://github.com/terraform-providers/terraform-provider-google/blob/master/google/resource_container_cluster.go#L633

Terraform Version

v0.10.5-dev

Affected Resource(s)

  • google_container_cluster
  • google_container_node_pool

Expected Behavior

Terraform should be able to distinguish whether a node pool was created by (and is thus managed by) a google_container_cluster resource, or whether it was added using a separate google_container_node_pool resource.

Actual Behavior

Terraform treats a node pool created using a google_container_node_pool resource as belonging to the original cluster specification, and therefore believes the current state differs from the configuration.

Steps to Reproduce

Configure a cluster resource and a node pool resource that adds a pool to the cluster:

resource "google_container_cluster" "primary" {
  name               = "marcellus-wallace"
  zone               = "us-central1-a"
  initial_node_count = 3
}

resource "google_container_node_pool" "np" {
  name               = "my-node-pool"
  zone               = "us-central1-a"
  cluster            = "${google_container_cluster.primary.name}"
  initial_node_count = 3
}

Then run these commands:

  1. terraform apply
  2. terraform plan <- will want to force new cluster resource
bug

All 7 comments

Here's a super quick and dirty patch that produces the correct behaviour in our specific case. This is not an actual suggestion: I don't know what hidden assumptions this hack makes, as I'm not familiar with the code base; I just wanted to find where in the code it wasn't doing the correct thing for us.

diff --git a/google/resource_container_cluster.go b/google/resource_container_cluster.go
index aa9328c..e690546 100644
--- a/google/resource_container_cluster.go
+++ b/google/resource_container_cluster.go
@@ -629,7 +629,8 @@ func resourceContainerClusterUpdate(d *schema.ResourceData, meta interface{}) er
        d.SetPartial("enable_legacy_abac")
    }

-   if n, ok := d.GetOk("node_pool.#"); ok {
+   if _, ok := d.GetOk("node_pool.#"); ok {
+       n, _ := d.GetChange("node_pool.#")
        for i := 0; i < n.(int); i++ {
            if d.HasChange(fmt.Sprintf("node_pool.%d.node_count", i)) {
                newSize := int64(d.Get(fmt.Sprintf("node_pool.%d.node_count", i)).(int))
@@ -741,7 +742,9 @@ func getInstanceGroupUrlsFromManagerUrls(config *Config, igmUrls []string) ([]st
 func flattenClusterNodePools(d *schema.ResourceData, config *Config, c []*container.NodePool) ([]map[string]interface{}, error) {
    nodePools := make([]map[string]interface{}, 0, len(c))

-   for i, np := range c {
+   n := d.Get("node_pool.#").(int)
+   for i := 0; i < n && i < len(c); i++ {
+       np := c[i]
        // Node pools don't expose the current node count in their API, so read the
        // instance groups instead. They should all have the same size, but in case a resize
        // failed or something else strange happened, we'll just use the average size.

Are you close to doing triage on this? We're in pain over this issue right now.

Oh no, I don't want you to be in pain! I don't have a good answer for this- see https://github.com/terraform-providers/terraform-provider-google/issues/285#issuecomment-328235357 for why this is pretty tricky. As a stopgap, can you use lifecycle.ignore_changes? This week is part HashiConf part vacation, so I probably won't get a chance to look at the patch until next week.

Thanks @danawillow, this somewhat improved the situation (avoiding accidental cluster recreation). I added this to the google_container_cluster resource:

  lifecycle {
    ignore_changes = ["node_pool.#","node_pool.1.initial_node_count"]
  }
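
A broader variant of the same stopgap, as a sketch only: on Terraform 0.12 and later, ignore_changes takes unquoted attribute references, and ignoring the whole node_pool attribute covers both of the entries above.

  lifecycle {
    # Ignore all drift on the inline node_pool attribute, so pools added via
    # google_container_node_pool don't force cluster recreation.
    ignore_changes = [node_pool]
  }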

@danawillow My apologies for not finding the issue you linked before creating this one. I've read the comments and understand that it's a complex issue, especially since it's unclear how it should work. But thanks a lot for the workaround, it helps us move on for now until a proper solution is found.

This issue is resolved; if no node pools are defined inline, Terraform won't manage the node_pool field on the cluster.
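
For reference, a minimal sketch of that resolved pattern, assuming a provider version that supports remove_default_node_pool: the cluster defines no inline node_pool blocks, so Terraform leaves the node_pool field alone, and every pool is managed by its own google_container_node_pool resource.

resource "google_container_cluster" "primary" {
  name = "marcellus-wallace"
  zone = "us-central1-a"

  # No inline node_pool blocks, so Terraform does not manage the node_pool
  # field on the cluster. Drop the default pool and manage pools separately.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "np" {
  name               = "my-node-pool"
  zone               = "us-central1-a"
  cluster            = "${google_container_cluster.primary.name}"
  initial_node_count = 3
}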

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
