Terraform-provider-google: google_container_cluster tries to recreate cluster always when used in combination with google_container_node_pool

Created on 26 Sep 2018  ·  13 Comments  ·  Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.8

  • provider: google: 1.18

Affected Resource(s)

  • google_container_cluster
  • google_container_node_pool

Terraform Configuration Files

resource "google_container_cluster" "primary" {
  name               = "${var.cluster_name}"
  # If we want a regional cluster, should we be looking at https://cloud.google.com/kubernetes-engine/docs/concepts/regional-clusters#regional
  #  region = "${var.region}"
  zone               = "${var.main_zone}"
  additional_zones   = "${var.additional_zones}"
  # Node count for every region
  initial_node_count = 1
  project            = "${var.project}"
  remove_default_node_pool = true
  enable_legacy_abac = true

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/sqlservice.admin",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
  }
}

resource "google_container_node_pool" "nodepool" {
  name               = "${var.cluster_name}nodepool"
  zone               = "${var.main_zone}"
  cluster            = "${google_container_cluster.primary.name}"
  node_count         = "${var.node_count}"

  autoscaling {
    min_node_count = "${var.min_node_count}"
    max_node_count = "${var.max_node_count}"
  }
}

Debug Output

There is too much info in those logs to share them openly. Is there a tool to anonymise them? I'm happy to share them if there is no sensitive data in them; I couldn't find much info about that.

Panic Output

It does not crash

Expected Behavior

Once applied successfully, if I run terraform plan again, no changes should be needed.

Actual Behavior

If I run terraform plan right after applying the changes successfully, I get:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ module.google.google_container_cluster.primary (new resource required)
      id:                                                    "dev" => <computed> (forces new resource)
      additional_zones.#:                                    "1" => "1"
      additional_zones.2873062354:                           "europe-west2-a" => "europe-west2-a"
      addons_config.#:                                       "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.#:          "1" => "1"
      addons_config.0.horizontal_pod_autoscaling.0.disabled: "false" => "false"
      addons_config.0.http_load_balancing.#:                 "0" => <computed>
      addons_config.0.kubernetes_dashboard.#:                "0" => <computed>
      addons_config.0.network_policy_config.#:               "1" => <computed>
      cluster_ipv4_cidr:                                     "10.20.0.0/14" => <computed>
      enable_binary_authorization:                           "false" => "false"
      enable_kubernetes_alpha:                               "false" => "false"
      enable_legacy_abac:                                    "true" => "true"
      endpoint:                                              "****" => <computed>
      initial_node_count:                                    "1" => "1"
      instance_group_urls.#:                                 "2" => <computed>
      logging_service:                                       "logging.googleapis.com" => <computed>
      master_auth.#:                                         "1" => <computed>
      master_version:                                        "1.9.7-gke.6" => <computed>
      monitoring_service:                                    "monitoring.googleapis.com" => <computed>
      name:                                                  "dev" => "dev"
      network:                                               "****" => "default"
      network_policy.#:                                      "1" => <computed>
      node_config.#:                                         "1" => "1"
      node_config.0.disk_size_gb:                            "100" => <computed>
      node_config.0.disk_type:                               "pd-standard" => <computed>
      node_config.0.guest_accelerator.#:                     "0" => <computed>
      node_config.0.image_type:                              "COS" => <computed>
      node_config.0.local_ssd_count:                         "0" => <computed>
      node_config.0.machine_type:                            "n1-standard-1" => <computed>
      node_config.0.oauth_scopes.#:                          "6" => "6"
      node_config.0.oauth_scopes.1277378754:                 "https://www.googleapis.com/auth/monitoring" => "https://www.googleapis.com/auth/monitoring"
      node_config.0.oauth_scopes.1328717722:                 "" => "https://www.googleapis.com/auth/devstorage.read_write" (forces new resource)
      node_config.0.oauth_scopes.1632638332:                 "https://www.googleapis.com/auth/devstorage.read_only" => "" (forces new resource)
      node_config.0.oauth_scopes.172152165:                  "https://www.googleapis.com/auth/logging.write" => "https://www.googleapis.com/auth/logging.write"
      node_config.0.oauth_scopes.1733087937:                 "" => "https://www.googleapis.com/auth/cloud-platform" (forces new resource)
      node_config.0.oauth_scopes.299962681:                  "" => "https://www.googleapis.com/auth/compute" (forces new resource)
      node_config.0.oauth_scopes.316356861:                  "https://www.googleapis.com/auth/service.management.readonly" => "" (forces new resource)
      node_config.0.oauth_scopes.3663490875:                 "https://www.googleapis.com/auth/servicecontrol" => "" (forces new resource)
      node_config.0.oauth_scopes.3859019814:                 "https://www.googleapis.com/auth/trace.append" => "" (forces new resource)
      node_config.0.oauth_scopes.4205865871:                 "" => "https://www.googleapis.com/auth/sqlservice.admin" (forces new resource)
      node_config.0.preemptible:                             "false" => "false"
      node_config.0.service_account:                         "default" => <computed>
      node_pool.#:                                           "1" => <computed>
      node_version:                                          "1.9.7-gke.6" => <computed>
      private_cluster:                                       "false" => "false"
      project:                                               "***" => "***"
      region:                                                "" => <computed>
      remove_default_node_pool:                              "true" => "true"
      zone:                                                  "europe-west2-b" => "europe-west2-b"


Plan: 1 to add, 0 to change, 1 to destroy.

------------------------------------------------------------------------

This plan was saved to: devplan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "devplan.tfplan"

Steps to Reproduce

  1. terraform apply
  2. terraform apply again

Important Factoids

This was not happening when using the default node pool. I started seeing the issue after using my own node pool instead, so I think it may be related to the node pool.

References

Maybe related to https://github.com/hashicorp/terraform/issues/18209 ?

Labels: documentation

Most helpful comment

I just tested this and I can confirm that the 'recommended' example destroys itself on every run of terraform apply even when not using the default pool

All 13 comments

Hmm, looking at that plan, what stands out to me is:

      node_config.0.oauth_scopes.#:                          "6" => "6"
      node_config.0.oauth_scopes.1277378754:                 "https://www.googleapis.com/auth/monitoring" => "https://www.googleapis.com/auth/monitoring"
      node_config.0.oauth_scopes.1328717722:                 "" => "https://www.googleapis.com/auth/devstorage.read_write" (forces new resource)
      node_config.0.oauth_scopes.1632638332:                 "https://www.googleapis.com/auth/devstorage.read_only" => "" (forces new resource)
      node_config.0.oauth_scopes.172152165:                  "https://www.googleapis.com/auth/logging.write" => "https://www.googleapis.com/auth/logging.write"
      node_config.0.oauth_scopes.1733087937:                 "" => "https://www.googleapis.com/auth/cloud-platform" (forces new resource)
      node_config.0.oauth_scopes.299962681:                  "" => "https://www.googleapis.com/auth/compute" (forces new resource)
      node_config.0.oauth_scopes.316356861:                  "https://www.googleapis.com/auth/service.management.readonly" => "" (forces new resource)
      node_config.0.oauth_scopes.3663490875:                 "https://www.googleapis.com/auth/servicecontrol" => "" (forces new resource)
      node_config.0.oauth_scopes.3859019814:                 "https://www.googleapis.com/auth/trace.append" => "" (forces new resource)
      node_config.0.oauth_scopes.4205865871:                 "" => "https://www.googleapis.com/auth/sqlservice.admin" (forces new resource)

So here's what I think's happening:

  • The node_config in the container_cluster is setting the scopes it wants all node pools to use.
  • The node pool you're adding has a default node_config.
  • Terraform is getting confused about whether you want the node_config from the container_cluster or the default node_config from the node pool.

It's not perfect, but I believe if you move the node_config block from container_cluster into the node_pool, that confusion will be resolved.
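
Roughly, that suggested layout might look like the following sketch (not an official example; it just reuses the variables and scopes from the config above, with the node_config block dropped from the google_container_cluster resource):

resource "google_container_node_pool" "nodepool" {
  name       = "${var.cluster_name}nodepool"
  zone       = "${var.main_zone}"
  cluster    = "${google_container_cluster.primary.name}"
  node_count = "${var.node_count}"

  # The node_config now lives on the node pool instead of the cluster.
  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/sqlservice.admin",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }

  autoscaling {
    min_node_count = "${var.min_node_count}"
    max_node_count = "${var.max_node_count}"
  }
}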

I'll investigate and see if we can't come up with a better solution for this to make it work intuitively.

That actually makes a lot of sense. I didn't think about that.
I was just confused about the:
id: "europe-west2-b/dev/devnodepool" => (forces new resource)

Thank you very much! (yes, it does indeed fix the problem)

omg thanks for this, i've been banging my head with this for a few days ;)

So it sounds like we either have a documentation problem or a validation problem. I'm not 100% up to speed on the reason we have node_config at the cluster and node pool levels, so I'm not comfortable enough that I have all the use cases in mind to be able to say what the ideal solution is here, but I think we can improve this either through documentation or through not letting cluster set node_config, or through potentially handling an empty node_config on a node pool better. I'll leave this open so we can investigate those options.

@paddycarver the answer to your question is that the node_config on the cluster corresponds to the default node pool. The ideal solution would be that we would have a default_node_pool block on the cluster, but alas, that's not what the API gives us to work with. In the meantime, we can probably solve through documentation.

Wow, until this is resolved, a big fat warning should be added to the docs.

We advertise this as the recommended way to bootstrap a GKE cluster, yet it recreates the cluster on every terraform apply.

Hey @flokli! Our recommendation is to use separately managed node pools and _not use the default node pool at all_.

If you specify a node_config block, you're telling Terraform you want to use the default node pool. That block was badly named by the API and, by extension, by the original implementation in Terraform. Despite its name omitting the default_ prefix, it only applies to the default node pool.

As shown in the recommended example, both node_config and node_pool should be omitted from the google_container_cluster resource.
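
Applied to the config in this issue, that would look roughly like the following on the cluster side (a trimmed sketch, not the full recommended example; the node pool keeps its own node_config as sketched above):

resource "google_container_cluster" "primary" {
  name    = "${var.cluster_name}"
  zone    = "${var.main_zone}"
  project = "${var.project}"

  # No node_config or node_pool block here. The default pool is created
  # only because the API requires one, and is removed right away.
  remove_default_node_pool = true
  initial_node_count       = 1
}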

@rileykarson if I copy that exact example:
https://www.terraform.io/docs/providers/google/r/container_cluster.html#example-usage-with-a-separately-managed-node-pool-recommended-

and terraform apply a second time, it'll destroy and recreate the whole cluster.

I just tested this and I can confirm that the 'recommended' example destroys itself on every run of terraform apply even when not using the default pool

The same is true of the other example using the default node pool, and neither is related to configuration of node pools. This is related to a breaking change from the GKE API where a default value was changed. Patching with https://github.com/GoogleCloudPlatform/magic-modules/pull/1844. See https://github.com/terraform-providers/terraform-provider-google/issues/3672 / #3369.

https://www.terraform.io/docs/providers/google/r/container_cluster.html#node_config is now clearer about only applying to the default node pool. I don't think there's anything actionable to fix here, so I'm going to close this out. If anyone has anything unresolved and thinks this should be reopened, feel free to comment and I will.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
