Terraform-provider-google: Terraform can't handle GKE issue_client_certificate w/ K8S version 1.12

Created on 3 Apr 2019  ·  13 Comments  ·  Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.13
+ provider.external v1.1.0
+ provider.google v2.3.0
+ provider.google-beta v2.3.0
+ provider.kubernetes v1.5.2
+ provider.null v2.1.0
+ provider.random v2.1.0

Affected Resource(s)

google_container_cluster

Terraform Configuration Files

resource "google_container_cluster" "dev" {

  name                     = "dev"
  min_master_version       = "1.12.6-gke.7"
  location                 = "${var.region}"
  remove_default_node_pool = true
  initial_node_count       = 1
  network                  = "${var.network}"

  addons_config {
    network_policy_config {
      disabled = true
    }
  }

  ip_allocation_policy {
    use_ip_aliases = true
  }

  master_auth {
    username = "${var.username}"
    password = "${var.password}"
  }

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${self.name} --region ${self.location}"
  }
}

Debug Output

https://gist.github.com/orkenstein/f68f6a437d2e5057e5d798508f851c66

Panic Output

Nope

Expected Behavior

Cluster should not be changed

Actual Behavior

-/+ module.gke.google_container_cluster.dev (new resource required)

Steps to Reproduce

  1. Create a cluster resource
  2. Terraform apply
  3. Notice cluster changes requested

Important Factoids

References

bug

Most helpful comment

This fix will be released in 2.8.0 around Tuesday.

All 13 comments

I am having exactly the same issue; I believe it is related to the following:

Important detail seems to be that this only happens to recently created 1.12.6-gke.7 clusters.

Our preexisting 1.11.7-gke.12 clusters (one of which has afterwards been upgraded to said 1.12.6-gke.7) are (thankfully) not recreated.

I've tried to switch to latest, because of this: https://github.com/Azure/AKS/issues/273

@joaosousafranco: Those lines have been present for 10 months, so I don't think it's them.

GKE's administration API (i.e. the GCP API the Google provider uses, not the Kubernetes API) behaves differently depending on which _Kubernetes_ version is used at creation time. This is technically not a breaking change on their end, but it is super frustrating for API consumers like Terraform, where we aren't able to encode the rules of a whole other control plane's versioning system well.

K8S version 1.12, which was released recently, was particularly bad about this, as many defaults were changed, such as the issuance of client certificates. That's the likely cause of the issue here; can you share which fields GKE is attempting to change, @orkenstein?

@rileykarson these lines:

-/+ module.gke.google_container_cluster.dev (new resource required)
      id:                                                   "dev" => <computed> (forces new resource)
...
      master_auth.#:                                        "1" => "1"
      master_auth.0.client_certificate:                     "" => <computed>
      master_auth.0.client_certificate_config.#:            "1" => "0" (forces new resource)
      master_auth.0.client_key:                             <sensitive> => <computed> (attribute changed)
...

and these:

  ~ module.gke.google_container_cluster.dev
      network:                                                                      "projects/<project>/global/networks/default" => "default"

Is there a workaround, besides the following?

lifecycle {
  ignore_changes = ["master_auth", "network"]
}

There are two workarounds for the issue where Terraform is forcing a recreate. The recreate is a problem with how the provider computes a diff on master_auth.client_certificate_config. As identified above, defining this works (it causes Terraform to ignore diffs on master_auth and its children entirely):

lifecycle {
  ignore_changes = ["master_auth"]
}

As well, if you've specified a master_auth block, you can explicitly set issue_client_certificate to false, e.g.:

master_auth {
  client_certificate_config {
    issue_client_certificate = false
  }
}

Finally, if you use a min_master_version of 1.11.X or lower, Terraform should work as intended.
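
For example, a minimal sketch of pinning the control plane to a 1.11 release (the cluster name, region, and exact 1.11 patch version below are assumptions; check gcloud container get-server-config for the versions actually available to you):

resource "google_container_cluster" "pinned" {
  name               = "pinned"       # hypothetical cluster name
  location           = "us-central1"  # hypothetical region
  initial_node_count = 1

  # Staying on 1.11.x keeps the old GKE API defaults for client
  # certificates, so no spurious diff is generated on master_auth.
  min_master_version = "1.11.8-gke.6" # hypothetical 1.11 patch release
}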


To provide more context on why this and #3240 are happening: as stated before, GKE published K8S 1.12, which changed the behaviour of the GKE API. Both issues are caused by the change to the issue_client_certificate field, and, very technically, these aren't breaking changes: the old behaviour still works if you use an older K8S version. That means everything should work fine using 1.11.X or earlier.

This is a case that Terraform providers aren't great at handling. Provider schema typing is defined in code, and we need to make a code change plus a new provider release to solve this. A solution to either this issue or #3240 needs to solve the other at the same time, or we're just going to have to make more similar changes, possibly breaking users again, so I'm consolidating both issues here.

When implementing this feature initially, because of how Terraform's diff engine behaves, we had to add shims at various levels to make this value appear correct for users at plan time. That included how we interpreted null values in the API.

The change in defaults means that the meaning of the null value has changed, and that's what caused #3240: the provider is currently only able to send requests with a null value (previously an implicit enablement of client certs) or an explicit disablement.

In addition, the provider currently considers enabled values (issue_client_certificate = true) equivalent to nulls, which... complicates things. That's part of the reason for this issue (#3369): that shim was sufficient to resolve block-level diffs only as long as the equivalence held, and previously it did.

We'll attempt to massage Terraform so that clusters created either pre- or post-1.12 act sensibly, while preserving behaviour for 1.11 users. At first glance, it's likely we'll end up setting a Terraform-specific default that always enables client certs, and then flip it to disabled in version 3.0.0 of the provider.
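
In the meantime, stating your intent explicitly in config is the safest way to be insulated from default flips. A minimal sketch, assuming the explicit field continues to take precedence over whatever default the provider applies (the cluster name and region are hypothetical):

resource "google_container_cluster" "explicit" {
  name               = "explicit"     # hypothetical cluster name
  location           = "us-central1"  # hypothetical region
  initial_node_count = 1

  master_auth {
    # Spell out the client-certificate behaviour you want rather than
    # relying on the provider/API default, which differs across K8S
    # versions and is slated to change again in provider 3.0.0.
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}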

This fix will be released in 2.8.0 around Tuesday.

It would be better not to close issues before the fix is released.

2.8.0 does not seem to resolve the problem; I'm still getting the error.

@james-knott can you file a new issue including a config and debug logs?

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
