The only supported field in the workload_identity_config block is identity_namespace:
https://github.com/terraform-providers/terraform-provider-google-beta/blob/3b46dc33a3d0ed8df968546462fa0f4908597e7d/google-beta/resource_container_cluster.go#L704-L716
There's no enabled field within the workload_identity_config block to default to.
When using a single shared module (wrapping a google_container_cluster resource) to build several GKE clusters, we are unable to conditionally enable workload_identity_config for only some of those clusters.
The only way we could conditionally add the workload_identity_config block to the google_container_cluster resource is by using dynamic blocks:
```hcl
# Workload Identity allows Kubernetes service accounts to act as a
# user-managed Google IAM service account.
dynamic "workload_identity_config" {
  for_each = var.workload_identity_enabled ? [var.cluster_project] : []
  content {
    # Currently, the only supported identity namespace is the project's default.
    identity_namespace = "${var.cluster_project}.svc.id.goog"
  }
}
```
workload_identity_config - https://www.terraform.io/docs/providers/google/r/container_cluster.html#workload_identity_config
provider "google-beta" with version >= 2.12.0
Add an enabled field to the workload_identity_config block.
"workload_identity_config": {
Type: schema.TypeList,
MaxItems: 1,
Optional: true,
Elem: &schema.Resource{
Schema: map[string]*schema.Schema{
"enabled": {
Type: schema.TypeBool,
Required: true,
},
"identity_namespace": {
Type: schema.TypeString,
Required: true,
},
},
},
},
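If such a field were added, the block could then be written directly, without resorting to dynamic blocks (hypothetical usage; the enabled field does not exist in the provider today):

```hcl
workload_identity_config {
  enabled            = var.workload_identity_enabled
  identity_namespace = "${var.cluster_project}.svc.id.goog"
}
```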
I too have been experimenting with the above approach to enabling workload identity in a shared module used to build multiple clusters and have noted some unexpected behaviour when testing various scenarios, which I thought might be helpful to post here.
- Removing the workload_identity_config block entirely _does_ disable workload identity; however, Terraform then reads back `identity_namespace = ""` once workload identity has been switched off, and attempts to remove the workload_identity_config block forevermore.
- One can instead set `identity_namespace = ""` to disable workload identity on a cluster that has previously had it enabled, and this seems to work and Terraform proposes no further change.
- Setting `identity_namespace = ""` on a cluster that has _never_ had workload identity previously enabled fails with: `Error: googleapi: Error 400: Must specify a field to update., badRequest`
- Consequently, the workload_identity_config block must remain absent from the configuration for a cluster which has never had workload identity enabled.
- `identity_namespace = null` does not appear to be a valid approach either.

So, in a common GKE cluster module, in order to support all state transitions, we appear to need a three-state approach:
| Desired workload identity configuration | Cluster has had workload identity enabled in the past? | Required config |
|---|---|---|
| Enabled | No | `identity_namespace = "<namespace>"` |
| Disabled | No | `workload_identity_config` block entirely absent |
| Enabled | Yes | `identity_namespace = "<namespace>"` |
| Disabled | Yes | `identity_namespace = ""` |
Which leads us to having something like the following dynamic block inside the google_container_cluster resource:
dynamic "workload_identity_config" {
for_each = var.enable_workload_identity == null ? [] : [0]
content {
identity_namespace = var.enable_workload_identity == true ? "${var.project_id}.svc.id.goog" : ""
}
}
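For completeness, the three-state var.enable_workload_identity flag driving that block can be declared as a nullable bool, along these lines (a sketch; only the variable name comes from the snippet above):

```hcl
variable "enable_workload_identity" {
  description = "true = enable, false = explicitly disable, null = omit the block entirely"
  type        = bool
  default     = null
}
```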
This dynamic-block approach has a number of pitfalls which must be documented for the unwary:
Cluster having not had workload identity previously enabled:

| `var.enable_workload_identity` | Remarks |
|---|---|
| Set to `null` | ✅ No changes proposed; workload identity left disabled |
| `null` -> `false` | ❌ Fails with `Error: googleapi: Error 400: Must specify a field to update., badRequest` |
| `null` -> `true` | ✅ Enables workload identity as expected |
Cluster having had workload identity previously enabled:

| `var.enable_workload_identity` | Remarks |
|---|---|
| `true` -> `false` | ✅ Disables workload identity |
| `true` -> `null` | ❌ Disables workload identity, but proposes a change on every subsequent plan |
| `null` -> `true` | ✅ Enables workload identity as expected |
| `null` -> `false` | ✅ Makes the perpetual change proposals go away; leaves workload identity disabled |
| `false` -> `true` | ✅ Enables workload identity as expected |
| `false` -> `null` | ❌ Leaves workload identity disabled, but results in endless proposed changes |
This all appears to stem from the fact that, once workload identity has been disabled on a cluster, an empty workloadIdentityConfig section gets returned from the API, whereas that section does not exist at all for a cluster that has never had the feature enabled.
Semi-related `google_container_node_pool` issues

Thought I'd note this here, although I guess this may warrant a separate issue?
Once workload identity is enabled in a cluster, new node pools, by default, have the GKE Metadata Server enabled.
So when creating a new node pool with the google_container_node_pool resource after workload identity has been enabled on the cluster, if no workload_metadata_config block is specified, the resultant node pool gets created with node_metadata = GKE_METADATA_SERVER anyway. When Terraform then refreshes the state of the google_container_node_pool resource, it sees that the workload_metadata_config section is present and tries to remove it on every subsequent plan.
So, when deploying new node pools onto a cluster with workload identity enabled, the workload_metadata_config block and the node_metadata setting are effectively not optional, otherwise Terraform erroneously proposes changes.
In addition, using the node_metadata = "UNSPECIFIED" value always seems to result in Terraform repeatedly proposing changes, because the actual underlying nodeMetadata setting gets set to either EXPOSE or GKE_METADATA_SERVER depending on the cluster configuration.
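Given all that, a workaround sketch for node pools on a workload-identity cluster is to pin the metadata mode explicitly (resource names and references below are placeholders, not from the original report):

```hcl
resource "google_container_node_pool" "pool" {
  provider = google-beta
  name     = "example-pool" # placeholder
  cluster  = google_container_cluster.cluster.name
  location = google_container_cluster.cluster.location

  node_config {
    # Explicitly matching what GKE sets by default avoids the perpetual
    # diff described above; "UNSPECIFIED" does not help, as noted.
    workload_metadata_config {
      node_metadata = "GKE_METADATA_SERVER"
    }
  }
}
```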
The two failing cases under Cluster having had workload identity previously enabled seem like a clear-cut bug. I'll see if I can handle those next week.
That may involve exposing enabled, or it may not; if it doesn't, I can re-triage this issue as an enhancement.
Hi, any word on this? This is still a problem in version 3.46.0.
It's been a while, but this was a bigger problem than I expected, if I remember right. I'm triaging this as a persistent-bug, which means we'll pick it up in our triage process as if it was an enhancement request.
This issue may need to be amended as the API has changed the way Workload Identity is configured.
https://github.com/hashicorp/terraform-provider-google/issues/8129
https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.WorkloadIdentityConfig
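For reference, per the linked issue and API docs above, newer provider versions replace identity_namespace with a workload_pool field, so the block would now look like this (a sketch based on those links; the exact provider version cut-over isn't verified here):

```hcl
workload_identity_config {
  workload_pool = "${var.project_id}.svc.id.goog"
}
```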