Terraform version: 0.12.8
azurerm provider version: 1.41
Affected resource: azurerm_kubernetes_cluster
resource "azurerm_kubernetes_cluster" "k8s" {
name = var.cluster_info.name
location = azurerm_resource_group.k8s.location
dns_prefix = var.cluster_info.dns_prefix
resource_group_name = azurerm_resource_group.k8s.name
kubernetes_version = var.kubernetes_version
role_based_access_control {
enabled = ! local.aks_aad_skip_rbac
dynamic "azure_active_directory" {
for_each = "${! local.aks_aad_skip_rbac ? list(local.aks_rbac_setting) : []}"
content {
client_app_id = local.aks_rbac_setting.client_app_id
server_app_id = local.aks_rbac_setting.server_app_id
server_app_secret = local.aks_rbac_setting.server_app_secret
tenant_id = local.aks_rbac_setting.tenant_id
}
}
}
default_node_pool {
name = var.agent_pool.name
node_count = var.agent_pool.count
vm_size = "Standard_DS2_v2"
type = "VirtualMachineScaleSets"
os_disk_size_gb = 30
max_pods = 30
availability_zones = local.sanitized_availability_zones
enable_auto_scaling= true
min_count = 3
max_count = 12
}
service_principal {
client_id = var.aks_login.client_id
client_secret = var.aks_login.client_secret
}
addon_profile {
oms_agent {
enabled = true
log_analytics_workspace_id = azurerm_log_analytics_workspace.k8s_logs.id
}
}
tags = {
CreatedBy = var.tags.created_by != "" ? var.tags.created_by : null
ChangedBy = var.tags.changed_by != "" ? var.tags.changed_by : null
Environment = var.tags.environment != "" ? var.tags.environment : null
}
lifecycle {
ignore_changes = [
role_based_access_control,
role_based_access_control["azure_active_directory"],
agent_pool_profile.0.count]
}
network_profile {
network_plugin = "azure"
load_balancer_sku = var.aks_load_balancer_sku
}
}
### Expected Behavior
With `azurerm_provider == 1.39`, the kubelet version in our cluster's node pool would be updated (in addition to the AKS Kubernetes version) by changing `var.kubernetes_version`.
### Actual Behavior
With `azurerm_provider == 1.41`, the kubelet version in our cluster's node pool is not updated (in addition to the AKS Kubernetes version) by changing `var.kubernetes_version`.
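For context, the upgrade is driven by a plain input variable; a minimal sketch of such a declaration (the type and version numbers here are illustrative, not taken from the report):

```hcl
# Bumping this value upgrades the AKS control plane; with azurerm provider
# 1.40/1.41 the node pool kubelets no longer follow it (the issue reported here).
variable "kubernetes_version" {
  type    = string
  default = "1.15.7" # illustrative, e.g. previously "1.14.8"
}
```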
Switching back to `azurerm_provider == 1.39` and doing `terraform apply` seems to perform the expected behavior, even if the exact same config was previously run with `azurerm_provider == 1.41`.

Just checked -- the issue also occurs with `azurerm_provider == 1.40`.
EDIT: I take it back. I've been `git bisect`ing, and `v1.40.0` doesn't seem to repro the issue when I build it myself. I'm starting over from `v1.39.0..v1.41.0`.

EDIT2: Double-takeback, there's a bug in my testbed. Re-starting.
A `git bisect v1.39.0..v1.40.0` points to 87de8aec2f9eb9c23ff080bbcd6d68abc6fe82ea as the cause of the issue. The code changes in that commit seem unrelated, so I'd guess it has to do with the azure-sdk `container-service` version bump.
I'm having trouble finding a canonical reference for what changed between azure-sdk/container-service `20190601` and `20191001`, so I checked what `az cli` does on an `az aks nodepool upgrade`. It looks like `OrchestratorVersion` is settable when updating agent pools, so that's probably the fix.
If I were to guess, multiple agent pools are a new feature in AKS, and that probably drove separating the agent pool k8s upgrade logic from the AKS k8s upgrade logic.
@tombuildsstuff I'd like to contribute the PR, but would appreciate some guidance.

Would we rather
- keep the old behavior for now, of coupling the agent pool k8s version to the cluster k8s version; or
- support configuring the `OrchestratorVersion` field on agent pools individually?

Does terraform-provider-azurerm v1.x support multiple agent pools for AKS?
@jstevans

> A git bisect v1.39.0..v1.40.0 points to 87de8ae as the cause of issue. The code changes in that commit seem unrelated, so I'd guess it has to do with the azure-sdk container-service version bump.

Yeah, this is likely a change in behaviour between the different versions of the Container Service API.
> @tombuildsstuff I'd like to contribute the PR, but would appreciate some guidance.
> Would we rather
> - keep the old behavior for now, of coupling the agent pool k8s version to the cluster k8s version; or
> - support configuring the OrchestratorVersion field on agent pools individually?
> Does terraform-provider-azurerm v1.x supports multiple agent pools for AKS?
We support multiple node pools via the `azurerm_kubernetes_cluster_node_pool` resource - however there's a default node pool defined within the `azurerm_kubernetes_cluster` resource. As such I'm wondering if the behaviour should be:

Alternatively we can make them all configurable - but since the default node pool is hosting system jobs it's treated a little differently to the other node pools, so this probably wants some testing to confirm which way to go - maybe @jluk can confirm the expected behaviour here?
The ruleset between the control plane and agent pools is defined in this public document. There is a window of config drift you are allowed to have between the control plane and each agent pool.

https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#upgrade-a-cluster-control-plane-with-multiple-node-pools

Rules for valid versions to upgrade node pools:
- The node pool version must have the same major version as the control plane.
- The node pool minor version must be within two minor versions of the control plane version.
- The node pool version cannot be greater than the control plane major.minor.patch version.

Hope this helps.
Great, thanks @jluk :) @tombuildsstuff:

So that I can unblock my team's upgrade to [email protected] while we continue this discussion, can I follow your first option to have `default_node_pool.OrchestratorVersion := kubernetes_cluster.KubernetesVersion`?

For the longer term, what if `kubernetes_cluster.default_node_pool` and `kubernetes_cluster_node_pool` both have an optional string argument `orchestrator_version` which defaults to `kubernetes_cluster.kubernetes_version` (sketched below)?

Do the version drift rules between control plane vs. agent pool need to be encoded on the client? Is it insufficient to make the REST call and let it fail?
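If that proposal were adopted, usage might look roughly like the sketch below. Note that `orchestrator_version` is only a proposed argument at this point, and the resource names, service-principal variables, and version numbers are illustrative:

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "example-aks"
  kubernetes_version  = "1.15.7"

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_DS2_v2"
    node_count = 3

    # Proposed: would default to kubernetes_version when unset.
    orchestrator_version = "1.15.7"
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  name                  = "extra"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 3

  # Proposed: could lag the control plane within the drift rules quoted above.
  orchestrator_version = "1.14.8"
}
```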
@tombuildsstuff I'm going to work on this tomorrow according to my previous comment - please shout if that's the wrong thing to do.
@jstevans

> So that I can unblock my team's upgrade to [email protected] while we continue this discussion, can I follow your first option to have default_node_pool.OrchestratorVersion := kubernetes_cluster.KubernetesVersion?

That sounds fine for now, however the field doesn't want exposing to users currently and instead wants to be an internal behaviour, until..

> For the longer-term, what if kubernetes_cluster.default_node_pool and kubernetes_cluster_node_pool both have an optional string argument orchestrator_version which defaults to kubernetes_cluster.kubernetes_version?
> Do the version drift rules between control plane vs. agent pool need to be encoded on the client? Is it insufficient to make the REST call and let it fail?
Adding properties for `kubernetes_version` or similar to each node pool seems fine - however we may need to add locks into the `azurerm_kubernetes_cluster` and `azurerm_kubernetes_cluster_node_pool` resources to account for this (we'd need to test upgrading a major version and the node pools at the same time and confirm what happens; we may be able to get away with locking on the node pool names in each resource rather than the name of the kubernetes_cluster, since this'd still allow creating/deleting each node pool in parallel). Realistically we'll only know which thing to lock on (and whether this is even needed) by testing it, however.

Hope that helps :)
Coupling by default could work, as long as it can be disabled. Upgrading control plane and default node pool with one terraform apply could be very impactful to a cluster. I'm commenting to vote on exposing OrchestratorVersion for default node pool as well as the node pool resource.
I would rather have OrchestratorVersion exposed ASAP, even without the proper locking, if it means I can control it. My current plan is to use the AKS REST API directly to upgrade node pools: https://docs.microsoft.com/en-us/rest/api/aks/agentpools/createorupdate

I'm available (with Azure resources as well) to test potential patches and I am able to write Go, but I lack Terraform internals experience. Let me know how I can help.
Using the azurerm provider at version "=2.1.0", I've upgraded an azurerm_kubernetes_cluster resource from 1.14 to 1.15. The control plane seems to have upgraded to 1.15, but the VMSS node pool has stayed behind at 1.14.

What is the current expected behavior for Kubernetes upgrades and their node pools? I understand that https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#validation-rules-for-upgrades expresses certain validation conditions.

Will the node pool upgrade when it's obligated to? Or can we control this with an input, as @jstevans suggested?
I'd like to know this!

@tombuildsstuff any info on the above?
This has been released in version 2.14.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 2.14.0"
}
# ... other configuration ...
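Assuming the release exposes the `orchestrator_version` argument discussed above, an upgrade would then look roughly like this minimal sketch (resource names and version numbers are illustrative):

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "example-aks"

  # Upgrades the control plane.
  kubernetes_version = "1.16.9"

  default_node_pool {
    name       = "default"
    vm_size    = "Standard_DS2_v2"
    node_count = 3

    # Bump this together with kubernetes_version so the kubelets follow the control plane.
    orchestrator_version = "1.16.9"
  }

  identity {
    type = "SystemAssigned"
  }
}
```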
Hello,

I am having the same issue. I tried provider 2.14.0 and 2.16.0, and even though my cluster upgraded correctly, the node pools are still on the old version (1.15.10)...

Have you set the new `orchestrator_version` property of the (default) node pool(s)?
@EPinci Hello, I haven't. Lemme try!
Thanks for the heads up.
@EPinci Hey man, really appreciate the quick comment!
I set `orchestrator_version` to my AKS version and it worked just fine!
Sorry for the dumb mistake.
Cheers,