Terraform-provider-azurerm: azurerm_kubernetes_cluster: Adding node pools causes AKS cluster replacement

Created on 30 Jul 2019  ·  8 Comments  ·  Source: terraform-providers/terraform-provider-azurerm

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

$ terraform -v
Terraform v0.12.5
+ provider.azurerm v1.32.0

Affected Resource(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  agent_pool_profile {
    name            = "poolone"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  /*
  agent_pool_profile {
    name            = "pooltwo"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }
  */

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    service_cidr       = "172.0.0.0/24"
    dns_service_ip     = "172.0.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
  }

  service_principal {
    # Credentials redacted in the original report; shown as variables so the block parses.
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }
}

Expected Behavior

A node pool should have been added to the existing AKS cluster without needing to destroy it first.

Actual Behavior

The entire AKS cluster is destroyed and recreated with the additional node pool.

Steps to Reproduce

  1. Set up a Terraform config for an AKS cluster with 1 node pool
  2. terraform apply
  3. Add an additional agent_pool_profile nested block
  4. terraform apply
  5. Observe that the Terraform plan wants to destroy and re-create the entire AKS cluster:
  # azurerm_kubernetes_cluster.test must be replaced
-/+ resource "azurerm_kubernetes_cluster" "test" {
      - api_server_authorized_ip_ranges = [] -> null
        dns_prefix                      = "k8s"
      ~ kube_admin_config               = [] -> (known after apply)
      + kube_admin_config_raw           = (sensitive value)
      ~ kube_config                     = [
etc...
      ~ agent_pool_profile {
          - availability_zones  = [] -> null
            count               = 1
          + dns_prefix          = (known after apply)
          - enable_auto_scaling = false -> null
          ~ fqdn                = "k8s-1701e02c.hcp.westus.azmk8s.io" -> (known after apply)
          - max_count           = 0 -> null
          ~ max_pods            = 30 -> (known after apply)
          - min_count           = 0 -> null
            name                = "poolone"
          - node_taints         = [] -> null
            os_disk_size_gb     = 30
            os_type             = "Linux"
            type                = "AvailabilitySet"
            vm_size             = "Standard_B2ms"
        }
      + agent_pool_profile {
          + count           = 1
          + dns_prefix      = (known after apply)
          + fqdn            = (known after apply)
          + max_pods        = (known after apply)
          + name            = "pooltwo" # forces replacement
          + os_disk_size_gb = 30 # forces replacement
          + os_type         = "Linux" # forces replacement
          + type            = "AvailabilitySet" # forces replacement
          + vm_size         = "Standard_B2ms" # forces replacement
        }

Important Factoids

N/A

References

This issue looks related (Terraform replacing AKS nodepool cluster when changing VM count)
https://github.com/terraform-providers/terraform-provider-azurerm/issues/3835

Labels: bug, service/kubernetes-cluster

Most helpful comment

May I humbly ask for a status or roadmap update on when we can expect this feature to work correctly? Thank you in advance.

All 8 comments

@titilambert can you add this to your list?

When can we expect this to be fixed?

When is this likely to have a fix? It's the only thing in the way of Terraforming hybrid Linux/Windows clusters at the moment, because it fails when trying to create the two node pools during cluster creation. I can work around it by creating the ARM template for the Windows node pool directly, but then Terraform wants to re-create the cluster every time because it doesn't know about that node pool.
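
A possible stop-gap for that last problem (an untested sketch, not something confirmed in this thread): telling Terraform to ignore agent_pool_profile changes on the cluster should keep the node pool created outside Terraform from forcing a replacement, at the cost of Terraform no longer tracking any pool changes on that resource.

resource "azurerm_kubernetes_cluster" "test" {
  # ... cluster configuration as in the original report ...

  lifecycle {
    # Assumption: ignoring the whole agent_pool_profile attribute hides the
    # node pool created outside Terraform, so the cluster is not replaced.
    ignore_changes = [agent_pool_profile]
  }
}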

I think these problems are intertwined with issue #4001 as well. There are multiple issues in Terraform at the moment; it would help if we could consolidate questions there.

The AKS team is aware of these issues, and while we work through the main feature we will try to provide proper guidance for Terraform across all of them.

In fact, if you use node pools, even retrying an apply with the very same Terraform configuration might end up triggering a re-creation of the AKS cluster.

We can reproduce this very easily by creating an AKS cluster with 3 agent_pool_profile blocks, letting the creation succeed, and immediately re-running terraform apply.

If you are unlucky, the order in which azurerm retrieves the agent_pool_profile objects does not match your Terraform source code, and this triggers a re-creation because all objects are "modified".

This happens because the agent_pool_profile entries in the Terraform state are _always_ sorted alphabetically by name.

You can avoid this bug by reordering the blocks in your Terraform source code, but that is not intuitive at all, and it is still a bug from our point of view.
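
To make that workaround concrete (a sketch only, relying on the alphabetical sorting described above): declaring the agent_pool_profile blocks in alphabetical order by name keeps the configuration aligned with the state, so re-applying the same configuration stays a no-op.

resource "azurerm_kubernetes_cluster" "example" {
  # ... other cluster settings ...

  # "poolone" sorts before "pooltwo", matching the alphabetical order the
  # provider writes into the Terraform state.
  agent_pool_profile {
    name            = "poolone"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  agent_pool_profile {
    name            = "pooltwo"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }
}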

Versions used:

terraform -v
Terraform v0.12.6
+ provider.azurerm v1.32.1

May I humbly ask for a status or roadmap update on when we can expect this feature to work correctly? Thank you in advance.

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.37.0"
}
# ... other configuration ...
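
For reference, 1.37.0 reworks how pools are declared: the cluster takes a single default_node_pool block, and additional pools become standalone azurerm_kubernetes_cluster_node_pool resources that can be added or removed without replacing the cluster. A rough sketch of the new shape (values illustrative, credentials shown as variables):

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  default_node_pool {
    name       = "poolone"
    node_count = 1
    vm_size    = "Standard_B2ms"
  }

  service_principal {
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }
}

# Additional pools live outside the cluster resource, so adding this
# resource later does not touch the cluster itself.
resource "azurerm_kubernetes_cluster_node_pool" "pooltwo" {
  name                  = "pooltwo"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.test.id
  vm_size               = "Standard_B2ms"
  node_count            = 1
}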

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
