Terraform-provider-azurerm: azurerm_kubernetes_cluster update error when moving from agent_pool_profile to default_node_pool

Created on 29 Jan 2020 · 8 comments · Source: terraform-providers/terraform-provider-azurerm

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

terraform -v

Terraform v0.12.20
+ provider.azuread v0.7.0
+ provider.azurerm v1.42.0
+ provider.random v2.2.1

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  ...

  # agent_pool_profile {
  #   type            = "AvailabilitySet"
  #   name            = "default"
  #   count           = var.node_count
  #   vm_size         = var.node_size
  #   os_type         = "Linux"
  #   os_disk_size_gb = var.node_disk_size
  #   vnet_subnet_id  = var.subnet_id
  #   max_pods        = var.node_max_pods
  # }

  default_node_pool {
    type                = "AvailabilitySet"
    name                = "default"
    enable_auto_scaling = false
    node_count          = var.node_count
    vm_size             = var.node_size
    os_disk_size_gb     = var.node_disk_size
    vnet_subnet_id      = var.subnet_id
    max_pods            = var.node_max_pods
  }
  ...
}

Debug Output

Output for terraform plan: https://gist.github.com/jleloup/f57e0b9f8b35f3158c12645c37ad5f1f

Panic Output

Expected Behavior

I expect my AKS cluster to be updated using the newer default_node_pool syntax instead of the previous agent_pool_profile.
To trigger such an update I increased the size of my AKS cluster by one node (from 50 to 51).

Actual Behavior

  • terraform plan does not indicate that the agent_pool_profile will be deleted
  • terraform apply outputs an error:
Error: Error updating Default Node Pool "kubernetes" (Resource Group "kubernetes-layer"): containerservice.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="AgentPoolAPIsNotSupported" Message="AgentPool APIs supported only for clusters with VMSS agentpools. For more information, please check https://aka.ms/multiagentpoollimitations"

  on topology.tf line 33, in resource "azurerm_kubernetes_cluster" "aks_cluster":
  33: resource "azurerm_kubernetes_cluster" "aks_cluster" {

AFAIK this error indicates that we are somehow updating this AKS cluster with two node pools, which is not supported with our current setup (Availability Sets).

Steps to Reproduce

  1. terraform plan
  2. terraform apply

Any attempt to modify the Terraform state by removing the AKS cluster and importing it again leads back to the same situation.
Also, trying to revert this modification (going back to agent_pool_profile) leads to another issue (Terraform wants to delete and recreate the cluster), which does not help.
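
For reference, the state manipulation mentioned above was along these lines (a sketch only; the resource address matches the configuration shown earlier, and the Azure resource ID is a placeholder, not the real subscription or cluster name):

# Drop the cluster from state, then re-import it under the same address.
terraform state rm azurerm_kubernetes_cluster.aks_cluster
terraform import azurerm_kubernetes_cluster.aks_cluster \
  /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ContainerService/managedClusters/<cluster-name>

As described above, the next plan/apply after the re-import ends up in the same situation.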

Important Factoids

  • We are running in Azure West Europe region
  • AKS cluster version: 1.14.8

References

bug service/kubernetes-cluster

All 8 comments

One thing worth mentioning as well: I have tried to use the ignore_changes statement to exclude either agent_pool_profile or default_node_pool, but it didn't make any difference.

Also, removing either of these sections from the Terraform state manually (pulling, updating, then pushing) does not help either: as soon as I run a terraform refresh, or any command that triggers a refresh, I end up with both sections in my Terraform state again and run into the symptoms described in this issue.
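
For reference, the ignore_changes attempt was roughly the following (a sketch of what was tried; only the lifecycle block is shown):

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  ...

  # Attempted with either attribute; neither variant made a difference.
  lifecycle {
    ignore_changes = [
      agent_pool_profile, # or default_node_pool, in the other attempt
    ]
  }
}

The manual state edit used terraform state pull, editing the JSON by hand, then terraform state push, with the result described above.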

Has anyone had time to look at this issue?

I tried the latest 2.x version; as far as I can tell, the issue is still there.

Thanks for opening this issue. I can repro it now. It seems the root cause is that the node pool API doesn't support AvailabilitySet clusters. If possible, I suggest shifting the node pool type to VMSS via recreation as a workaround; updating should then work.
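
A minimal sketch of that workaround, assuming the cluster is recreated with a VMSS-backed default node pool (the type value follows the provider schema; the other attributes reuse the variables from the configuration above, and changing type forces a new cluster):

default_node_pool {
  type            = "VirtualMachineScaleSets" # instead of "AvailabilitySet"
  name            = "default"
  node_count      = var.node_count
  vm_size         = var.node_size
  os_disk_size_gb = var.node_disk_size
  vnet_subnet_id  = var.subnet_id
  max_pods        = var.node_max_pods
}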

We ran into the same issue. Is it the case that the azurerm provider is scaling a node pool with type=AvailabilitySet through Azure's AgentPool API, even though the AgentPool API only supports VMSS agent pools?

Is there any intention of fixing this? This feels like a pretty significant issue to me, as AvailabilitySets inherently don't work with the 2.0+ provider.

To create an AKS cluster, Terraform uses the KubernetesClusters API. To update an AKS cluster, Terraform uses the AgentPools API. Judging by the error message, I assume the AgentPools API does not allow migrating an agent pool profile for AvailabilitySet clusters created through the KubernetesClusters API. It seems to be an API limitation, so I assume Terraform cannot update the agent pool profile for AvailabilitySet clusters for now.

Any update on this?
