Terraform-provider-azurerm: Updates to azurerm_kubernetes_cluster fail when cluster uses managed AAD integration

Created on 15 Jun 2020  ·  35 comments  ·  Source: terraform-providers/terraform-provider-azurerm

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.26

  • provider.azurerm v2.14.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_resource_group" "aks" {
  name     = "aks-service-rg"
  location = "northeurope"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                            = "aks-service"
  location                        = azurerm_resource_group.aks.location
  resource_group_name             = azurerm_resource_group.aks.name
  node_resource_group             = "aks-infra-rg"
  dns_prefix                      = "aks-dev"
  enable_pod_security_policy      = false
  private_cluster_enabled         = false
  api_server_authorized_ip_ranges = null

  default_node_pool {
    name            = "default"
    node_count      = 4
    vm_size         = "Standard_B2ms"
    os_disk_size_gb = 30
    vnet_subnet_id  = var.virtual_network.subnets.aks.id
    max_pods        = 60
    type            = "VirtualMachineScaleSets"
  }

  linux_profile {
    admin_username = var.admin_username

    ssh_key {
      key_data = tls_private_key.aks.public_key_openssh
    }
  }

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed                 = true
      admin_group_object_ids  = [for key, value in local.cluster_admins : value.object_id] 
    }
  }

  identity {
    type    = "SystemAssigned"
  }

  addon_profile {

    azure_policy {
      enabled = true
    }

    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = var.log_analytics_workspace.id
    }

    kube_dashboard {
      enabled = true
    }

    http_application_routing {
      enabled = false
    }

  }

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    load_balancer_sku  = "Basic"
    service_cidr       = var.kubernetes_service_cidr
    docker_bridge_cidr = var.docker_bridge_cidr
    dns_service_ip     = cidrhost(var.kubernetes_service_cidr, 2)
  }

  tags = local.tags

}

Debug Output

Panic Output

Expected Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Tags are updated to reflect changes

Actual Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Apply fails with error: -

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-service" (Resource Group "aks-service-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

Steps to Reproduce

  1. Register feature 'Microsoft.ContainerService/AAD-V2' on subscription as per https://docs.microsoft.com/en-us/azure/aks/managed-aad
  2. terraform plan
  3. terraform apply
  4. Make changes to resource
  5. terraform plan
  6. terraform apply

Important Factoids

References

  • #0000
bug service/kubernetes-cluster

Most helpful comment

I've implemented a fix and added Acceptance tests to cover the scenarios in this issue.

If nothing goes wrong it will make the next release! 🎉

All 35 comments

@tombuildsstuff or anyone, can we maybe get a fix for this into the next release? It currently blocks use of the feature, as any update to the cluster breaks it.

A week late on this, but a colleague and I hit the same error yesterday. We noticed you can update the RBAC details via the CLI, so for anyone who wants a workaround while this is being looked at: we deleted the AKS cluster, set the role_based_access_control block to

role_based_access_control {
    enabled = true
    azure_active_directory {
      managed = true
    }
}

then created a null_resource that updates the managed AAD admin group IDs:

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
  provisioner "local-exec" {
    command = <<EOT
      # --update ids
      az aks update -g <resource_group> -n <name> --aad-tenant-id <tenant_id> --aad-admin-group-object-ids <admin_group_ids>
   EOT
  }
}

However, you'll also need an ignore_changes rule on the AKS RBAC block:

lifecycle {
    ignore_changes = [
      role_based_access_control
    ]
  }

az version: 2.8
azurerm_provider version: 2.15

EDIT: if tags change, the resetAADProfile error is still raised. You can add tags to the ignore list if that works for you, but then you obviously can't update tags (a big disadvantage). Unfortunately, az aks update has no option to update tags either. Investigating az resource tag instead.

I was hoping that workaround also applied to already-deployed clusters but, for the record... it doesn't... :(

Yeah, we've just tested with some other changes and tag changes still raise the resetAADProfile error :(. Will report back if we find a workaround.

I've just encountered the same issue now. Going to try @jhawthorn22's approach

Btw @pindey are you sure it is not because you use:

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true 
      admin_group_object_ids  = [for key, value in local.cluster_admins : value.object_id] 
    }
  }

Rather than:

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true 
      admin_group_object_ids  = [for key, value in local.cluster_admins : value] 
    }
  }

@r3mattia local.cluster_admins is a map variable, so value.object_id is correct. The aadProfile block output by 'az aks show' returns the expected list of adminGroupObjectIds. Happy to give it a whirl with a static list, though.
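For illustration, the map in question looks roughly like this (group names and IDs below are placeholders, not the real values):

locals {
  cluster_admins = {
    platform_team = { object_id = "00000000-0000-0000-0000-000000000000" }
    sre_team      = { object_id = "11111111-1111-1111-1111-111111111111" }
  }
}

With that shape, [for key, value in local.cluster_admins : value.object_id] yields the list of group object IDs that gets passed to admin_group_object_ids.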

I really don't like this approach, but it's working for us. I created two provisioners: one for the AAD admin group IDs, one for updating the tags.

Admin groups provisioner:

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    admin_group_changed = var.default_config.aks_admin_group_id
  }

  provisioner "local-exec" {
    command     = "${path.module}/scripts/update_admin_group_ids.sh -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -n ${azurerm_kubernetes_cluster.aks.name} -t ${data.azurerm_client_config.current.tenant_id} -a ${var.default_config.aks_admin_group_id}"
    interpreter = ["bash", "-c"]
  }
}

AKS tags update provisioner:

resource "null_resource" "update_aks_tags" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = "${path.module}/scripts/update_tags.sh -n ${azurerm_kubernetes_cluster.aks.name} -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -t ${jsonencode(var.tags)}"
    interpreter = ["bash", "-c"]
  }
}

Provisioner scripts:

#!/bin/bash

# options
while getopts g:n:t:a: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tenant_id=${OPTARG};;
        a) admin_group_id=${OPTARG};;
    esac
done

# --get extension
echo "--get aks-preview extension"
az extension add --name aks-preview
az extension list

# --register feature
echo "--register feature"
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/AAD-V2')].{Name:name,State:properties.state}"
az provider register --namespace Microsoft.ContainerService

# --update admin id
az aks update --resource-group "$resource_group" --name "$aks_name" --aad-tenant-id "$tenant_id" --aad-admin-group-object-ids "$admin_group_id"

#!/bin/bash

## options
while getopts g:n:t: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tags=${OPTARG};;
    esac
done

## reformat tags
# shellcheck disable=SC2207
tags_arr=($(echo "$tags" | jq . | jq -r 'to_entries[] | "\(.key)=\(.value)"' | tr '\n' ' '))

## update tags
for tag in "${tags_arr[@]}";
do
    echo "tag: $tag"
    az resource tag --resource-group "$resource_group" --name "$aks_name" --resource-type "Microsoft.ContainerService/ManagedClusters" -i --tags "$tag"
done

@pindey, I used this approach:

locals {
  cluster_admins = yamldecode(data.local_file.cluster_admins.content)
}

data "local_file" "cluster_admins" {
  filename = "${path.module}/data/cluster-admins.yml"
}

Then the yml file looks like this:

---
cluster-admins: adminGroupObjectId

Worked for me.

I have not tried this though:

---
cluster-admins1: adminGroupObjectId
cluster-admins2: adminGroupObjectId

but I do not see why it shouldn't work.
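If you do end up with multiple entries, here is a rough, untested sketch of feeding the whole decoded map into the cluster via values():

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed                = true
      admin_group_object_ids = values(local.cluster_admins)
    }
  }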

I also faced this issue. I got the same error message when I tried to update the AAD settings manually through the API; however, I managed to update the settings with the Azure CLI.

This error also occurs when modifying other properties of the cluster, such as the max node count on a node pool:

      ~ default_node_pool {
            availability_zones    = []
            enable_auto_scaling   = true
            enable_node_public_ip = false
          ~ max_count             = 3 -> 4
            max_pods              = 30
            min_count             = 3
            name                  = "default"
            node_count            = 3
            node_labels           = {}
            node_taints           = []
            orchestrator_version  = "1.17.7"
            os_disk_size_gb       = 30
            tags                  = {}
            type                  = "VirtualMachineScaleSets"
            vm_size               = "Standard_DS3_v2"
            vnet_subnet_id        = "/subscriptions/xxxxxxxxxxxxxxxxx/resourceGroups/rg-pegaplatform-network-sbox-canadacentral-persistent/providers/Microsoft.Network/virtualNetworks/vnet-pegaplatform-network-sbox-canadacentral/subnets/Private"
        }

....


Plan: 0 to add, 1 to change, 0 to destroy.

error:

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-pegaplatform-sbox-canadacentral" (Resource Group "rg-pegaplatform-sbox-canadacentral-persistent"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

  on main.tf line 141, in resource "azurerm_kubernetes_cluster" "aks_cluster":
 141: resource "azurerm_kubernetes_cluster" "aks_cluster" {

The error message also appears when trying to update the Kubernetes version. There are too many issues to even consider a reliable workaround, so the feature is unusable. Unfortunately, I just had to revert to using a service principal.

Updating the Kubernetes version also caused this issue:
'resetAADProfile' is not allowed for managed AAD enabled cluster.

I can reproduce the same error while updating the autoscaler configuration (e.g. update max_count 3 -> 4).
Executing the same configuration update via the Azure CLI works without issues.

Versions:
Terraform 0.12.28
terraform-provider-azurerm 2.18.0

The short of it: AAD v2 is a preview feature and it was enabled in the provider, but resetAADProfile is not supported with AAD v2 clusters (on the Microsoft side). Therefore the call to reset it should be omitted when managed = true until Microsoft starts supporting the call.

resetAADProfile with API version 2020-06-01 seems to support enableAzureRBAC:
https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/resetaadprofile#request-body

So I guess this could be fixed by using the new API version.

Yeah, I am seeing this when amending the pool size, doing Kubernetes upgrades or changing autoscale settings, so it's unusable currently.

@patpicos I don't believe it is in preview any more; all the preview markers have been removed from the docs and the old version is now referred to as legacy - https://docs.microsoft.com/en-us/azure/aks/managed-aad

@sam-cogan that is very interesting news. This commit for the documentation confirms what you are saying. https://github.com/MicrosoftDocs/azure-docs/commit/96ab8c1c3669600ac8cbf91ad3bd3a80a82e445a#diff-90a9850acdb4834ff96cc6562e19144e

I didn't see a notice in the AKS release notes; perhaps one is imminent. @PSanetra might be on the right path: update the API version and see if it makes these errors go away.

We are creating a new cluster today with AAD v2 support; will let you know how it goes! I'll look into it if it is not working.

I created a new cluster yesterday and can confirm the issue is present. You do not see it at cluster creation (at least I didn't), but when you try to modify the cluster to change the number of nodes, update the version, etc., you will see the issue.

The feature is not GA anymore, due to a delayed rollout: https://github.com/Azure/AKS/issues/1489#issuecomment-663065702.

Also, when I deploy it with a custom-built azurerm provider using API version 2020-06-01, it still doesn't work and still complains if the preview feature is not enabled:

az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az provider register -n Microsoft.ContainerService

I'm working on a PR at the moment; it seems to work, but acceptance testing takes a little while.

I've implemented a fix and added Acceptance tests to cover the scenarios in this issue.

If nothing goes wrong it will make the next release! 🎉

This has been released in version 2.21.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.21.0"
}
# ... other configuration ...

Upgrading the provider to version 2.21.0 works :)

With managed AAD, how do we attach an ACR instance to the AKS cluster? Before, with a manually set up service principal, you would just grant the ACR role to that principal, but as far as I can see there is no way to get access to the underlying service principal that gets set up automatically.

@tkinz27 you're talking about two different things here. The managed AAD integration this issue refers to is about being able to log in to the cluster for admin work as an AAD user; it has nothing to do with the cluster's access to other resources.

Using a managed identity for the cluster identity creates a user-assigned managed identity, which you can retrieve via the "user_assigned_identity_id" attribute of the "kubelet_identity" block. You would then grant this managed identity access to ACR, as sketched below.
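For reference, a minimal sketch of that role assignment, assuming an existing azurerm_container_registry resource named "acr" and the cluster resource from this issue (both names are placeholders):

resource "azurerm_role_assignment" "aks_acr_pull" {
  # grant the cluster's kubelet identity pull access on the registry;
  # principal_id takes the identity's object ID
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}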

Ohhh... sorry, I'm new to Azure (coming from AWS) and all the auth has definitely been confusing. Thank you for so quickly clearing that up for me.

EDIT: This is working fine now; the problem was my own configuration. Thanks aristosvo!

So, as instructed, I added the following to my main.tf:

  managed                 = true
  // optional:
  admin_group_object_ids  = ["myAksAdminId_NOT_group_name"]
  # these had to be commented out
  #client_app_id     = var.aad_client_app_id
  #server_app_id     = var.aad_server_app_id
  #server_app_secret = var.aad_server_app_secret
  tenant_id         = var.aad_tenant_id

WORKED!

I still get the ResetAADProfile error although I am using v2.21.0 of the azurerm provider.

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "sutinenseaks-aks" (Resource Group "sutinenseaks-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

on main.tf line 45, in resource "azurerm_kubernetes_cluster" "demo":
45: resource "azurerm_kubernetes_cluster" "demo" {

I upgraded the azurerm provider to 2.21.0 (terraform.zip attached):

terraform init -upgrade

Also upgraded the kubernetes provider 1.11.1 -> 1.12.0; still not working.

terraform version
Terraform v0.13.0

  • provider registry.terraform.io/hashicorp/azurerm v2.21.0
  • provider registry.terraform.io/hashicorp/github v2.4.1
  • provider registry.terraform.io/hashicorp/kubernetes v1.12.0
  • provider registry.terraform.io/hashicorp/tls v2.1.0

My attempt followed this tutorial:
https://github.com/Azure/sg-aks-workshop

@sutinse1 Can you provide the configuration you are using?

What I see is a cluster set up with AAD v1 integration. Apparently either the backward compatibility here is a problem or you're mixing things up in your setup. I think the former is the issue; I'll run a few tests when I have time at hand.

For now, I'd recommend restructuring/simplifying your Terraform file for the AAD integration:

resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true
      // optional:
      // admin_group_object_ids  = [<AAD group object ids which you want to make cluster admin via AAD login>] 
    }
  }
...
}
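If you'd rather not hard-code the group object ID (note it must be the group's object ID, not its display name), a sketch using the azuread provider's azuread_group data source could also work; the argument is display_name on azuread 2.x and name on older releases:

data "azuread_group" "aks_admins" {
  display_name = "myAKSAdmin" # placeholder group name
}

# then, inside azure_active_directory:
#   admin_group_object_ids = [data.azuread_group.aks_admins.object_id]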

@sutinse1 Can you briefly explain what you did before you ended up with the mentioned error?

What I think you did was as follows:

  • Create the azurerm_kubernetes_cluster with the setup from the course:
resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      client_app_id     = var.aad_client_app_id
      server_app_id     = var.aad_server_app_id
      server_app_secret = var.aad_server_app_secret
      tenant_id         = var.aad_tenant_id
    }
  }
...
}
  • You probably upgraded it to AAD-v2 via the command line with az aks update -g myResourceGroup -n myManagedCluster --enable-aad or similar.
  • You reapplied the old configuration with Terraform.

If not, I'm very curious how your configuration ended up in the state with the error 😄

@aristosvo I did just what you wrote: I upgraded to AAD-v2 by registering the feature.

# I registered the AAD-V2 feature
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
# created an AD group for AKS
az ad group create --display-name myAKSAdmin --mail-nickname myAKSAdmin
# added myself to the group
az ad group member add --group myAKSAdmin --member-id $id

# updated the cluster
groupid=$(az ad group show --group myAKSAdmin --query objectId --output tsv)
tenantid=$(az account show --query tenantId --output tsv)
az aks update -g myaks-rg -n myaks-aks --aad-tenant-id $tenantid --aad-admin-group-object-ids $groupid

I somehow assumed that Terraform could detect whether AAD was already in use :) My mistake.

My configuration now (I had to comment out the SP settings):

role_based_access_control {
  enabled = true

  azure_active_directory {
    managed = true
    // optional:
    admin_group_object_ids = ["myAKSAdmin_groupID_not_text"]
    #client_app_id     = var.aad_client_app_id
    #server_app_id     = var.aad_server_app_id
    #server_app_secret = var.aad_server_app_secret
    tenant_id = var.aad_tenant_id
  }
}

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 [email protected]. Thanks!
