Terraform-provider-azurerm: Updates to azurerm_kubernetes_cluster fail when cluster uses managed AAD integration

Created on 15 Jun 2020  ·  35 comments  ·  Source: terraform-providers/terraform-provider-azurerm

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.26

  • provider.azurerm v2.14.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_resource_group" "aks" {
  name     = "aks-service-rg"
  location = "northeurope"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                            = "aks-service"
  location                        = azurerm_resource_group.aks.location
  resource_group_name             = azurerm_resource_group.aks.name
  node_resource_group             = "aks-infra-rg"
  dns_prefix                      = "aks-dev"
  enable_pod_security_policy      = false
  private_cluster_enabled         = false
  api_server_authorized_ip_ranges = null

  default_node_pool {
    name            = "default"
    node_count      = 4
    vm_size         = "Standard_B2ms"
    os_disk_size_gb = 30
    vnet_subnet_id  = var.virtual_network.subnets.aks.id
    max_pods        = 60
    type            = "VirtualMachineScaleSets"
  }

  linux_profile {
    admin_username = var.admin_username

    ssh_key {
      key_data = tls_private_key.aks.public_key_openssh
    }
  }

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed                 = true
      admin_group_object_ids  = [for key, value in local.cluster_admins : value.object_id] 
    }
  }

  identity {
    type    = "SystemAssigned"
  }

  addon_profile {

    azure_policy {
      enabled = true
    }

    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = var.log_analytics_workspace.id
    }

    kube_dashboard {
      enabled = true
    }

    http_application_routing {
      enabled = false
    }

  }

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    load_balancer_sku  = "Basic"
    service_cidr       = var.kubernetes_service_cidr
    docker_bridge_cidr = var.docker_bridge_cidr
    dns_service_ip     = cidrhost(var.kubernetes_service_cidr, 2)
  }

  tags = local.tags

}

Debug Output

Panic Output

Expected Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Tags are updated to reflect changes

Actual Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Apply fails with error: -

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-service" (Resource Group "aks-service-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

Steps to Reproduce

  1. Register feature 'Microsoft.ContainerService/AAD-V2' on subscription as per https://docs.microsoft.com/en-us/azure/aks/managed-aad
  2. terraform plan
  3. terraform apply
  4. Make changes to resource
  5. terraform plan
  6. terraform apply

Important Factoids

References

  • #0000
bug service/kubernetes-cluster

Most helpful comment

I've implemented a fix and added Acceptance tests to cover the scenarios in this issue.

If nothing goes wrong it will make the next release! 🎉

All 35 comments

@tombuildsstuff or anyone, can we maybe get a fix for this into the next release? It currently blocks use of the feature, as any update to the cluster breaks it.

A week late on this, but a colleague and I hit the same error yesterday. We noticed you can update the RBAC details via the CLI, so for anyone who wants a workaround while this is being looked at: we deleted the AKS cluster, set the role_based_access_control block to

role_based_access_control {
    enabled = true
    azure_active_directory {
      managed = true
    }
}

then created a null_resource that updates the managed AAD admin group IDs:

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
  provisioner "local-exec" {
    command = <<EOT
      # --update ids
      az aks update -g <resource_group> -n <name> --aad-tenant-id <tenant_id> --aad-admin-group-object-ids <admin_group_ids>
   EOT
  }
}

However, you'll also need an ignore_changes rule on the AKS RBAC block:

lifecycle {
    ignore_changes = [
      role_based_access_control
    ]
  }

az version: 2.8
azurerm_provider version: 2.15

EDIT: if tags change, the resetAADProfile error is still raised. You can add tags to the ignore list if that works for you, but then you obviously can't update tags (a big disadvantage). Unfortunately, az aks update has no option to update tags either. Investigating az resource tag instead.

I was hoping that workaround also applied to already-deployed clusters but, for the record... it doesn't... :(

Yeah, we've just tested with some other changes and tag changes still raise the resetAADProfile error :(. Will report back if we find a workaround.

I've just encountered the same issue now. Going to try @jhawthorn22's approach

Btw @pindey are you sure it is not because you use:

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true 
      admin_group_object_ids  = [for key, value in local.cluster_admins : value.object_id] 
    }
  }

Rather than:

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true 
      admin_group_object_ids  = [for key, value in local.cluster_admins : value] 
    }
  }

@r3mattia local.cluster_admins is a map variable, so value.object_id is correct. The aadProfile block output by 'az aks show' returns the expected list of adminGroupObjectIds. Happy to give it a whirl with a static list, though.
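For illustration, the map in question looks roughly like this (group names and IDs below are placeholders, not the real values):

locals {
  cluster_admins = {
    platform_team = { object_id = "00000000-0000-0000-0000-000000000000" }
    sre_team      = { object_id = "11111111-1111-1111-1111-111111111111" }
  }
}

With that shape, [for key, value in local.cluster_admins : value.object_id] yields the list of group object IDs that gets passed to admin_group_object_ids.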

I really don't like this approach, but it's working for us. I created two provisioners: one for the AAD admin group IDs, one for updating the tags.

Admin groups provisioner:

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    admin_group_changed = var.default_config.aks_admin_group_id
  }

  provisioner "local-exec" {
    command     = "${path.module}/scripts/update_admin_group_ids.sh -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -n ${azurerm_kubernetes_cluster.aks.name} -t ${data.azurerm_client_config.current.tenant_id} -a ${var.default_config.aks_admin_group_id}"
    interpreter = ["bash", "-c"]
  }
}

AKS tags update provisioner:

resource "null_resource" "update_aks_tags" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = "${path.module}/scripts/update_tags.sh -n ${azurerm_kubernetes_cluster.aks.name} -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -t ${jsonencode(var.tags)}"
    interpreter = ["bash", "-c"]
  }
}

Provisioner scripts:

#!/bin/bash

# options
while getopts g:n:t:a: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tenant_id=${OPTARG};;
        a) admin_group_id=${OPTARG};;
    esac
done

# --get extension
echo "--get aks-preview extension"
az extension add --name aks-preview
az extension list

# --register feature
echo "--register feature"
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/AAD-V2')].{Name:name,State:properties.state}"
az provider register --namespace Microsoft.ContainerService

# --update admin id
az aks update --resource-group "$resource_group" --name "$aks_name" --aad-tenant-id "$tenant_id" --aad-admin-group-object-ids "$admin_group_id"

#!/bin/bash

## options
while getopts g:n:t: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tags=${OPTARG};;
    esac
done

## reformat tags
# shellcheck disable=SC2207
tags_arr=($(echo "$tags" | jq . | jq -r 'to_entries[] | "\(.key)=\(.value)"' | tr '\n' ' '))

## update tags
for tag in "${tags_arr[@]}";
do
    echo "tag: $tag"
    az resource tag --resource-group "$resource_group" --name "$aks_name" --resource-type "Microsoft.ContainerService/ManagedClusters" -i --tags "$tag"
done

@pindey, I used this approach:

locals {
  cluster_admins = yamldecode(data.local_file.cluster_admins.content)
}

data "local_file" "cluster_admins" {
  filename = "${path.module}/data/cluster-admins.yml"
}

Then the yml file looks like this:

---
cluster-admins: adminGroupObjectId

Worked for me.

I have not tried this though:

---
cluster-admins1: adminGroupObjectId
cluster-admins2: adminGroupObjectId

but I do not see why it shouldn't work.
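If you do end up with multiple entries, here is a rough, untested sketch of feeding the whole decoded map into the cluster via values():

  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed                = true
      admin_group_object_ids = values(local.cluster_admins)
    }
  }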

I also faced this issue. I got the same error message when I tried to update the AAD settings manually through the API; however, I managed to update the settings with the Azure CLI.

This error also occurs when modifying other properties of the cluster, such as the max node count on a node pool:

      ~ default_node_pool {
            availability_zones    = []
            enable_auto_scaling   = true
            enable_node_public_ip = false
          ~ max_count             = 3 -> 4
            max_pods              = 30
            min_count             = 3
            name                  = "default"
            node_count            = 3
            node_labels           = {}
            node_taints           = []
            orchestrator_version  = "1.17.7"
            os_disk_size_gb       = 30
            tags                  = {}
            type                  = "VirtualMachineScaleSets"
            vm_size               = "Standard_DS3_v2"
            vnet_subnet_id        = "/subscriptions/xxxxxxxxxxxxxxxxx/resourceGroups/rg-pegaplatform-network-sbox-canadacentral-persistent/providers/Microsoft.Network/virtualNetworks/vnet-pegaplatform-network-sbox-canadacentral/subnets/Private"
        }

....


Plan: 0 to add, 1 to change, 0 to destroy.

error:

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-pegaplatform-sbox-canadacentral" (Resource Group "rg-pegaplatform-sbox-canadacentral-persistent"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

  on main.tf line 141, in resource "azurerm_kubernetes_cluster" "aks_cluster":
 141: resource "azurerm_kubernetes_cluster" "aks_cluster" {

The error message also appears when trying to update the Kubernetes version. There are too many issues to even consider a reliable workaround, so the feature is unusable. Unfortunately, I just had to revert to using a service principal.

Updating the Kubernetes version also caused this issue:
'resetAADProfile' is not allowed for managed AAD enabled cluster.

I can reproduce the same error while updating the autoscaler configuration (e.g. update max_count 3 -> 4).
Executing the same configuration update via the Azure CLI works without issues.

Versions:
Terraform 0.12.28
terraform-provider-azurerm 2.18.0

The short of it: AAD v2 is a preview feature and it was enabled in the provider, but resetAADProfile is not supported with AAD v2 clusters (on the Microsoft side). Therefore the call to reset it should be omitted when managed = true until Microsoft starts supporting the call.

resetAADProfile with API version 2020-06-01 seems to support enableAzureRBAC:
https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/resetaadprofile#request-body

So I guess this could be fixed by using the new API version.

Yeah, I am seeing this when amending the pool size, doing Kubernetes upgrades or changing autoscale settings, so it's unusable currently.

@patpicos I don't believe it is in preview any more; all the preview markers have been removed from the docs and the old version is now referred to as legacy - https://docs.microsoft.com/en-us/azure/aks/managed-aad

@sam-cogan that is very interesting news. This commit for the documentation confirms what you are saying. https://github.com/MicrosoftDocs/azure-docs/commit/96ab8c1c3669600ac8cbf91ad3bd3a80a82e445a#diff-90a9850acdb4834ff96cc6562e19144e

I didn't see a notice in the AKS release notes; perhaps one is imminent. @PSanetra might be on the right path: update the API version and see if it makes these errors go away.

We are creating a new cluster today with AAD v2 support; will let you know how it goes! I'll look into it if it is not working.

I created a new cluster yesterday and can confirm the issue is present. You do not see it at cluster creation (at least I didn't), but when you try to modify the cluster to change the number of nodes, update the version, etc., you will see the issue.

The feature is not GA anymore, due to a delayed rollout: https://github.com/Azure/AKS/issues/1489#issuecomment-663065702.

Also, when I deploy it with a custom-built azurerm provider using API version 2020-06-01, it still doesn't work and still complains if the preview feature is not enabled:

az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az provider register -n Microsoft.ContainerService

I'm working on a PR at the moment; it seems to work, but acceptance testing takes a little while.

I've implemented a fix and added Acceptance tests to cover the scenarios in this issue.

If nothing goes wrong it will make the next release! 🎉

This has been released in version 2.21.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.21.0"
}
# ... other configuration ...

Upgrading the provider to version 2.21.0 works :)

With managed AAD, how do we attach an ACR instance to the AKS cluster? Before, with a manually set up service principal, you would just grant the ACR role to that principal, but as far as I can see there is no way to get access to the underlying service principal that gets set up automatically.

@tkinz27 you're talking about two different things here. The managed AAD integration this issue refers to is about being able to log in to the cluster for admin work as an AAD user; it has nothing to do with the cluster's access to other resources.

Using a managed identity for the cluster identity creates a user-assigned managed identity, which you can retrieve via the "user_assigned_identity_id" attribute of the "kubelet_identity" block. You would then grant this managed identity access to ACR, as sketched below.
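For reference, a minimal sketch of that role assignment, assuming an existing azurerm_container_registry resource named "acr" and the cluster resource from this issue (both names are placeholders):

resource "azurerm_role_assignment" "aks_acr_pull" {
  # grant the cluster's kubelet identity pull access on the registry;
  # principal_id takes the identity's object ID
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}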

Ohhh... sorry, I'm new to Azure (coming from AWS) and all the auth has definitely been confusing. Thank you for so quickly clearing that up for me.

EDIT: This is working fine now; the problem was my own configuration. Thanks aristosvo!

So, as instructed, I added the following to my main.tf:

  managed                 = true
  // optional:
  admin_group_object_ids  = ["myAksAdminId_NOT_group_name"]
  # these had to be commented out
  #client_app_id     = var.aad_client_app_id
  #server_app_id     = var.aad_server_app_id
  #server_app_secret = var.aad_server_app_secret
  tenant_id         = var.aad_tenant_id

WORKED!

I still get the ResetAADProfile error although I am using v2.21.0 of the azurerm provider.

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "sutinenseaks-aks" (Resource Group "sutinenseaks-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

on main.tf line 45, in resource "azurerm_kubernetes_cluster" "demo":
45: resource "azurerm_kubernetes_cluster" "demo" {

I upgraded the azurerm provider to 2.21.0 (terraform.zip attached):

terraform init -upgrade

Also upgraded the kubernetes provider 1.11.1 -> 1.12.0; still not working.

terraform version
Terraform v0.13.0

  • provider registry.terraform.io/hashicorp/azurerm v2.21.0
  • provider registry.terraform.io/hashicorp/github v2.4.1
  • provider registry.terraform.io/hashicorp/kubernetes v1.12.0
  • provider registry.terraform.io/hashicorp/tls v2.1.0

My attempt followed this tutorial:
https://github.com/Azure/sg-aks-workshop

@sutinse1 Can you provide the configuration you are using?

What I see is a cluster set up with AAD v1 integration. Apparently either the backward compatibility here is a problem or you're mixing things up in your setup. I think the former is the issue; I'll run a few tests when I have time at hand.

For now, I'd recommend restructuring/simplifying your Terraform file for the AAD integration:

resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true
      // optional:
      // admin_group_object_ids  = [<AAD group object ids which you want to make cluster admin via AAD login>] 
    }
  }
...
}
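If you'd rather not hard-code the group object ID (note it must be the group's object ID, not its display name), a sketch using the azuread provider's azuread_group data source could also work; the argument is display_name on azuread 2.x and name on older releases:

data "azuread_group" "aks_admins" {
  display_name = "myAKSAdmin" # placeholder group name
}

# then, inside azure_active_directory:
#   admin_group_object_ids = [data.azuread_group.aks_admins.object_id]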

@sutinse1 Can you briefly explain what you did before you ended up with the mentioned error?

What I think you did was as follows:

  • Create the azurerm_kubernetes_cluster with the setup from the course:
resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      client_app_id     = var.aad_client_app_id
      server_app_id     = var.aad_server_app_id
      server_app_secret = var.aad_server_app_secret
      tenant_id         = var.aad_tenant_id
    }
  }
...
}
  • You probably upgraded it to AAD-v2 via the command line with az aks update -g myResourceGroup -n myManagedCluster --enable-aad or similar.
  • You reapplied the old configuration with Terraform.

If not, I'm very curious how your configuration ended up in the state with the error 😄

@aristosvo I did just what you wrote: I upgraded to AAD-v2 by registering the feature.

# I registered the AAD-V2 feature
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
# created an AD group for AKS
az ad group create --display-name myAKSAdmin --mail-nickname myAKSAdmin
# added myself to the group
az ad group member add --group myAKSAdmin --member-id $id

# updated the cluster
groupid=$(az ad group show --group myAKSAdmin --query objectId --output tsv)
tenantid=$(az account show --query tenantId --output tsv)
az aks update -g myaks-rg -n myaks-aks --aad-tenant-id $tenantid --aad-admin-group-object-ids $groupid

I somehow assumed that Terraform could detect whether AAD was already in use :) My mistake.

My configuration now (I had to comment out the SP settings):

role_based_access_control {
  enabled = true

  azure_active_directory {
    managed = true
    // optional:
    admin_group_object_ids = ["myAKSAdmin_groupID_not_text"]
    #client_app_id     = var.aad_client_app_id
    #server_app_id     = var.aad_server_app_id
    #server_app_secret = var.aad_server_app_secret
    tenant_id = var.aad_tenant_id
  }
}

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 [email protected]. Thanks!
