Terraform-provider-azurerm: Creating an AKS cluster using a user managed identity fails

Created on 23 Jul 2019  ·  6 Comments  ·  Source: terraform-providers/terraform-provider-azurerm

What happened:

When provisioning an AKS cluster from a virtual machine that has terraform installed, where the logged-in identity is a user managed identity and the appropriate terraform environment variables for ARM MSI are set, the following error occurs:

Error: Error waiting for completion of Managed Kubernetes Cluster "cpe-demo-cjw-k8s" (Resource Group "cpe-demo-cjw-k8s-rg"): 
Code="CreateRoleAssignmentError" Message="RoleAssignmentReconciler retry timed out: autorest/azure: Service returned an error. 
Status=403 Code=\"AuthorizationFailed\" Message=\"The client '565e8efe-af95-4b0c-8641-6e2f0fc4aac8' with object id '565e8efe-af95-4b0c-8641-6e2f0fc4aac8' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/9e6b1432-c830-41fb-9c63-b1de69af46dd/resourceGroups/MC_cpe-demo-cjw-k8s-rg_cpe-demo-cjw-k8s_westeurope/providers/Microsoft.Authorization/roleAssignments/194a3266-434d-420c-abc2-0b33ba02640b' or the scope is invalid. If access was recently granted, please refresh your credentials.\""
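For reference, the terraform environment on the VM is set up roughly as follows. This is a minimal sketch assuming the azurerm provider's documented MSI environment variables, with the real IDs redacted:

# Authenticate the azurerm provider via Managed Service Identity
export ARM_USE_MSI=true
export ARM_SUBSCRIPTION_ID="<REDACTED>"
export ARM_TENANT_ID="<REDACTED>"
# Client id of the user managed identity assigned to this VM
export ARM_CLIENT_ID="<REDACTED>"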

What you expected to happen:

The cluster to be created successfully

How to reproduce it (as minimally and precisely as possible):

  1. Assign a user managed identity to a virtual machine, where the identity has Owner rights on the subscription (a CLI sketch follows the configuration below).
  2. Attempt to create a Kubernetes cluster:
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks"
  location            = "westeurope"
  resource_group_name = "aks-resource-group"
  dns_prefix          = "aks"
  kubernetes_version  =  "1.13.5
  agent_pool_profile {
    name            = "nodepool"
    count           = "3"
    vm_size         = "Standard_DS2_v2"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  service_principal {
    client_id     = REDACTED
    client_secret = REDACTED
  }
}
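For completeness, step 1 can be done with the Azure CLI along these lines. This is a sketch rather than the exact commands we ran; the identity name (terraform-msi), VM name (terraform-vm), and subscription scope are placeholders:

# Create a user managed identity and attach it to the VM that runs terraform
az identity create --resource-group aks-resource-group --name terraform-msi
az vm identity assign --resource-group aks-resource-group --name terraform-vm \
    --identities terraform-msi
# Grant the identity Owner rights on the subscription
principal_id=$(az identity show --resource-group aks-resource-group \
    --name terraform-msi --query principalId -o tsv)
az role assignment create --assignee "$principal_id" --role Owner \
    --scope "/subscriptions/<REDACTED>"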

Anything else we need to know?:

We've tested this with a service principal that has exactly the same permissions, and that works fine; when using a managed identity, however, the aforementioned error is produced every time.

The client id referenced in the error, '565e8efe-af95-4b0c-8641-6e2f0fc4aac8', is not an identity we created in our AD tenant; it resolves to Microsoft's first-party 'AzureContainerService' service principal.

See:

~ $ az ad sp show --id 565e8efe-af95-4b0c-8641-6e2f0fc4aac8
{
  "accountEnabled": "True",
  "addIns": [],
  "alternativeNames": [],
  "appDisplayName": "AzureContainerService",
  "appId": "7319c514-987d-4e9b-ac3d-d38c4f427f4c",
  "appOwnerTenantId": "f8cdef31-a31e-4b4a-93e4-5f571e91255a",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "deletionTimestamp": null,
  "displayName": "AzureContainerService",
  "errorUrl": null,
  "homepage": null,
  "informationalUrls": {
    "marketing": null,
    "privacy": null,
    "support": null,
    "termsOfService": null
  },
  "keyCredentials": [],
  "logoutUrl": null,
  "notificationEmailAddresses": [],
  "oauth2Permissions": [],
  "objectId": "565e8efe-af95-4b0c-8641-6e2f0fc4aac8",
  "objectType": "ServicePrincipal",
  "odata.metadata": "https://graph.windows.net/f55b1f7d-7a7f-49e4-9b90-55218aad89f8/$metadata#directoryObjects/@Element",
  "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
  "passwordCredentials": [],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyEndDateTime": null,
  "preferredTokenSigningKeyThumbprint": null,
  "publisherName": "Microsoft Services",
  "replyUrls": [],
  "samlMetadataUrl": null,
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": [
    "7319c514-987d-4e9b-ac3d-d38c4f427f4c"
  ],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMultipleOrgs",
  "tags": [],
  "tokenEncryptionKeyId": null
}

When creating the AKS cluster using the CLI from a machine that has the managed identity assigned:

~$ az login --identity
[
  {
    "environmentName": "AzureCloud",
    "id": "<REDACTED>",
    "isDefault": true,
    "name": "<REDACTED>",
    "state": "Enabled",
    "tenantId": "<REDACTED>",
    "user": {
      "assignedIdentityInfo": "MSI",
      "name": "systemAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]
~$ az aks create \
>     --resource-group aks-resource-group \
>     --name myAKSCluster \
>     --node-count 1 \
>     --service-principal <REDACTED> \
>     --client-secret <REDACTED> \
>     --generate-ssh-keys
SSH key files '/home/jenkins/.ssh/id_rsa' and '/home/jenkins/.ssh/id_rsa.pub' have been generated under ~/.ssh to allow SSH access to the VM. If using machines without permanent storage like Azure Cloud Shell without an attached file share, back up your keys to a safe location
 - Running ..
{
  "aadProfile": null,
  "addonProfiles": null,
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "count": 1,
      "enableAutoScaling": null,
      "maxCount": null,
      "maxPods": 110,

and so on, indicating successful creation.
bug service/kubernetes-cluster


All 6 comments

Hi @cwebbtw,
since yesterday we have had exactly the same issue with AKS cluster creation, but after 3-4 attempts we managed to create a new AKS cluster with a managed identity. We created a support case today and are awaiting feedback from Azure.

I'm not sure this is specific to managed identities anymore; this failed earlier with the same message when using a standard user with Owner permissions.

Hi,
per Azure support, this issue is fixed.

There was a small disruption affecting cluster creation, and it has now been resolved.

hey @cwebbtw

Thanks for opening this issue :)

Given Azure mentioned that this was an issue on their side - are you still seeing this issue?

Thanks!

@tombuildsstuff According to https://github.com/Azure/AKS/issues/1123, the Azure team has closed the issue; I could not reproduce this after they notified us of a problem in westeurope.

I'll close this issue.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
