What happened:
When provisioning an AKS cluster from a virtual machine that has Terraform installed, where the currently logged-in identity is a managed identity and the appropriate Terraform environment variables for ARM MSI are set (a provider configuration sketch follows the error output), the following error occurs:
Error: Error waiting for completion of Managed Kubernetes Cluster "cpe-demo-cjw-k8s" (Resource Group "cpe-demo-cjw-k8s-rg"):
Code="CreateRoleAssignmentError" Message="RoleAssignmentReconciler retry timed out: autorest/azure: Service returned an error.
Status=403 Code=\"AuthorizationFailed\" Message=\"The client '565e8efe-af95-4b0c-8641-6e2f0fc4aac8' with object id '565e8efe-af95-4b0c-8641-6e2f0fc4aac8' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/9e6b1432-c830-41fb-9c63-b1de69af46dd/resourceGroups/MC_cpe-demo-cjw-k8s-rg_cpe-demo-cjw-k8s_westeurope/providers/Microsoft.Authorization/roleAssignments/194a3266-434d-420c-abc2-0b33ba02640b' or the scope is invalid. If access was recently granted, please refresh your credentials.\""
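For reference, the azurerm provider on the VM authenticates via the managed identity. A minimal sketch of such a configuration follows; the subscription and tenant values are placeholders, and in our case these settings are supplied through the corresponding environment variables rather than provider arguments:

provider "azurerm" {
  # Equivalent to setting ARM_USE_MSI=true in the environment
  use_msi         = true

  # Placeholders - supplied via ARM_SUBSCRIPTION_ID / ARM_TENANT_ID in our case
  subscription_id = "<REDACTED>"
  tenant_id       = "<REDACTED>"
}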
What you expected to happen:
The cluster to be created successfully
How to reproduce it (as minimally and precisely as possible):
resource "azurerm_kubernetes_cluster" "aks" {
name = "aks"
location = "westeurope"
resource_group_name = "aks-resource-group"
dns_prefix = "aks"
kubernetes_version = "1.13.5
agent_pool_profile {
name = "nodepool"
count = "3"
vm_size = "Standard_DS2_v2"
os_type = "Linux"
os_disk_size_gb = 30
}
service_principal {
client_id = REDACTED
client_secret = REDACTED
}
}
Anything else we need to know?:
We've tested this with a service principal that has the exact same permissions, and that works fine; when using a managed identity, however, it produces the aforementioned error every time.
The client id referenced in the error, '565e8efe-af95-4b0c-8641-6e2f0fc4aac8', does not actually exist in our AD tenant.
See:
~ $ az ad sp show --id 565e8efe-af95-4b0c-8641-6e2f0fc4aac8
{
  "accountEnabled": "True",
  "addIns": [],
  "alternativeNames": [],
  "appDisplayName": "AzureContainerService",
  "appId": "7319c514-987d-4e9b-ac3d-d38c4f427f4c",
  "appOwnerTenantId": "f8cdef31-a31e-4b4a-93e4-5f571e91255a",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "deletionTimestamp": null,
  "displayName": "AzureContainerService",
  "errorUrl": null,
  "homepage": null,
  "informationalUrls": {
    "marketing": null,
    "privacy": null,
    "support": null,
    "termsOfService": null
  },
  "keyCredentials": [],
  "logoutUrl": null,
  "notificationEmailAddresses": [],
  "oauth2Permissions": [],
  "objectId": "565e8efe-af95-4b0c-8641-6e2f0fc4aac8",
  "objectType": "ServicePrincipal",
  "odata.metadata": "https://graph.windows.net/f55b1f7d-7a7f-49e4-9b90-55218aad89f8/$metadata#directoryObjects/@Element",
  "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
  "passwordCredentials": [],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyEndDateTime": null,
  "preferredTokenSigningKeyThumbprint": null,
  "publisherName": "Microsoft Services",
  "replyUrls": [],
  "samlMetadataUrl": null,
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": [
    "7319c514-987d-4e9b-ac3d-d38c4f427f4c"
  ],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMultipleOrgs",
  "tags": [],
  "tokenEncryptionKeyId": null
}
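As an aside, a quick way to confirm which identity Terraform itself is authenticating as is the azurerm_client_config data source; the following is just a diagnostic sketch (attribute names may differ slightly between provider versions):

data "azurerm_client_config" "current" {}

# Prints the client id and tenant id of the identity Terraform authenticated with
output "caller_client_id" {
  value = "${data.azurerm_client_config.current.client_id}"
}

output "caller_tenant_id" {
  value = "${data.azurerm_client_config.current.tenant_id}"
}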
When creating the AKS cluster using the CLI from a machine that has the managed identity assigned:
~$ az login --identity
[
  {
    "environmentName": "AzureCloud",
    "id": "<REDACTED>",
    "isDefault": true,
    "name": "<REDACTED>",
    "state": "Enabled",
    "tenantId": "<REDACTED>",
    "user": {
      "assignedIdentityInfo": "MSI",
      "name": "systemAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]
~$ az aks create \
> --resource-group aks-resource-group \
> --name myAKSCluster \
> --node-count 1 \
> --service-principal <REDACTED> \
> --client-secret <REDACTED> \
> --generate-ssh-keys
SSH key files '/home/jenkins/.ssh/id_rsa' and '/home/jenkins/.ssh/id_rsa.pub' have been generated under ~/.ssh to allow SSH access to the VM. If using machines without permanent storage like Azure Cloud Shell without an attached file share, back up your keys to a safe location
- Running ..
{
  "aadProfile": null,
  "addonProfiles": null,
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "count": 1,
      "enableAutoScaling": null,
      "maxCount": null,
      "maxPods": 110,
...and so on, indicating a successful creation.
Hi @cwebbtw,
since yesterday we have had exactly the same issue with AKS cluster creation, but after 3-4 attempts we were able to create a new AKS cluster with a managed identity. We created a support case today and are awaiting feedback from Azure.
I'm no longer sure this is specific to managed identities; this failed earlier with the same message when using a standard user with Owner permissions.
Hi,
as per Azure support, this issue is fixed.
There was a small disturbance with cluster creation and it is now resolved.
hey @cwebbtw
Thanks for opening this issue :)
Given Azure mentioned that this was an issue on their side - are you still seeing this issue?
Thanks!
@tombuildsstuff According to https://github.com/Azure/AKS/issues/1123, the Azure team has closed their issue, and I have not been able to reproduce this since they notified us of a problem in westeurope.
I'll close this issue.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!