Terraform v0.11.10
AzureRM Provider v1.20.0
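For reference, the provider version from this report can be pinned with a version constraint in the provider block. This is just a minimal sketch mirroring the versions listed above, not part of the original configuration:
provider "azurerm" {
  # Pin to the provider version the issue was observed with.
  version = "=1.20.0"
}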
resource "azurerm_kubernetes_cluster" "aks_cluster" {
name = "${var.name}"
location = "${var.region}"
dns_prefix = "${var.name}"
kubernetes_version = "${var.kubernetes_version}"
resource_group_name = "${azurerm_resource_group.aks_resource_group.name}"
linux_profile {
admin_username = "xxx"
ssh_key {
key_data = "${var.ssh_public_key}"
}
}
agent_pool_profile {
count = "${var.node_count}"
name = "agentpool"
vm_size = "${var.vm_size}"
os_disk_size_gb = "${var.os_disk_size}"
os_type = "Linux"
vnet_subnet_id = "${azurerm_subnet.private.id}"
max_pods = 110
}
service_principal {
client_id = "${azurerm_azuread_service_principal.service_principal.application_id}"
client_secret = "${random_string.service_principal_password.result}"
}
role_based_access_control {
enabled = true
azure_active_directory {
client_app_id = "${var.rbac_client_app_id}"
server_app_id = "${var.rbac_server_app_id}"
server_app_secret = "${var.rbac_server_app_secret}"
}
}
network_profile {
network_plugin = "azure"
}
depends_on = [
"azurerm_azuread_service_principal.service_principal",
"azurerm_azuread_service_principal_password.password",
]
tags {
environment = "${var.environment}"
name = "${var.name}"
}
}
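The service principal resources referenced above (azurerm_azuread_service_principal.service_principal, azurerm_azuread_service_principal_password.password, and random_string.service_principal_password) are not included in the report. A minimal sketch of what they might look like with the azurerm 1.x AzureAD resources follows; the application name "aks_app" and the end_date value are assumptions, not taken from the original configuration:
resource "random_string" "service_principal_password" {
  length  = 32
  special = true
}

# Hypothetical AzureAD application backing the service principal.
resource "azurerm_azuread_application" "aks_app" {
  name = "${var.name}"
}

resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.aks_app.application_id}"
}

resource "azurerm_azuread_service_principal_password" "password" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value                = "${random_string.service_principal_password.result}"
  end_date             = "2099-01-01T00:00:00Z" # placeholder expiry
}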
Unfortunately this happens intermittently, so I haven't been able to get debug output. It started happening after I upgraded to AzureRM Provider v1.20, but I'm not sure whether there is a connection.
Running terraform destroy should successfully delete the Terraform-provisioned AKS cluster on the first attempt.
Running terraform destroy does not always successfully delete the Terraform-provisioned AKS cluster on the first attempt. It always succeeds on a second attempt.
The error produced:
Error: Error applying plan:
1 error(s) occurred:
* module.aks_cluster.azurerm_kubernetes_cluster.aks_cluster (destroy): 1 error(s) occurred:
* azurerm_kubernetes_cluster.aks_cluster: Error waiting for the deletion of Managed Kubernetes Cluster "test-westus2" (Resource Group "aks-rg-test-westus2"): azure.BearerAuthorizer#WithAuthorization:
Failed to refresh the Token for request to https://management.azure.com/subscriptions/<subscription_id>/providers/Microsoft.ContainerService/locations/westus2/operations/<id>?api-version=2016-03-30: StatusCode=0 --
Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
This unfortunately happens intermittently, but running a terraform destroy on an AKS cluster sometimes results in the error above.
I got the same error not only on deletion of the cluster, but also on creation.
Error: Error applying plan:
1 error(s) occurred:
* module.primary.azurerm_kubernetes_cluster.aks: 1 error(s) occurred:
* azurerm_kubernetes_cluster.aks: Error waiting for completion of Managed Kubernetes Cluster "mycluster" (Resource Group "myrg"): azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/mysubscription/providers/Microsoft.ContainerService/locations/japaneast/operations/myid?api-version=2017-08-31: StatusCode=0 -- OriginalError: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
The same error happened outside of AKS cluster creation/deletion as well. It seems that the error occurs during long-running plan/apply operations. The following is an example of it during Resource Group deletion at the end of a long-running apply.
Error: Error applying plan:
1 error(s) occurred:
* azurerm_resource_group.shared (destroy): 1 error(s) occurred:
* azurerm_resource_group.shared: Error deleting Resource Group "myrg": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to
https://management.azure.com/subscriptions/myid/operationresults/myresult?api-version=2018-05-01: StatusCode=0 -- Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
@katbyte Do you have any advice?
hi @btai24 @ToruMakabe
Thanks for opening this issue :)
This appears to be a bug in the authentication logic which we use to connect to Azure (specifically how it handles refreshing tokens) - as such this would require a bug-fix there (which is handled in this repository: http://github.com/hashicorp/go-azure-helpers). So that we're able to diagnose this further - would it be possible to know which method you're using to authenticate with Azure from Terraform (e.g. the Azure CLI / a Service Principal with a Client Secret etc)?
Thanks!
@tombuildsstuff Thanks! I use Azure CLI.
@ToruMakabe thanks for confirming that. Since this appears to be an issue in the upstream library I've created an upstream issue for this: https://github.com/hashicorp/go-azure-helpers/issues/22
As a workaround: if you're using az login with your individual account, this doesn't happen when you log in with "az login --use-device-code" instead.
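Another way to sidestep Azure CLI token refresh entirely is to authenticate the provider directly with a Service Principal and client secret (the other method mentioned above). This is only an illustration, not a fix confirmed in this thread; the variable names (var.sp_client_id and so on) are assumptions:
provider "azurerm" {
  # Service Principal authentication instead of Azure CLI tokens.
  subscription_id = "${var.subscription_id}"
  tenant_id       = "${var.tenant_id}"
  client_id       = "${var.sp_client_id}"
  client_secret   = "${var.sp_client_secret}"
}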
@tombuildsstuff Late reply, but I also use the Azure CLI.
(revisiting this issue because I'm still running into this)
Hit these errors with a newer azure-cli version.
Downgrading to 2.0.64 resolved the issue for me.
Getting the same error with azure-cli 2.0.68 while trying to provision event hubs.
I ran the az login command again and the provisioning succeeded this time, so I guess there is a token expiry issue.
+1 same issue
Seeing the same issue here. It's intermittent on apply or destroy operations with no apparent pattern (sometimes it happens even after a few minutes, so a long-running operation does not seem to be the only factor). Does anyone know if we can refresh the token or log in again during a terraform apply? I'm using the latest version (azure-cli 2.0.69).
I opened a PR to fix this a while back: https://github.com/hashicorp/go-azure-helpers/pull/39
Hopefully, it will get some attention soon.
Update: the PR was closed by the maintainers without being merged.
Great to see a fix is on the way; I just ran into this today during AKS creation.
It looks like https://github.com/Azure/go-autorest/pull/476 was just recently merged in, so once it gets incorporated downstream this issue should be fixed.
@amasover yeah, we've a PR ready to go into the base library to fix this, it's just waiting on a release of go-autorest which looks like it's happening soon-ish :)
This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 1.37.0"
}
# ... other configuration ...
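After updating the constraint, running terraform init -upgrade will download the newer provider release before the next plan or apply.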
We upgraded to version 1.44.0 of azurerm and now I'm seeing this problem for the first time. Is anyone else experiencing this?
This is what I have now:
Terraform v0.12.21
provider.azuread v0.7.0
provider.azurerm v1.44.0
provider.random v2.2.1
Same for me.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!