Terraform v0.11.10
AzureRM Provider v1.20.0
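For reference, the provider version from this report can be pinned with a version constraint in the provider block. This is just a minimal sketch mirroring the versions listed above, not part of the original configuration:
provider "azurerm" {
  # Pin to the provider version the issue was observed with.
  version = "=1.20.0"
}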
resource "azurerm_kubernetes_cluster" "aks_cluster" {
name = "${var.name}"
location = "${var.region}"
dns_prefix = "${var.name}"
kubernetes_version = "${var.kubernetes_version}"
resource_group_name = "${azurerm_resource_group.aks_resource_group.name}"
linux_profile {
admin_username = "xxx"
ssh_key {
key_data = "${var.ssh_public_key}"
}
}
agent_pool_profile {
count = "${var.node_count}"
name = "agentpool"
vm_size = "${var.vm_size}"
os_disk_size_gb = "${var.os_disk_size}"
os_type = "Linux"
vnet_subnet_id = "${azurerm_subnet.private.id}"
max_pods = 110
}
service_principal {
client_id = "${azurerm_azuread_service_principal.service_principal.application_id}"
client_secret = "${random_string.service_principal_password.result}"
}
role_based_access_control {
enabled = true
azure_active_directory {
client_app_id = "${var.rbac_client_app_id}"
server_app_id = "${var.rbac_server_app_id}"
server_app_secret = "${var.rbac_server_app_secret}"
}
}
network_profile {
network_plugin = "azure"
}
depends_on = [
"azurerm_azuread_service_principal.service_principal",
"azurerm_azuread_service_principal_password.password",
]
tags {
environment = "${var.environment}"
name = "${var.name}"
}
}
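The service principal resources referenced above (azurerm_azuread_service_principal.service_principal, azurerm_azuread_service_principal_password.password, and random_string.service_principal_password) are not included in the report. A minimal sketch of what they might look like with the azurerm 1.x AzureAD resources follows; the application name "aks_app" and the end_date value are assumptions, not taken from the original configuration:
resource "random_string" "service_principal_password" {
  length  = 32
  special = true
}

# Hypothetical AzureAD application backing the service principal.
resource "azurerm_azuread_application" "aks_app" {
  name = "${var.name}"
}

resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.aks_app.application_id}"
}

resource "azurerm_azuread_service_principal_password" "password" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value                = "${random_string.service_principal_password.result}"
  end_date             = "2099-01-01T00:00:00Z" # placeholder expiry
}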
Unfortunately this happens intermittently, so I haven't been able to get debug output. It started happening after I upgraded to AzureRM Provider v1.20, but I'm not sure whether there is a connection.
Running terraform destroy should successfully delete the Terraform-provisioned AKS cluster on the first attempt.
Running terraform destroy does not always successfully delete the Terraform-provisioned AKS cluster on the first attempt. It always succeeds on a second attempt.
The error produced:
Error: Error applying plan:
1 error(s) occurred:
* module.aks_cluster.azurerm_kubernetes_cluster.aks_cluster (destroy): 1 error(s) occurred:
* azurerm_kubernetes_cluster.aks_cluster: Error waiting for the deletion of Managed Kubernetes Cluster "test-westus2" (Resource Group "aks-rg-test-westus2"): azure.BearerAuthorizer#WithAuthorization:
Failed to refresh the Token for request to https://management.azure.com/subscriptions/<subscription_id>/providers/Microsoft.ContainerService/locations/westus2/operations/<id>?api-version=2016-03-30: StatusCode=0 --
Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
This unfortunately happens intermittently, but running a terraform destroy on an AKS cluster sometimes results in the error above.
I got the same error not only on deletion of the cluster, but also on creation.
Error: Error applying plan:
1 error(s) occurred:
* module.primary.azurerm_kubernetes_cluster.aks: 1 error(s) occurred:
* azurerm_kubernetes_cluster.aks: Error waiting for completion of Managed Kubernetes Cluster "mycluster" (Resource Group "myrg"): azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/mysubscription/providers/Microsoft.ContainerService/locations/japaneast/operations/myid?api-version=2017-08-31: StatusCode=0 -- OriginalError: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
The same error happened outside of AKS cluster creation/deletion as well. It seems that the error occurs during long-running plan/apply operations. The following is an example of it during Resource Group deletion at the end of a long-running apply.
Error: Error applying plan:
1 error(s) occurred:
* azurerm_resource_group.shared (destroy): 1 error(s) occurred:
* azurerm_resource_group.shared: Error deleting Resource Group "myrg": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to
https://management.azure.com/subscriptions/myid/operationresults/myresult?api-version=2018-05-01: StatusCode=0 -- Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
@katbyte Do you have any advice?
hi @btai24 @ToruMakabe
Thanks for opening this issue :)
This appears to be a bug in the authentication logic which we use to connect to Azure (specifically how it handles refreshing tokens) - as such this would require a bug-fix there (which is handled in this repository: http://github.com/hashicorp/go-azure-helpers). So that we're able to diagnose this further - would it be possible to know which method you're using to authenticate with Azure from Terraform (e.g. the Azure CLI / a Service Principal with a Client Secret etc)?
Thanks!
@tombuildsstuff Thanks! I use Azure CLI.
@ToruMakabe thanks for confirming that. Since this appears to be an issue in the upstream library I've created an upstream issue for this: https://github.com/hashicorp/go-azure-helpers/issues/22
As a workaround: if you're using az login with your individual account, this doesn't happen when you log in with "az login --use-device-code" instead.
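Another way to sidestep Azure CLI token refresh entirely is to authenticate the provider directly with a Service Principal and client secret (the other method mentioned above). This is only an illustration, not a fix confirmed in this thread; the variable names (var.sp_client_id and so on) are assumptions:
provider "azurerm" {
  # Service Principal authentication instead of Azure CLI tokens.
  subscription_id = "${var.subscription_id}"
  tenant_id       = "${var.tenant_id}"
  client_id       = "${var.sp_client_id}"
  client_secret   = "${var.sp_client_secret}"
}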
@tombuildsstuff Late reply, but I also use the Azure CLI.
(revisiting this issue because I'm still running into this)
Hit these errors with a newer azure-cli version.
Downgrading to 2.0.64 resolved the issue for me.
Getting the same error with azure-cli 2.0.68 while trying to provision event hubs.
I ran the az login command again and the provisioning succeeded this time, so I guess there is a token expiry issue.
+1 same issue
Seeing the same issue here. It's intermittent on apply or destroy operations with no apparent pattern (sometimes it happens even after a few minutes, so a long-running operation does not seem to be the only factor). Does anyone know if we can refresh the token or log in again during a terraform apply? I'm using the latest version (azure-cli 2.0.69).
I opened a PR to fix this a while back: https://github.com/hashicorp/go-azure-helpers/pull/39
Hopefully, it will get some attention soon.
Update: the PR was closed by the maintainers without being merged.
Great to see a fix is on the way; I just ran into this today during AKS creation.
It looks like https://github.com/Azure/go-autorest/pull/476 was just recently merged in, so once it gets incorporated downstream this issue should be fixed.
@amasover yeah, we've a PR ready to go into the base library to fix this, it's just waiting on a release of go-autorest which looks like it's happening soon-ish :)
This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 1.37.0"
}
# ... other configuration ...
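After updating the constraint, running terraform init -upgrade will download the newer provider release before the next plan or apply.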
We upgraded to version 1.44.0 of azurerm and now I'm seeing this problem for the first time. Is anyone else experiencing this?
This is what I have now:
Terraform v0.12.21
provider.azuread v0.7.0
provider.azurerm v1.44.0
provider.random v2.2.1
Same for me.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!