Terraform-provider-azurerm: keyvault: caching of metadata required in larger terraform configurations

Created on 20 Mar 2020 · 17 comments · Source: terraform-providers/terraform-provider-azurerm

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.23
provider.azurerm v2.2.0

Affected Resource(s)

  • azurerm_key_vault
  • azurerm_key_vault_secret
  • azurerm_virtual_machine_scale_set_extension

Terraform Configuration Files


resource "azurerm_virtual_machine_scale_set_extension" "vmss_ext_msi" {
  virtual_machine_scale_set_id = azurerm_virtual_machine_scale_set.vmss.id
  name                         = "ManagedIdentityWindowsExtension"
  publisher                    = "Microsoft.ManagedIdentity"
  type                         = "ManagedIdentityExtensionForWindows"
  type_handler_version         = "1.0"
  settings                     = "{\"port\": 50342}"
  auto_upgrade_minor_version   = true
}
resource "azurerm_key_vault_secret" "kv_secret_01" {
  name         = "mySecret"
  value        = var.my_value
  key_vault_id = azurerm_key_vault.kv.id
  content_type = "text/plain"

  tags = {
    name        = "My Parameters"
  }

  depends_on = [
    azurerm_key_vault.kv,
    azurerm_key_vault_access_policy.kv_some_policy,
  ]
}
resource "azurerm_key_vault" "kv" {
  name                            = "my-kv"
  resource_group_name             = azurerm_resource_group.rg.name
  location                        = azurerm_resource_group.rg.location
  tenant_id                       = var.tenant_id
  enabled_for_deployment          = true
  enabled_for_disk_encryption     = true
  enabled_for_template_deployment = true
  purge_protection_enabled        = true
  soft_delete_enabled             = true

  sku_name = "standard"

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"

    ip_rules = ["***","***"]

    virtual_network_subnet_ids = [
      azurerm_subnet.sub1.id,
      azurerm_subnet.sub2.id,
      azurerm_subnet.sub3_deployment_machine.id,
    ]
  }

  tags = {
    name        = "my good kv"
  }
}

Debug Output

Panic Output

Expected Behavior

Terraform/AzureRM should always be able to create a plan.

Actual Behavior


Every "terraform plan" action occurs different errors but the original error is the same which is "Context Deadline Exceeded". Sometimes we can create a plan with the all same resources but today I tried more than 15 times and no luck. Please see below for some errors which I selected uniquely. This kind of error occurs multiple times and for different resources in every plan action.

Error: Error checking if key vault "/subscriptions/***/resourceGroups/***/providers/Microsoft.KeyVault/vaults/***" for Secret "***" in Vault at url "https://***.vault.azure.net/" exists: Error making Read request on KeyVault "***" (Resource Group "***"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Error: Error retrieving the Resource ID the Key Vault at URL "https://***.vault.azure.net/": Error making Read request on KeyVault "***" (Resource Group "***"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Error: Error retrieving Extension "ManagedIdentityWindowsExtension" (Virtual Machine Scale Set "***" / Resource Group "***"): compute.VirtualMachineScaleSetExtensionsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Error: Error retrieving the Resource ID the Key Vault at URL "https://***.vault.azure.net/": Error GetKeyVaultId unable to list Key Vaults keyvault.VaultsClient#List: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Error: Error making Read request on AzureRM App Service "***": web.AppsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Steps to Reproduce

  1. terraform plan

Important Factoids

  • In total, the plan refreshes more than 1,200 resources. We have almost 80 key vaults, and they contain lots of secrets, so most of the errors relate to Key Vaults.
  • These resources already exist on Azure, so creation and deletion timeouts don't apply. Read timeouts didn't work either; I added them to about 450 resources. I suspect we need automatic retries.
  • If we use resource targeting (-target) for the resources that appear in the errors, the plan is created successfully, so the resources themselves and our connections are healthy.
  • We only pass the "-refresh=true" parameter to the plan action.
  • I've also tried the -parallelism parameter, with no luck.
  • I've tried planning from both a VM on Azure and my laptop.

References

bug service/keyvault

Most helpful comment

This is a serious problem. We have maybe 100 or so key vaults in a subscription; when we run terraform plan/apply, it bombs out trying to get key vaults that have nothing to do with the current config.

Error: Error retrieving the Resource ID the Key Vault at URL "https://xxx-tfcfg-dev-kv.vault.azure.net/": Error making Read request on KeyVault "xyzkv0001" (Resource Group "xyzrg0001"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Why is the provider trying to list every key vault?

All 17 comments

Any help with this issue? If it's related to Azure, I'll open a support ticket. I can't create a Terraform plan.

Any news on this issue? I still can't create a Terraform plan.

@gnlds, from the error message and your description ("almost 80 key vaults and they contain lots of secrets"), I think there is a performance issue related to key vaults. I am fixing it and will submit a PR soon.

@tombuildsstuff, I see that you closed the attached PR.

Do you disagree that looping through all the key vaults in a subscription (and making a GET request for each) to get the ID of a single key vault is excessive?

I think PR #6866 would have reduced the noise.

I don't think refreshing a single secret should trigger 200 GET requests just because the subscription has 200 key vaults. Can we at least filter by name when trying to get the ID here?

We are experiencing this problem as well. Can we get help here? We have ~20 key vault secrets and it dies on them as well.

The irony: all the secrets actually get created, yet Terraform sits on about 8 of them with "Still creating..." until they time out with the exact error above.

@njuCZ, @tombuildsstuff: please advise on the status of this issue. We are having terrible problems creating more than 20 secrets in a single vault. This is what we get:

Error retrieving the Resource ID the Key Vault at URL "https://{redacted}.vault.azure.net/": Error making Read request on KeyVault "{redacted - NOT SAME VAULT NAME!!!}" (Resource Group "{redacted - WRONG RESOURCE GROUP!!}"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded

It's as if Terraform (or the azurerm provider) is scanning the entire subscription for key vaults.

Just an update from our side: our subscription had so many other key vaults (not managed by Terraform) that our Terraform code (which only creates secrets) started impacting Azure API availability. We actually started causing 504s from the internal ARM load balancers. They (Azure) are investigating the issue, but this inefficient provider code is what caused the traffic spike.

Again, my understanding of the issue is that for every azurerm_key_vault_secret the provider:

  1. Lists all key vaults in the subscription
  2. Makes a GET request for each key vault until it finds the correct one

So if you manage 50 secrets in your tf code and you have 1,000 other key vaults in your subscription (not managed by tf), there could be up to 50,000 GET requests.

Someone let me know if I've misunderstood the provider code.
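
For illustration, here is a minimal Go sketch of the lookup pattern described above, written against the Azure SDK for Go's 2018-02-14 Key Vault management client. It is a sketch of the pattern under those assumptions, not the provider's actual code; parseVaultID is a crude, illustrative stand-in for the provider's resource ID parser.

package main

import (
    "context"
    "fmt"
    "strings"

    "github.com/Azure/azure-sdk-for-go/services/keyvault/mgmt/2018-02-14/keyvault"
)

// findVaultIDByBaseURL resolves a vault's resource ID from its base URL by
// enumerating every Key Vault in the subscription and issuing one GET per
// vault: O(number of vaults) API calls for a single lookup.
func findVaultIDByBaseURL(ctx context.Context, client keyvault.VaultsClient, baseURL string) (string, error) {
    page, err := client.List(ctx, nil) // lists ALL vaults in the subscription
    if err != nil {
        return "", err
    }
    for page.NotDone() {
        for _, r := range page.Values() {
            rg, name := parseVaultID(*r.ID)
            vault, err := client.Get(ctx, rg, name) // one GET per vault
            if err != nil {
                return "", err // a single 429 here fails the whole lookup
            }
            if vault.Properties != nil && vault.Properties.VaultURI != nil &&
                *vault.Properties.VaultURI == baseURL {
                return *r.ID, nil
            }
        }
        if err := page.NextWithContext(ctx); err != nil {
            return "", err
        }
    }
    return "", fmt.Errorf("no key vault found for %q", baseURL)
}

// parseVaultID crudely splits .../resourceGroups/<rg>/.../vaults/<name>;
// the real provider uses a proper resource ID parser.
func parseVaultID(id string) (resourceGroup, name string) {
    parts := strings.Split(id, "/")
    return parts[4], parts[len(parts)-1]
}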

It feels like this won't get fixed... @a200462790, I believe you are right that this is the problem. We have 3 subscriptions:
DEV (200 key vaults): nothing fails.
TEST (800+ key vaults): failures are intermittent.
PROD (2,500+ key vaults): failures are constant.

I'm running into issues with ~50 keyvaults with a total of ~200 secrets.

This is a serious problem. We have maybe 100 or so key vaults in a subscription; when we run terraform plan/apply, it bombs out trying to get key vaults that have nothing to do with the current config.

Error: Error retrieving the Resource ID the Key Vault at URL "https://xxx-tfcfg-dev-kv.vault.azure.net/": Error making Read request on KeyVault "xyzkv0001" (Resource Group "xyzrg0001"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=429 -- Original Error: context deadline exceeded

Why is the provider trying to list every key vault?

Just an update from our side: our subscription had so many other key vaults (not managed by Terraform) that our Terraform code (which only creates secrets) started impacting Azure API availability. We actually started causing 504s from the internal ARM load balancers. They (Azure) are investigating the issue, but this inefficient provider code is what caused the traffic spike.

Again, my understanding of the issue is that for every azurerm_key_vault_secret the provider:

  1. Lists all key vaults in the subscription
  2. Makes a GET request for each key vault until it finds the correct one

So if you manage 50 secrets in your tf code and you have 1,000 other key vaults in your subscription (not managed by tf), there could be up to 50,000 GET requests.

Someone let me know if I've misunderstood the provider code.

Based on my own experience I think you are correct.

It appears this bit of code
https://github.com/terraform-providers/terraform-provider-azurerm/blob/e70247681deda1f6b482c8aec3b64907ac84ffce/azurerm/helpers/azure/key_vault.go#L43 calls this function
https://github.com/Azure/azure-sdk-for-go/blob/master/services/keyvault/mgmt/2018-02-14/keyvault/vaults.go#L445, which in turn enumerates every Key Vault in the subscription. I don't know enough about why the provider needs to do this, but it seems unusual. It would be more efficient to simply check whether the key vault in the Terraform config exists using https://github.com/Azure/azure-sdk-for-go/blob/master/services/keyvault/mgmt/2018-02-14/keyvault/vaults.go#L296.

As I don't know the provider internals, my understanding of the list-all-key-vaults behaviour could be wrong.
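
Assuming the resource group and vault name are already known (for example, parsed from the key_vault_id held in state), the direct check proposed above would be a single call per vault. A sketch, reusing the client and imports from the earlier example plus "net/http"; the function name is illustrative:

// vaultIDDirect checks one specific vault instead of enumerating the
// subscription: a single GET, with 404 treated as "vault gone".
func vaultIDDirect(ctx context.Context, client keyvault.VaultsClient, resourceGroup, vaultName string) (string, error) {
    vault, err := client.Get(ctx, resourceGroup, vaultName)
    if err != nil {
        if vault.Response.Response != nil && vault.Response.StatusCode == http.StatusNotFound {
            return "", nil // vault no longer exists
        }
        return "", err
    }
    return *vault.ID, nil
}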

I was about to patch our internal provider to support key vaults without thrashing the Azure management API. As an experiment, I scoped the service principal's (the one used for Terraform) Key Vault permissions to just the resource group it creates resources in, and the problem went away.

So for the time being my problem is fixed; however, I still think the provider shouldn't attempt to list all key vaults when the config has nothing to do with most of them.

Are there any plans for dealing with this issue? The service principal we use to create resource groups in Azure can no longer run updates because of it. We regrettably also use DevTest Labs, which creates a key vault per VM; we recently hit the upper limit and don't have an easy path forward to fix our issue.

I have a similar issue.
There are 2 Key Vaults in one state file, and terraform dies with:
Error: Error making Read request on KeyVault "my-env-kv1" (Resource Group "my-rg"): keyvault.VaultsClient#Get: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded
These KVs are completely empty, as they were newly created.

Limit your service principal's IAM permissions. If it has access at the management group or subscription scope, you will hit this problem.

Limiting the service principal won't solve the problem if the SPN is also used to deploy the resource groups into a subscription.

Is there any chance to get the key vault cache PR moving again?
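
For context, the caching being asked for would amount to memoizing the base-URL-to-resource-ID lookup once per provider instance. A minimal sketch of the idea, continuing the illustrative examples above and adding "sync" to the imports; this is not the referenced PR's actual code:

// A process-wide memo of vault base URL -> resource ID. Holding the lock
// across the slow path also serializes concurrent lookups, so a burst of
// secret refreshes triggers at most one subscription-wide scan.
var (
    vaultIDCacheMu sync.Mutex
    vaultIDCache   = map[string]string{}
)

func cachedVaultID(ctx context.Context, client keyvault.VaultsClient, baseURL string) (string, error) {
    vaultIDCacheMu.Lock()
    defer vaultIDCacheMu.Unlock()

    if id, ok := vaultIDCache[baseURL]; ok {
        return id, nil // cache hit: zero API calls
    }
    id, err := findVaultIDByBaseURL(ctx, client, baseURL) // slow path from the earlier sketch
    if err != nil {
        return "", err
    }
    vaultIDCache[baseURL] = id
    return id, nil
}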

We're lucky in that we create all our Key Vaults in a single resource group. Our service principal also deploys resource groups, so what we did was limit its Key Vault IAM permissions to just that one resource group and leave all other permissions in place.
