Azure-sdk-for-js: JavaScript SDK DefaultAzureCredential stops working under high load

Created on 24 Sep 2020 · 10 Comments · Source: Azure/azure-sdk-for-js

  • Package Name: @azure/identity
  • Package Version: 1.1.0
  • Operating system: Linux
  • [x] nodejs

    • version: 12.13.0

  • [ ] browser

    • name/version:

  • [ ] typescript

    • version:

  • Is the bug related to documentation in

Describe the bug

We have a cluster with many pods running Node.js. We use managed identity to access Azure resources, and for this we use DefaultAzureCredential from the JavaScript SDK. Under heavy load, some pods eventually can no longer get a token: they end up in a zombie state and cannot access any Azure resource.

To Reproduce
Many pods using managed identity.

Additional context

We believe the issue is that the ManagedIdentityCredential class caches negative results: if a call times out, getting the token is not attempted anymore:
https://github.com/Azure/azure-sdk-for-js/blob/dcae3ace0872180e0a542a00ca8c8c0b427def42/sdk/identity/identity/src/credentials/managedIdentityCredential.ts

          // the latter indicating that we don't yet know whether
          // the endpoint is available and need to check for it.
          if (this.isEndpointUnavailable !== true) {
            result = await this.authenticateManagedIdentity(
              scopes,
              this.isEndpointUnavailable === null,
              this.clientId,
              newOptions
            );

            // If authenticateManagedIdentity returns null, it means no MSI
            // endpoints are available.  In this case, don't try them in future
            // requests.
            this.isEndpointUnavailable = result === null;
          } else {
            const error = new CredentialUnavailable(
              "The managed identity endpoint is not currently available"
            );
            logger.getToken.info(formatError(error));
            throw error;
          }
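To make the suspected failure mode concrete, here is a minimal TypeScript sketch of the flag logic in the excerpt above. It is a simplified model under stated assumptions, not the library's code: `StickyCredential`, `Probe`, and `demo` are hypothetical names, and `probe` stands in for `authenticateManagedIdentity`, returning null when the endpoint check fails (for example on a timeout).

```typescript
// Simplified model of the caching behavior quoted above (hypothetical names).
type Probe = () => Promise<string | null>;

class StickyCredential {
  // null = not yet checked, true = judged unavailable, false = available
  private isEndpointUnavailable: boolean | null = null;
  private readonly probe: Probe;

  constructor(probe: Probe) {
    this.probe = probe;
  }

  async getToken(): Promise<string> {
    if (this.isEndpointUnavailable === true) {
      // The negative result is remembered, so the endpoint is never retried.
      throw new Error("The managed identity endpoint is not currently available");
    }
    const result = await this.probe();
    this.isEndpointUnavailable = result === null;
    if (result === null) {
      throw new Error("No managed identity endpoint found");
    }
    return result;
  }
}

// Demo: a single transient failure on the first call disables the credential.
async function demo(): Promise<string[]> {
  let calls = 0;
  const flakyProbe: Probe = async () => (++calls === 1 ? null : "token");
  const credential = new StickyCredential(flakyProbe);
  const outcomes: string[] = [];
  for (let i = 0; i < 3; i++) {
    try {
      outcomes.push(await credential.getToken());
    } catch {
      outcomes.push("error");
    }
  }
  outcomes.push(`probe calls: ${calls}`);
  return outcomes;
}
```

In this model all three calls fail even though the probe would have succeeded from the second attempt onward, and the probe is invoked only once. That matches the zombie state described in the report, assuming a timeout under load is treated the same as a missing endpoint.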
Azure.Identity Client customer-reported needs-team-attention question tenet-reliability

All 10 comments

@balazsmolnar Hello! I'm Daniel, I'll be doing my best to help you.

We recently made considerable fixes to the Managed Identity Credential. At a quick guess, I believe those fixes could help resolve this issue.

What we recently did was: https://github.com/Azure/azure-sdk-for-js/pull/11426

These changes will go out next week. Before we release, I'll do my best to reach a conclusion on what addresses this issue. If our recent changes don't fix it, I'll do my best to provide a fix before our upcoming release.

I'll be able to provide an update by Monday! Thank you for your patience.

@balazsmolnar We're actively talking about this internally. In the meantime, do you mind answering the following questions?

  • Do you have sample code we could try?
  • Do you have the HTTP logs related to the exception? Or at least some portion of the logs?
  • Where is this running? In Azure containers, for example? The details of the environment can help immensely.
  • For how long was this running without trouble? Or: How many iterations did this run without trouble?

I'll post any updates as soon as I'm able to.

@balazsmolnar If you decide to share some logs with us, please remember to remove any information about your credentials, secrets or passwords.

Tomorrow we will be releasing an update to the Identity library that might help your case, but to be sure we would need more information. Thank you for submitting this issue! We want to make sure it is resolved. Take your time providing us with as much information as possible.

@sadasant Thanks for looking into it!

We have an AKS cluster in Azure which uses managed identity to access some of its resources, e.g. a Storage Account. The code is pretty basic:

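import { BlobServiceClient } from "@azure/storage-blob";

// tokenCredential: a DefaultAzureCredential instance; retryOptions is defined elsewhere.
const transferBlobClient = new BlobServiceClient(
  `https://${transferAccountName}.blob.core.windows.net`,
  tokenCredential,
  { retryOptions }
);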

Note that the credential is cached between requests.

The problem occurs only under high load, so it's hard to reproduce. In our load test setup we provision 100 VMs to send requests to the server.
These are the errors we see in the logs:

EnvironmentCredential is unavailable. Environment variables are not fully configured.
Error: ManagedIdentityCredential is unavailable. No managed identity endpoint found.
Error: Azure CLI could not be found. Please visit https://aka.ms/azure-cli for installation instructions and then, once installed, authenticate to your Azure account using 'az login'.
Error: Visual Studio Code credential requires the optional dependency 'keytar' to work correctly, stack: AggregateAuthenticationError: undefined
Error: EnvironmentCredential is unavailable. Environment variables are not fully configured.
Error: ManagedIdentityCredential is unavailable. No managed identity endpoint found.
Error: Azure CLI could not be found. Please visit https://aka.ms/azure-cli for installation instructions and then, once installed, authenticate to your Azure account using 'az login'.
Error: Visual Studio Code credential requires the optional dependency 'keytar' to work correctly
    at DefaultAzureCredential. (/app/node_modules/@azure/identity/dist/index.js:262:29)
    at Generator.throw ()
    at rejected (/app/node_modules/@azure/identity/node_modules/tslib/tslib.js:112:69)

Our workaround is to cache the access token for a minute, which prevents overloading the managed identity endpoint, and to always create a new instance of DefaultAzureCredential when we need a fresh token.
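For readers who want to try something similar, the workaround described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from our service: `ShortLivedTokenCache`, `TokenSource`, and the injectable `now` clock are hypothetical names, and the two interfaces are simplified local stand-ins for the SDK's token types so that the sketch is self-contained.

```typescript
// Simplified local stand-ins for the SDK token types (assumption).
interface AccessToken {
  token: string;
  expiresOnTimestamp: number;
}
interface TokenSource {
  getToken(scopes: string | string[]): Promise<AccessToken | null>;
}

// Hypothetical helper: serves a cached token for a short time, and builds a
// fresh credential for every refresh so no failure state is carried over.
class ShortLivedTokenCache implements TokenSource {
  private readonly entries = new Map<string, { token: AccessToken; cachedAt: number }>();
  private readonly createCredential: () => TokenSource;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    createCredential: () => TokenSource,
    ttlMs: number = 60000,
    now: () => number = () => Date.now()
  ) {
    this.createCredential = createCredential;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async getToken(scopes: string | string[]): Promise<AccessToken | null> {
    // Cache per scope set so tokens for different resources are not mixed up.
    const key = Array.isArray(scopes) ? scopes.join(" ") : scopes;
    const entry = this.entries.get(key);
    if (entry !== undefined && this.now() - entry.cachedAt < this.ttlMs) {
      return entry.token; // still fresh: skip the identity endpoint
    }
    // New credential instance per refresh; failures are not cached.
    const token = await this.createCredential().getToken(scopes);
    if (token !== null) {
      this.entries.set(key, { token, cachedAt: this.now() });
    }
    return token;
  }
}
```

In a real setup the factory would presumably be `() => new DefaultAzureCredential()`. The one-minute lifetime is the value we chose; a fuller version would probably also check `expiresOnTimestamp` before reusing a token.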

Let me know if you need more information.

@balazsmolnar To help us narrow this issue down, we'll be asking you to test our latest Identity beta version once we release it, most likely today. I'll follow up with instructions as soon as I'm able to. Thank you for your time!

@balazsmolnar Hello! I hope things are going well for you!

We have released the new @azure/identity beta! Please install it and try again. Your feedback will help us narrow down this issue.

@sadasant I can test it early next week, and I will let you know the results.

@balazsmolnar thank you!

@sadasant We tested the fix yesterday, and I'm happy to report that we did not experience any managed identity related issues. Thank you for the fix!

@balazsmolnar that's awesome! Thank you so much. I'll close this issue for now. Feel free to re-open this issue or open a new one anytime! We're here to help 👋
