Azure-sdk-for-js: Authentication is throttled when concurrently uploading a large file and several smaller ones

Created on 15 May 2020 · 17 comments · Source: Azure/azure-sdk-for-js

  • Package Name: @azure/storage-blob
  • Package Version: 12.1.1
  • Operating system: Ubuntu
  • [x] nodejs

    • version: 12.9

  • [ ] browser

    • name/version:

  • [x] typescript

    • version: 3.8

  • Is the bug related to documentation in

Describe the bug

We are uploading several large files to Blob Storage after our GitHub Actions build completes. One of them is over 500MB. The rest are under 250MB (that's important because concurrency only kicks in on files greater than 250MB).

When we try to upload all these files in parallel, we get the error below. It looks like our authentication is being throttled because we are trying to make ~9 upload requests almost simultaneously. Though, it's surprising to me that this is causing throttling.

[2020-05-15 04:54:30.034 +0000] ERROR (AggregateAuthenticationError on ed262ba1d9f5): Authentication failed to complete due to the following errors:

AuthenticationError: An error was returned while authenticating to Azure Active Directory (status code 400).

More details:

{
  "error": "missing_environment_variables",
  "errorDescription": "EnvironmentCredential cannot return a token because one or more of the following environment variables is missing:\n\nAZURE_TENANT_ID\nAZURE_CLIENT_ID\nAZURE_CLIENT_SECRET\nAZURE_CLIENT_CERTIFICATE_PATH\nAZURE_USERNAME\nAZURE_PASSWORD\n\nTo authenticate with a service principal AZURE_TENANT_ID, AZURE_CLIENT_ID, and either AZURE_CLIENT_SECRET or AZURE_CLIENT_CERTIFICATE_PATH must be set.  To authenticate with a user account AZURE_TENANT_ID, AZURE_USERNAME, and AZURE_PASSWORD must be set.\n"
}

AuthenticationError: An error was returned while authenticating to Azure Active Directory (status code 429).

More details:

{
  "error": "unknown_error",
  "errorDescription": "An unknown error occurred and no additional details are available."
}

Error: Azure CLI could not be found.  Please visit https://aka.ms/azure-cli for installation instructions and then, once installed, authenticate to your Azure account using 'az login'.
    AggregateAuthenticationError: Authentication failed to complete due to the following errors:

    AuthenticationError: An error was returned while authenticating to Azure Active Directory (status code 400).

    More details:

    {
      "error": "missing_environment_variables",
      "errorDescription": "EnvironmentCredential cannot return a token because one or more of the following environment variables is missing:\n\nAZURE_TENANT_ID\nAZURE_CLIENT_ID\nAZURE_CLIENT_SECRET\nAZURE_CLIENT_CERTIFICATE_PATH\nAZURE_USERNAME\nAZURE_PASSWORD\n\nTo authenticate with a service principal AZURE_TENANT_ID, AZURE_CLIENT_ID, and either AZURE_CLIENT_SECRET or AZURE_CLIENT_CERTIFICATE_PATH must be set.  To authenticate with a user account AZURE_TENANT_ID, AZURE_USERNAME, and AZURE_PASSWORD must be set.\n"
    }

    AuthenticationError: An error was returned while authenticating to Azure Active Directory (status code 429).

    More details:

    {
      "error": "unknown_error",
      "errorDescription": "An unknown error occurred and no additional details are available."
    }

    Error: Azure CLI could not be found.  Please visit https://aka.ms/azure-cli for installation instructions and then, once installed, authenticate to your Azure account using 'az login'.
        at DefaultAzureCredential.<anonymous> (/runner/node_modules/@azure/identity/dist/index.js:174:29)
        at Generator.throw (<anonymous>)
        at rejected (/runner/node_modules/tslib/tslib.js:111:69)

And when we changed our upload so that we're not using concurrency, we had no problems.

To Reproduce
Steps to reproduce the behavior:
Upload one large file (> 250MB) and ~5 smaller ones using an instance of BlockBlobClient. Like this:

await Promise.all([
  blockBlobClient.uploadFile(largeFile),
  blockBlobClient.uploadFile(file1),
  blockBlobClient.uploadFile(file2),
  blockBlobClient.uploadFile(file3),
  blockBlobClient.uploadFile(file4),
  blockBlobClient.uploadFile(file5),
]);

And when we avoid concurrency like this:

blockBlobClient.uploadFile(file, {
   concurrency: 1,
})

we have no issues.

Expected behavior
The upload should succeed.

Labels: Client, Storage, customer-reported, question

All 17 comments

@aeisenberg looks like you are using DefaultAzureCredential? Would it be possible for you to test using StorageSharedKeyCredential to see whether the issue repros?

Thanks. We'll look at this.

@jeremymeng, why would the DefaultAzureCredential not work? Seems like this is a pretty default case.

Welp, it seems that IMDS only supports 5 queries per second and the ManagedIdentityCredential doesn't handle 429s.

Per the IMDS docs:

429 Too Many Requests | The API currently supports a maximum of 5 queries per second

Further guidance states that an IMDS client should follow exponential backoff in the face of 429s.

Hmmm... that's pretty low. Thanks for pointing it out. So, we would expect that using a concurrency level of > 5 would _always_ fail. Perhaps this would be good to mention in the docs.

@aeisenberg I'm guessing this was an oversight in the implementation. The error should be transient, so a simple retry in the ManagedIdentityCredential would probably solve the problem.

Might be a nice contribution if you feel like opening a PR.

Thanks @devigned

@jeremymeng Would tweaking the retry options when creating the credential help based on the above comment from @devigned ?

Going by https://github.com/covid-modeling/model-runner/issues/32#issuecomment-629684698, it looks like we should instead look into why the token is not being cached.

cc @jonathandturner to look into this from the credential side

@jeremymeng Would tweaking the retry options when creating the credential help based on the above comment from @devigned ?

Our throttlingRetryPolicy handles 429 but only retries once and there's a way to configure it.

Sorry, typo. There isn't a way to configure it. The throttling retry policy doesn't do exponential backoff.

Thanks @jeremymeng, so I'm guessing this means that our normal retry policy does not treat 429 as retryable, and that the throttling retry policy retries only once?

@devigned,
I looked into token caching, and we do cache the access token and re-use it for subsequent calls.

@aeisenberg, as a workaround for now, can you try doing 1 upload first, and then do your 9 concurrent uploads after the first one completes? The first success would result in caching the token that would be then re-used by subsequent requests.

That would be easy enough to do. Thanks for the suggestion, @ramya-rao-a. To be clear though, we're only uploading 4 files; we hit the limit because one of them is >250MB, which makes the library automatically chunk that upload into concurrent requests.

I can look for the smallest file to upload first and then upload the remainder when it's complete.

Thanks @jeremymeng, so I'm guessing this means that our normal retry policy does not treat 429 as retryable, and that the throttling retry policy retries only once?

Correct.

@devigned,
I looked into token caching, and we do cache the access token and re-use it for subsequent calls.

@aeisenberg, as a workaround for now, can you try doing 1 upload first, and then do your 9 concurrent uploads after the first one completes? The first success would result in caching the token that would be then re-used by subsequent requests.

Good point! The first call to use the MSI credential can be another method too, for example getProperties(), if that keeps the upload logic simpler.

Thanks @jeremymeng, for the long term, is there a plan to support configuring the throttlingRetryPolicy?
And any suggestions for the token refresh scenario in the short and long term, where the event triggering a refresh might be the expiration (or near-expiration) of the existing access token?

And any suggestions for the token refresh scenario in the short and long term, where the event triggering a refresh might be the expiration (or near-expiration) of the existing access token?

@jonathandturner can probably answer this better.

I logged the following issues to follow up:

  • #8991 for throttlingRetryPolicy configuration
  • #8992 for potential improvement in MSI credential

@jeremymeng, I tried your suggestion of initializing the auth by calling getProperties() before uploading and it is working for me! Thanks for your suggestion.

I'm happy with this workaround right now, but if you want me to try out something else, let me know.

That's good to hear, @aeisenberg.

We will close this issue and follow up in the two issues @jeremymeng logged above to see how we can improve. We will get back here if there is anything else we would want you to try out.
