Describe the bug
I am looking for help troubleshooting a frequent error I am getting while using this SDK. The error details are below; it happens intermittently.
request to https://{accountName}.blob.core.windows.net/path failed, reason: connect ETIMEDOUT 13.70.99.30:443
at ClientRequest.<anonymous> (D:\home\site\wwwroot\node_modules\@azure\core-http\node_modules\node-fetch\lib\index.js:1455)
at ClientRequest.clsBind (D:\home\site\wwwroot\node_modules\cls-hooked\context.js:172)
at ClientRequest.emit (events.js:187)
at ClientRequest.emitted (D:\home\site\wwwroot\node_modules\emitter-listener\listener.js:134)
at TLSSocket.socketErrorListener (_http_client.js:391)
at TLSSocket.emit (events.js:182)
at emitErrorNT (internal/streams/destroy.js:82)
at emitErrorAndCloseNT (internal/streams/destroy.js:50)
at process._tickCallback (internal/process/next_tick.js:63)
To Reproduce
Steps to reproduce the behavior:
I do not have reliable reproduction steps.
Expected behavior
The SDK should not encounter timeouts when writing to the blob storage account.
Additional context
I understand there are no reliable reproduction steps, but any guidance that would help me fix or debug this bug would be really appreciated.
@ljian3377 @jiacfan Can you take a look?
@js-kyle
You can customize the retry and timeout settings by specifying StorageRetryOptions (the retryOptions field of StoragePipelineOptions) when creating the client; a sketch of overriding them follows the defaults below. The default settings are:
// Default values of StorageRetryOptions
const DEFAULT_RETRY_OPTIONS: StorageRetryOptions = {
  maxRetryDelayInMs: 120 * 1000,
  maxTries: 4,
  retryDelayInMs: 4 * 1000,
  retryPolicyType: StorageRetryPolicyType.EXPONENTIAL,
  secondaryHost: "",
  tryTimeoutInMs: undefined // Use server-side default timeout strategy
};
(You can also specify a per-operation timeout with the abortSignal option.)
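For example, here is a minimal sketch of overriding the retry settings when constructing the client (the account and key variables and the specific values are placeholders, not recommendations):

const { BlobServiceClient, StorageSharedKeyCredential } = require("@azure/storage-blob");

// account and key are placeholders for your storage account name and key
const credential = new StorageSharedKeyCredential(account, key);
const blobServiceClient = new BlobServiceClient(
  `https://${account}.blob.core.windows.net`,
  credential,
  {
    retryOptions: {
      maxTries: 6,                  // retry more times than the default 4
      retryDelayInMs: 2 * 1000,     // initial delay between retries
      maxRetryDelayInMs: 60 * 1000, // cap on the exponential backoff
      tryTimeoutInMs: 30 * 1000     // fail a single attempt after 30s instead of the server default
    }
  }
);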
You can enable request/response logging by setting the AZURE_LOG_LEVEL environment variable, or dynamically by importing setLogLevel from @azure/logger and calling it with a log level.
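For instance, a minimal sketch of turning on verbose logging at runtime:

// Equivalent to setting AZURE_LOG_LEVEL=info before starting the process
const { setLogLevel } = require("@azure/logger");
setLogLevel("info"); // also accepts "verbose", "warning", or "error"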
Not sure if these can help. @jeremymeng any insight?
Just a follow-up on this problem (I work with @js-kyle): we think this error is related to TCP connections not being reused despite keepAlive being enabled. Many connections were being created and hanging around, and that sometimes caused the total number of TCP connections to exceed the global limit.
We had a similar problem with storage-queue and fixed it by changing our implementation, but we couldn't find a fix for this one, so we have disabled keepAlive for now. The number of connections is still the same, but they close very quickly, which makes this error less likely.
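For reference, a sketch of that workaround — disabling keep-alive via the client options (keepAliveOptions is the same option the repro script below enables; account and sharedKeyCredential are placeholders):

const blobServiceClient = new BlobServiceClient(
  `https://${account}.blob.core.windows.net`,
  sharedKeyCredential,
  // With keep-alive off, each request opens its own connection and closes it promptly
  { keepAliveOptions: { enable: false } },
);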
@OP-Klaus thanks for sharing your insights into this problem. Network issues can become complex in the PaaS case, since how the underlying platform distributes and manages network resources is opaque to the user. Your workaround sounds feasible; feel free to let us know if you need further assistance from us.
@ramya-rao-a @jeremymeng Can you help check the keepAlive implementation in core-http? Please verify there is no connection leak when keep-alive is enabled.
@js-kyle @OP-Klaus do you have more information (code pattern, API used, etc.) that could help us reproduce the issue? We made a fix in https://github.com/Azure/azure-sdk-for-js/pull/5552 to reuse agents when keepAlive or a proxy is used. There could be situations where our caching doesn't work as expected. /cc @daviwil
@jeremymeng here's a script that reproduces it for me on v12.0.1:
'use strict';

const { BlobServiceClient, StorageSharedKeyCredential } = require('@azure/storage-blob');

const AZURE_CONTENT_ACCOUNT = process.env.AZURE_CONTENT_ACCOUNT;
const AZURE_CONTENT_KEY = process.env.AZURE_CONTENT_KEY;

const sharedKeyCredential = new StorageSharedKeyCredential(AZURE_CONTENT_ACCOUNT, AZURE_CONTENT_KEY);
const blobServiceClient = new BlobServiceClient(
  `https://${AZURE_CONTENT_ACCOUNT}.blob.core.windows.net`,
  sharedKeyCredential,
  { keepAliveOptions: { enable: true } },
);
const containerClient = blobServiceClient.getContainerClient('courses');

const setPageContentToAzure = (location, content) => {
  const blobClient = containerClient.getBlobClient(location);
  const blockBlobClient = blobClient.getBlockBlobClient();
  return blockBlobClient.upload(content, Buffer.byteLength(content));
};

const runTest = async () => {
  for (let i = 0; i < 500; i++) {
    await setPageContentToAzure('pageId' + i, '{ foo: \'bar\'}');
  }
};

runTest().then(_ => console.log('After runTest() resolved'));
Edit: Reduced the script size
@OP-Klaus Thank you very much for the repro code! We will try it out and report back our findings.
@jeremymeng no problem, thank you for looking into it. I have edited the script to be smaller now and confirmed the issue still happens.
Edit: To clarify, this script reproduces the bug where TCP connections don't get reused, not the timeout errors
I believe the cause is the following: for each of our clients (BlobServiceClient, ContainerClient, BlobClient, BlockBlobClient, etc.) there's an underlying ServiceClient instance that handles sending requests to and receiving responses from the Azure service. We cache the HTTP connection at the ServiceClient level. However, in our current design two storage clients will not share the same connection, even if they have the same URL to the corresponding Azure service resource (they do share the options from their parent clients). So what's happening in this repro code is that 500 block blob clients, and therefore 500 connections, are created with keepAlive enabled.
@OP-Klaus does your real scenario use blob clients with different locations? If so, you probably don't want to enable keepAlive, because HTTP connections are not shared among the clients and those connections would hang around for a much longer time. If you use the same blob client many times, then it makes sense to enable keepAlive and to cache the blob client instance based on its location.
BTW you can get a block blob client directly from the container client. Here's my attempt to cache the block blob clients:
let _blobClients = {};

const getAzureBlobClient = (containerClient, location) => {
  let client = _blobClients[location];
  if (!client) {
    // Get the block blob client directly from the container client and cache it by location
    client = _blobClients[location] = containerClient.getBlockBlobClient(location);
  }
  return client;
};

const setPageContentToAzure = (location, content) => {
  const containerClient = getAzureContainerClient(); // assumes a cached container client (see sketch below)
  const blockBlobClient = getAzureBlobClient(containerClient, location);
  return blockBlobClient.upload(content, Buffer.byteLength(content)); // byte length, not character count
};
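The snippet assumes a getAzureContainerClient helper; a hypothetical version that caches the container client the same way, reusing the blobServiceClient from the repro script, might look like this:

let _containerClient;

// Hypothetical helper: build the container client once and reuse it afterwards
const getAzureContainerClient = () => {
  if (!_containerClient) {
    _containerClient = blobServiceClient.getContainerClient('courses');
  }
  return _containerClient;
};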
Also it might be useful for the SDK to maintain some cache of clients when getXxxxClient() is called. /cc @bterlson
Also it might be useful for the SDK to maintain some cache of clients
Or we could make our clients share the same HTTP client.
Another workaround: since we allow passing in an HTTP client when creating the client, you can do the following:
const { DefaultHttpClient } = require("@azure/core-http");

const _httpClient = new DefaultHttpClient();

const getContainerClient = (account, key, container) => {
  console.log("creating a new BlobServiceClient to get a container client");
  const sharedKeyCredential = new StorageSharedKeyCredential(account, key);
  const blobServiceClient = new BlobServiceClient(
    `https://${account}.blob.core.windows.net`,
    sharedKeyCredential,
    { httpClient: _httpClient, keepAliveOptions: { enable: true } },
  );
  return blobServiceClient.getContainerClient(container);
};
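With this, every container client created through the helper sends requests through the shared _httpClient, so keep-alive connections can be reused across them. A hypothetical usage (account, key, and the container names are placeholders):

// Both clients share the single _httpClient instance defined above
const containerA = getContainerClient(account, key, "container-a");
const containerB = getContainerClient(account, key, "container-b");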
@azure/[email protected] has been released. I am closing this issue. @js-kyle @OP-Klaus please let us know if you are still seeing other issues.
Just to round this out for people viewing this issue in the future: the timeout problem we were experiencing was a problem with a server in a data center, which we have resolved by migrating.
Thanks for the work to optimise the client caching, and for all the attention this issue received. The support we have received has assured me that this library was a good choice for us.