```js
const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();

let fn = async () => /test/.test('test');

let runWithExpiry = async (expiry, fn) => {
  let iterations = 0;
  while (Date.now() < expiry) {
    await fn();
    iterations++;
  }
  return iterations;
};

(async () => {
  console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

  asyncLocalStorage.run({}, () => {});
  let withAls = await runWithExpiry(Date.now() + 1000, fn);
  console.log(`Performed ${withAls} iterations (with ALS enabled)`);

  asyncLocalStorage.disable();
  let withoutAls = await runWithExpiry(Date.now() + 1000, fn);
  console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

  console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();
```
Output:

```
Performed 180407 iterations to warmup
Performed 205741 iterations (with ALS enabled)
Performed 6446728 iterations (with ALS disabled)
ALS penalty: 96.8%
```
This happens in any case where `await` is used. The penalty should be around 10%, yet I see a 97% reduction in performance.
I've played in the past with a Zone polyfill of my own (zone-polyfill), and at the beginning I tried using the stack traces generated by recent V8 versions, which keep their context across `await` calls. Combining that with the basic technique for tracking context, I was able to track the zone, but due to the string nature of the stack traces I had to map them in memory, and no `WeakMap` was usable for that (strings cannot be `WeakMap` keys). Using a NAN-based solution I got down to about a 15% penalty.

Now, assuming these capabilities are available on the C++ side, and much more natively, with the option to directly leverage the compiler instead of generating stack traces to query for info, I'd assume at most a 10% penalty for a native implementation.
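For illustration, here is a minimal, hypothetical sketch of the stack-trace technique described above: embed a zone id in a uniquely named wrapper function so it shows up in stack traces, then recover it by parsing `Error#stack`. The names (`runInZone`, `currentZone`) are made up for this example, and whether the tag survives across an `await` depends on V8's async stack traces and stack depth limits, which is exactly the fragility mentioned above:

```javascript
// Hypothetical sketch of stack-trace-based zone tracking (not the actual
// zone-polyfill code). A zone id is embedded in a function name so that it
// appears in stack traces captured deeper in the call chain.
const zones = new Map(); // zone id -> zone data (strings cannot be WeakMap keys)

function runInZone(zoneData, body) {
  const id = `__zone_${zones.size}_${Math.random().toString(36).slice(2)}`;
  zones.set(id, zoneData);
  // A computed method name gives the wrapper a name that appears in stacks.
  const wrapper = { [id]() { return body(); } }[id];
  return wrapper();
}

function currentZone() {
  const stack = new Error().stack || '';
  const match = stack.match(/__zone_\w+/); // first match is the innermost zone frame
  return match ? zones.get(match[0]) : undefined;
}

runInZone({ user: 'alice' }, () => {
  console.log(currentZone()); // { user: 'alice' }
});
```

The string-keyed `Map` here is the limitation mentioned above: zone ids leak unless cleaned up explicitly, since a `WeakMap` cannot hold string keys.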
I've tried to run this test on the latest `master` and got the following result:

```
Performed 769807 iterations to warmup
Performed 747431 iterations (with ALS enabled)
Performed 8319900 iterations (with ALS disabled)
```

So, it's slightly better than the reported result, yet ALS (or `async_hooks`, to be more precise) has a significant overhead in such simple synthetic benchmarks, where promises are used heavily and there is no significant computation or I/O going on.
I've also tried to run `benchmark/async_hooks/promises.js` with the modifications from #34523 and the following no-op hook:

```cpp
static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;
  env->new_async_id();
}
```

and got the following result:

```
$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 885,253.5326989443
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 251,938.54719793968
async_hooks/promises.js asyncHooks="enabledWithInitOnly" n=1000000: 787,156.9010261587
async_hooks/promises.js asyncHooks="disabled" n=1000000: 1,583,621.8483830877
```
As you can see, in this benchmark we get around 50% of the `disabled` performance when the no-op hook is installed. That means 50% is the best we can do in such benchmarks with the current V8 PromiseHook integration. In reality the overhead will be higher, as we need to do many operations, not only the `env->new_async_id()` one.
BTW, I've also experimented with internal fields (a couple of them) instead of symbol properties and got no significant performance improvement, mainly because with that approach the hook has to do more operations on average compared with the current implementation.
Well, only slightly better: "with ALS enabled" is 8.9% of "with ALS disabled", which means that 91% is lost, not 97% (which was the consistent number I got on all machines).
If there was a way to install a hook only for the case of an `await` returning to its parent context, that would help us build a JS-land implementation with much less overhead.
> Well, only slightly better.
Yes and that aligns with the benchmark results I got for #34523. There is still a lot to wish for in terms of the PromiseHook overhead. However, in real-world applications the overhead will be much smaller.
> If there was a way to install a hook only for the case of an `await` returning to its parent context, that would help us build a JS-land implementation with much less overhead.
That's what the PromiseHook used by `async_hooks` allows you to do. Yet, it assumes native code.
I'd consider this a somewhat unrealistic micro-benchmark, as it awaits a non-promise in a loop, which naturally involves a lot more machinery in the barrier marking and tracking than in the non-async function call it gives to the `await`.
I did a few runs locally, replacing the function with some more realistic variations:
`let fn = () => /test/.test('test');` - the original code, for baseline comparison:

```
Performed 744791 iterations to warmup
Performed 790158 iterations (with ALS enabled)
Performed 7946914 iterations (with ALS disabled)
```
`let fn = () => Promise.resolve('test');` - at least creates an actual promise, though with no actual asynchronous behaviour beyond deferring to the next microtask queue tick:

```
Performed 873666 iterations to warmup
Performed 962120 iterations (with ALS enabled)
Performed 8148736 iterations (with ALS disabled)
```
`let fn = () => new Promise(setImmediate);` - a somewhat more realistic representation of a promise you'd typically see in practice:

```
Performed 5477 iterations to warmup
Performed 56758 iterations (with ALS enabled)
Performed 62029 iterations (with ALS disabled)
```
`let fn = () => new Promise(process.nextTick);` - a reasonably realistic representation of the higher-priority promises you might see in practice:

```
Performed 158118 iterations to warmup
Performed 582238 iterations (with ALS enabled)
Performed 2683626 iterations (with ALS disabled)
```
Most promises would be I/O related and would sit somewhere between the last two tests.
It's fair to want the performance to be better, but the path your code tested is actually _quite different_, according to the spec, from what an actual typical promise-based await would look like. I don't think it's too helpful to rely on micro-benchmarks, especially with a system this complex. I _do_ think there may be optimization potential for AsyncLocalStorage to use the underlying machinery of async_hooks directly, in a way that avoids the numerous barrier crossings of the hook callback functions, though that would be a fairly complex change. 🤔
I agree that micro-benchmarks do not present the whole picture. But they do present the performance cost of this specific feature, and how it will affect your code if you have a lot of `await`s/promises. And if you do have a loop over little-work async code somewhere in your app, something that happens a lot, you will be surprised to see a major hit in performance.
So if the hit in the micro-benchmark were 20%, that would be acceptable. But that's not the case.
As I said, what you were testing was _not_ async. The performance difference would almost _never_ look like that in reality because an _actual_ async promise has a bunch more overhead than the non-async test function you awaited in your test code.
So you're saying the numbers I've posted are not bad enough? And this is intentional, as I wanted to measure the impact on the `await` itself.
@Qard thanks for these experiments. I also agree that ALS is meant to be used in real-world applications and it makes sense to benchmark it in a different way. Micro-benchmarks are certainly valuable for various optimizations, however web applications do more than what's done in the original snippet and ALS' overhead will be much lower for them.
@danielgindi however, this doesn't mean that the `async_hooks` + promise integration doesn't need any performance improvements. But, as was already discussed, all the low-hanging-fruit optimizations are already implemented, so it may take some time and significant effort to come up with more advanced ones.
> So you're saying the numbers I've posted are not bad enough?
They are bad. However, it is very rare that `await` is the bottleneck of any application. For `await` to be the bottleneck, whatever is being awaited would have to be synchronous or fake-async.
By using a `setTimeout` of 2 ms:

```js
const { AsyncLocalStorage } = require('async_hooks');
const { promisify } = require('util');

const asyncLocalStorage = new AsyncLocalStorage();

let fn = promisify(setTimeout).bind(null, 2);

let runWithExpiry = async (expiry, fn) => {
  let iterations = 0;
  while (Date.now() < expiry) {
    await fn();
    iterations++;
  }
  return iterations;
};

(async () => {
  console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

  asyncLocalStorage.run({}, () => {});
  let withAls = await runWithExpiry(Date.now() + 10000, fn);
  console.log(`Performed ${withAls} iterations (with ALS enabled)`);

  asyncLocalStorage.disable();
  let withoutAls = await runWithExpiry(Date.now() + 10000, fn);
  console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

  console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();
```
We get:

```
$ node a
Performed 44 iterations to warmup
Performed 4214 iterations (with ALS enabled)
Performed 4400 iterations (with ALS disabled)
ALS penalty: 4.23%
```
This is totally acceptable and well within range.
@mcollina I get that. Although, some async operations may take less than 2 ms while still being async and taking advantage of libuv threads.
I'm going to do a wet test today on a large scale application that has all imaginable scenarios implemented, and that should service hundreds of millions of users. I'll let you guys know how it goes :-)