Node: Integrate AsyncLocalStorage with EventEmitter

Created on 4 Jun 2020  ·  21Comments  ·  Source: nodejs/node

Is your feature request related to a problem? Please describe.

I'm going to port this library to AsyncLocalStorage. As a part of this migration I need to make ALS play nicely with events emitted by request and response objects. Current version of the library uses ns.bindEmitter() method available in cls-hooked (it's also present in continuation-local-storage).

Describe the solution you'd like

It's enough to add .bindEmitter(emitter, store) method into ALS:

const store = { foo: 'bar' };
als.bindEmitter(req, store);

req.on('close', () => {
  console.log(als.getStore()); // prints "{ foo: 'bar' }"
});

Describe alternatives you've considered

Of course, this integration can be implemented as a user-land module. But I believe that this will be a common case among ALS users, so it's probably worth to consider integrating ALS with EventEmitter in the core.

As more of such changes can bloat ALS API, I'd like to hear opinions from @nodejs/async_hooks before starting to work on the implementation.

Update. The consensus is to add something like AsyncResource#bind() method that would allow to do the following:

als.run({ foo: 'bar' }, () => {
  AsyncResource res = new AsyncResource('foobar');
  req.on('close', res.bind(() => {
    console.log(als.getStore()); // prints "{ foo: 'bar' }"
  }));
});
async_hooks

Most helpful comment

Just copying in @jasnell's comment from https://github.com/nodejs/node/issues/33749. I feel it might be relevant here:

EventEmitter is going to be a tricky one if only because of how performance sensitive it is. It absolutely makes sense for an EventEmitter to be tracked as an async resource but when I went through the exercise of making it one it ended up 2x-3x slower in regular use scenarios (streams, http servers, etc). I'm 100% in favor of doing something here but the performance loss problem absolutely needs to be addressed.

While simultaneously highlighting @addaleax's comment from earlier in this thread:

I think all of these options would come with some perf impact, but also mean that we need to do less async tracking in the Node.js C++ internals overall, because we can let the wrapper objects (which often are EventEmitters) take care of that in a lot of cases. That might even lead to a net positive impact as far as Node.js-provided resources are concerned. I also think @Qard would like that move away from C++-based tracking, based on what I’ve heard him say 😉

All 21 comments

The goal of the method is to set the context on an _existing_ emitter?

I'm a bit confused about why this is needed for the server case, wouldn't the request handler itself already run in the context?

Edit: closed (and reopened) due to butterfingers.

@benjamingr

The goal of the method is to set the context on an _existing_ emitter?

Yes, that's right.

I'm a bit confused about why this is needed for the server case, wouldn't the request handler itself already run in the context?

The handler itself is run in the context, but some of events emitted for req/res may be run in a different context. Here is a simple test that illustrates the problem:

'use strict';
require('../common');
const assert = require('assert');
const { AsyncLocalStorage } = require('async_hooks');
const http = require('http');

const asyncLocalStorage = new AsyncLocalStorage();
const server = http.createServer((req, res) => {
  asyncLocalStorage.run(42, () => {
    req.on('close', () => {
      assert.strictEqual(asyncLocalStorage.getStore(), 42); // fails
    });
    assert.strictEqual(asyncLocalStorage.getStore(), 42);
    res.end('ok');
  });
});

server.listen(0, () => {
  http.get({ host: 'localhost', port: server.address().port }, () => {
    server.close();
  });
});

Edit: closed (and reopened) due to butterfingers.

No problems at all. Happens with everyone.

@puzpuzpuz Have you investigated where the context gets lost here?
It could be a bug somewhere in http implementation that wrong asyncIds are used somewhere resulting in new roots.

Besides the HTTP example I agree that this seems helpful - in special as there are a lot user space modules out there using emitters but don't care that much about async context used.

I'm not sure if binding a complete emitter is the right solution. Maybe this should at least allow to limit it to some events.

Wouldn't it make more sense to expose something that allows the user to bind any function / object method instead of specifically EventEmitter#emit ?

@puzpuzpuz Have you investigated where the context gets lost here?
It could be a bug somewhere in http implementation that wrong asyncIds are used somewhere resulting in new roots.

I don't think the same context for req/res's event listeners as for the request listener was even guaranteed, was it?

Some of them are emitted as a part of socket.on() listeners. See this one function as an example:
https://github.com/nodejs/node/blob/32b641e528210560d9cb4b524acf844f74d0d266/lib/_http_server.js#L546

I'm not sure if binding a complete emitter is the right solution. Maybe this should at least allow to limit it to some events.

An optional filter argument for events could probably help.

Wouldn't it make more sense to expose something that allows the user to bind any function / object method instead of specifically EventEmitter#emit ?

Probably, yes. Something similar to ns.bind() from cls-hooked might be even better than the .bindEmitter() method. With this approach user code would look like this one:

const store = { foo: 'bar' };
als.bind(req.emit, store);

req.on('close', () => {
  console.log(als.getStore()); // prints "{ foo: 'bar' }"
});

I don't think the same context for req/res's event listeners as for the request listener was even guaranteed, was it?

Not the same context but I would assume that context passing done via async hooks is able to link them similar as link via e.g. setTimeout works.
But sure, if events emitted from the net socket are just forwarded it's mostly clear that context is lost/wrong as socket doesn't know the current http request and may be used for several requests. It would be needed to handle the event in http first and reemit with correct context. More or less a userspace queuing problem.

So … trying to sort my thoughts here, and I’ll apologize for the long comment in advance.

Firstly, the idea to bind an existing EventEmitter to a specific ALS store seems odd, especially considering that an EventEmitter can have multiple consumers and those other consumers might not want that binding to a specific ALS store.

Secondly, unless I’m misunderstanding something, it feels like ALS is not the right level to approach this, because the problem is that the async tracking does not work as expected. In particular, in the example in https://github.com/nodejs/node/issues/33723#issuecomment-638750040, the problem is that .on() is basically creating a new async operation, but it is not registered as such.

Thirdly, we bascially kind of ran into this kind of problem with domain as well. The solution for domain was (is) to basically monkey-patch EventEmitters so that they behave like AsyncResources where all events were emitted in the EE’s async scope.

That makes it seem to me like we want to do any of:

  1. Do the domain-like thing, and make EEs behave like AsyncResources. That would not solve the issue in the example above, though.
  2. Do a variant of that, which would be equivalent to making EventEmitter.on() create a new AsyncResource and have the corresponding .emit() call run the callback in that AsyncResource’s scope.

As an alternative, or additionally, we could also:

  1. Provide a way to bind the callback to the async scope in which it was created, i.e. req.on('close', async_hooks.bindToCurrentScope(() => { ... })). This could implicitly create an AsyncResource and run the callback in its scope whenever it is called. (Option 2 would basically mean doing this implicitly for ee.on(...).)

If I were to put myself in a position where I didn’t know a lot about Node.js internals, I think Option 2 is something that would make sense to be, because I would expect

process.nextTick(() => doSomething());

and

ee.on('event', () => doSomething());

to behave similarly – i.e. to run the callback in an async context that corresponds to where it was first registered.

I think all of these options would come with some perf impact, but also mean that we need to do less async tracking in the Node.js C++ internals overall, because we can let the wrapper objects (which often are EventEmitters) take care of that in a lot of cases. That might even lead to a net positive impact as far as Node.js-provided resources are concerned. I also think @qard would like that move away from C++-based tracking, based on what I’ve heard him say :wink:

Then, finally, I just had a short conversation with @ronag about how this would work in streams, and basically, the conclusion that I came to was that it would make sense for streams to also use custom AsyncResources to make sure that events coming from the stream are emitted in the correct async context – for example, .destroy() can be called from anywhere and subsequently lead to a 'close' event, which would mean that a .on('close') listener can basically run in any kind of async context that has no relation to when that listener was added. That seams like a special case of the situation for EventEmitters, though.

@addaleax Yep, idea 3 is basically what I’ve had in mind for awhile. I’m currently working on making a callback trampoline for InternalMakeCallback which I plan on being able to reuse for this sort of scenario to wrap JavaScript-side callbacks and emit the related events as much as possible purely on the JS-side.

By avoiding the barrier cross performance hit we should be able to make the cost of tracking additional relevant barriers like event emitters and streams less significant by relying on V8 to inline what it needs.

I'm also a fan of option 3.

@addaleax

So … trying to sort my thoughts here, and I’ll apologize for the long comment in advance.

No need to apologize. Your comment is really valuable. 👍

Firstly, the idea to bind an existing EventEmitter to a specific ALS store seems odd, especially considering that an EventEmitter can have multiple consumers and those other consumers might not want that binding to a specific ALS store.

With the default behavior of EE consumers (listeners) of the same event will be executed within the same context, as the execution happen synchronously when .emit() is called. So, multiple consumers (of the same event, to be precise) will get the same store of a certain ALS instance. The proposed methods change nothing here.

BTW if one of listeners needs a separate store, it can be wrapped with als.run() (or proposed als.bind()).

Secondly, unless I’m misunderstanding something, it feels like ALS is not the right level to approach this, because the problem is that the async tracking does not work as expected. In particular, in the example in #33723 (comment), the problem is that .on() is basically creating a new async operation, but it is not registered as such.

EE itself has nothing to do with async resources, so I'm not sure where it's correct to say ".on() is basically creating a new async operation". The exact async context (resource) depends on each .emit() invocation and can even change between them.

That's why I'm not sure what should be the right level to approach this. But certainly EE API itself is not a problem and shouldn't be changed. But we may consider changing behavior of certain EEs returned by core APIs, like req/res, as users may have false expectations of their integration with async_hooks.

That makes it seem to me like we want to do any of:

  1. Do the domain-like thing, and make EEs behave like AsyncResources. That would not solve the issue in the example above, though.

Yes, that won't solve the discussed issue.

  1. Do a variant of that, which would be equivalent to making EventEmitter.on() create a new AsyncResource and have the corresponding .emit() call run the callback in that AsyncResource’s scope.

That could work if we properly integrate (say, by monkey-patching them) certain EEs used in the core with async_hooks. However, it's probably easier to change behavior of all EEs. In any case that has to be properly documented, so that users know what to expect.

As an alternative, or additionally, we could also:

  1. Provide a way to bind _the callback_ to the async scope in which it was created, i.e. req.on('close', async_hooks.bindToCurrentScope(() => { ... })). This could implicitly create an AsyncResource and run the callback in its scope whenever it is called. (Option 2 would basically mean doing this implicitly for ee.on(...).)

That won't work in my case (and probably in other users's cases). I'm not in control of user code and other middlewares, so I can't force them to wrap EE listeners with something like async_hooks.bindToCurrentScope() function.

If I were to put myself in a position where I didn’t know a lot about Node.js internals, I think Option 2 is something that would make sense to be, because I would expect

process.nextTick(() => doSomething());

and

ee.on('event', () => doSomething());

to behave similarly – i.e. to run the callback in an async context that corresponds to where it was first registered.

I would have such expectations as EEs themselves do not imply async operations (resources), only CPS. But, once again, some of EEs (or all of them?) used in core APIs may be treated in a special way.

That seams like a special case of the situation for EventEmitters, though.

Yes, streams (certain streams used in core APIs, of course) are close to EEs.

@addaleax what would you say of .bind() method discussed in https://github.com/nodejs/node/issues/33723#issuecomment-638826832 and https://github.com/nodejs/node/issues/33723#issuecomment-638830875? It's not coupled with EEs and provides some flexibility for users. Certainly, that's not a replacement of changes in EEs/streams behavior discussed above, but they're rather orthogonal.

BTW .bind() could be added to AsyncResource instead of ALS. That would make it even more flexible.

Such API is close to async_hooks.bindToCurrentScope() function mentioned above, but it will give the user more control over the context. Thus, it'll be able to solve the original problem.

I think the _correct_ place for this is in AsyncResource. The functionality is needed at every level of async correlation.

Many thanks for your inputs! I'm going to submit a PR for AsyncResource#bind(). Let's continue the conversation over there.

Here it goes: #33736

Closing this one as it can be implemented via language features and the consensus is to avoid bloating the API.

@puzpuzpuz I’ll reopen this since I’d still like to discuss whether we want to implement some form of built-in async tracking for EventEmitters in general, i.e. option 1 or 2 from above.

Just copying in @jasnell's comment from https://github.com/nodejs/node/issues/33749. I feel it might be relevant here:

EventEmitter is going to be a tricky one if only because of how performance sensitive it is. It absolutely makes sense for an EventEmitter to be tracked as an async resource but when I went through the exercise of making it one it ended up 2x-3x slower in regular use scenarios (streams, http servers, etc). I'm 100% in favor of doing something here but the performance loss problem absolutely needs to be addressed.

While simultaneously highlighting @addaleax's comment from earlier in this thread:

I think all of these options would come with some perf impact, but also mean that we need to do less async tracking in the Node.js C++ internals overall, because we can let the wrapper objects (which often are EventEmitters) take care of that in a lot of cases. That might even lead to a net positive impact as far as Node.js-provided resources are concerned. I also think @Qard would like that move away from C++-based tracking, based on what I’ve heard him say 😉

Just copying in @jasnell's comment from #33749. I feel it might be relevant here:

EventEmitter is going to be a tricky one if only because of how performance sensitive it is. It absolutely makes sense for an EventEmitter to be tracked as an async resource but when I went through the exercise of making it one it ended up 2x-3x slower in regular use scenarios (streams, http servers, etc).

@jasnell I wonder what was the approach? Did you try to implicitly create an AsyncResource on eventEmitter.on() call and then wrap the listener with runInAsyncScope() or it was something more complicated? After #32429 AsyncResources don't involve GC-tracking anymore when there are no destroy listeners and that could decrease the overhead, but I'm not sure about the exact improvement.

My apologies for missing the earlier notification from a month ago! Sigh...

Yes, the approach was to create an AsyncResource on eventEmitter.on(), in a generalized way that would enable it for all event emitter instances.

At this time, I'm heavily leaning towards @addaleax's option 3 here.

Was this page helpful?
0 / 5 - 0 ratings