Node: async_hooks experimental status

Created on 9 Aug 2017 · 18Comments · Source: nodejs/node

We are looking at using async_hooks to maintain state in our instrumentations when async/await is used. What is your recommendation for using async_hooks in production code at this point?

We realize that it is marked as experimental API. However, it is not gated with a flag. Is the API likely to change? When do you think it will come out of experimental status?

async_hooks question

Source

martinkuba

All 18 comments

cc @nodejs/async_hooks

tniessen on 9 Aug 2017

@martinkuba hello, a little bit of context:

The module isn't gated with a flag since the code that implements it is very pervasive, and essentially it's impossible to turn it off, so a flag would have been a "false flag".
Indeed the experimental demarcation is there to indicate that the API might change with no correlation to semver rules. But the work that has been done during the v8.x cycle tried to be semver-minor nonetheless (refactored APIs were back-shimmed for compatibility, e.g. https://github.com/nodejs/node/pull/13490).
The implementation itself is not considered buggy in anyway.
Some big players are betting on async_hooks for tracing and APM, but I don't know first hand of any big projects that use it in production, maybe the others can help?

My very subjective estimation is that the "user" API will not change much. Most of the work being done is on the "embedder" API i.e. code that wants to trigger the hooks, not code that wants to use the hooks and query the graph.

IMHO go for it, but I'm a bleeding edge kind of guy, so it is very dependant on your needs and organizational constraints.

refack on 9 Aug 2017

In my opinion: treat it as if it were behind a flag.

I can almost guarantee there are going to be API changes, although we will try to make them backward compatible in 8.x, but that is only until it becomes to inconvenient.

During 8.x we have already deprecated 6 function calls or something like that.

AndreasMadsen on 10 Aug 2017

@refack @AndreasMadsen Thanks for your replies, I really appreciate it.

Have there been any performance tests done - to determine overhead and also to look for memory leaks when running over long period of time? If yes, can you please point me to those tests?

We are running benchmark tests ourselves, and would like to compare with any existing results.

martinkuba on 12 Aug 2017

@martinkuba that's a good questions re:

https://github.com/nodejs/benchmarking
https://www.youtube.com/watch?v=r0tdN5gx5eU&t=26s
https://www.youtube.com/watch?v=F0rTqMezYVo
It is still an open question how should node core do benchmarking...

In this case I can only contribute anecdotal data-points. Personally I know of one project (small-medium) that uses async-hooks in a long running system in production, they are not doing rigorous benchmarking, but have not been noticing memory leaks or performance degradation. It's also a complicated data-point to analyse since they upgraded from 6.x specifically so they could use async-hooks...

I believe everyone would be happy to hear New Relic's results, if you could share them.
I'll post a Request For Comments from the general public.

P.S. https://twitter.com/refack/status/896460320079785987 - https://github.com/nodejs/node/issues/14794

refack on 12 Aug 2017

@mcollina asked about the plans for the experimental status. Let's have the discussion here.

What I would like to see done before moving out of experimental status:

Setting up the benchmark suite to be run with and without async_hooks and see that there is only a small measurable difference given our tolerance.
Deprecate runInAsyncIdScope as agreed in https://github.com/nodejs/node/issues/14328.
Finish discussion on setTriggerId in https://github.com/nodejs/node/issues/14238. _I think we need others opinion on this, the discussion between me and Trevor is not very productive._
Finish discussion on newUid, emitInit, emitBefore, emitAfter and emitDestroy in https://github.com/nodejs/node/issues/15572. _I still think these are unnecessary and prone to cause errors and invalid states._
Once the three API discussions have been completed, we should separate the internal API from the external API. Similar to what I proposed in https://github.com/AndreasMadsen/node/pull/1.
Opt-in async_hooks state validation. We had decent state validation before (https://github.com/nodejs/node/pull/14722 and https://github.com/nodejs/node/pull/14387) but it was always active which was very problematic. PR: https://github.com/nodejs/node/pull/15454

_edited: added new links to https://github.com/nodejs/node/issues/15572 and https://github.com/nodejs/node/pull/15454._
_edited: spelling and poor phrasing._

AndreasMadsen on 15 Sep 2017

👍1

@AndreasMadsen

Opt-in async_hooks state validation. We had decent state validation before (#14722 and #14387) but it was always active which was very problematic.

Oh, good idea. I was thinking of just leaving it for debug build, but adding a flag or env var is an excellent idea. You have any ideas about how it should be implemented?

And for future generations reading this: the only reason I removed some of the validation was because of the ecosystem breakage and not being able to find an alternative solution quickly enough. If the edge cases that caused those failures could be determined I'd like to eventually re-add all state validation.

trevnorris on 15 Sep 2017

👍1

@trevnorris Well, as I have mentioned a few times I think we should just check async_hook_fields [kTotals] > 0. I realize this doesn't ensure correctness if only executionAsyncId() or triggerAsyncId() is used, but it is so trivial to add some empty hooks and we can document that "hack". One could argue that async_hooks is useful with incorrect states, I honestly think it will only lead to confusion and fewer critical bug reports. If it turns out to be a problem, we could change the behavior when moving to a stable state.

AndreasMadsen on 16 Sep 2017

Just for context, AFAIK there hasn't been a critical async_hooks in about 2 months (I think #14387 was the latest), so the code is converging anyway.

refack on 16 Sep 2017

@refack That is likely a false positive, there hasn't been an issue because we removed/relaxed the asserts. That doesn't mean there is no async_hooks state errors. It just means we are not seeing them anymore.

AndreasMadsen on 16 Sep 2017

👍1

@AndreasMadsen we cannot ship a Node.js release with tight asserts by default. If we need them to exists, then let's put them behind a flag. In this way, we can test this in applications without causing problems for everybody else that might not even be using async_hooks.

mcollina on 18 Sep 2017

Fwiw, https://github.com/nodejs/node/issues/15448 is a pretty fresh crash coming from async_hooks, although there seems to be more going on in it.

addaleax on 18 Sep 2017

we cannot ship a Node.js release with tight asserts by default. If we need them to exists, then let's put them behind a flag. In this way, we can test this in applications without causing problems for everybody else that might not even be using async_hooks.

Everyone seems to misunderstand what I'm saying. I will try and prepare a pull request today.

AndreasMadsen on 18 Sep 2017

@trevnorris @mcollina https://github.com/nodejs/node/pull/15454 is what I have in mind.

AndreasMadsen on 18 Sep 2017

/cc @jasnell, see https://github.com/nodejs/node/issues/14717#issuecomment-329755307 for my stable-requirements list.

AndreasMadsen on 26 Sep 2017

👍1

Has anyone tried using async_hooks in an application that uses multiple threads?

V8 supports a kind of cooperative multithreading, provided you use v8::Locker and v8::Unlocker to lock the Isolate object correctly, but it appears there's just one async_ids_stack_ for the whole node::Environment, and it doesn't get unwound/archived/restored when switching between threads.

Here's an example of where this can go wrong: https://github.com/laverdet/node-fibers/issues/357. Note: although Fibers are not technically threads, they could be implemented as threads, and V8 treats them as if they were threads, so the analogy holds.

Essentially, if you push an async ID on one thread, then switch threads and push another ID, then switch back to the first thread and pop the first ID, you get the corruption error because async_ids_stack_ still has the ID from the second thread at the top of the stack.

How can we make the async_hooks API more aware of threads? Should each thread have its own stack of async IDs? Should the AsyncHooks object live in thread-local storage, perhaps?

benjamn on 22 Dec 2017

@benjamn Please open another issue, then we can discuss it there.

AndreasMadsen on 23 Dec 2017

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.