Node: RFC: seeking feedback on async_hooks performance in production

Created on 12 Aug 2017 · 20 comments · Source: nodejs/node

  • Version: v8.x - master
  • Platform: all
  • Subsystem: async_hooks


[email protected] saw the introduction of async_hooks, the latest and greatest incarnation of core's tracing API (https://github.com/nodejs/node/pull/11883, based on https://github.com/nodejs/node-eps/blob/master/006-asynchooks-api.md).
The @nodejs/async_hooks team would love to hear feedback, especially regarding performance in production apps (CPU/memory usage), but also synthetic benchmark results.
Any other feedback is welcome too, as are issues and PRs.

Thanks.

async_hooks

Most helpful comment

@vdeturckheim I work at NR (Martin actually just moved teams). We've had mostly positive results thus far. No real-world issues have come up and it seems the performance is generally better than our existing monkey-patch based instrumentation. Our customers who are testing it have responded positively without any complaints of performance degradation after turning it on.

In our worst-case scenario benchmark (a 300-link no-op promise chain), our async-hook instrumentation is about 5x faster than our older monkey-patch based instrumentation. A no-op async hook was about 1.5x faster than our async-hook instrumentation. No instrumentation at all was about 8x faster than the async-hook instrumentation. This test is not reflective of real-world performance, but does show it is better than what we had and there is room for improvement.

For memory-leak situations, we've run some servers under extremely heavy load (~6 million promises per minute, 6000 requests per minute) for several days and noticed no leaking issues. We have, however, encountered a strange behavior around the 6-hour mark where GC scavenge events suddenly jump and CPU jumps with it. This is a one-time situation and the memory usage doesn't change after that. We're currently working on narrowing down the cause of that.

All 20 comments

@nodejs/diagnostics

I'm guessing the instructions for a non-synthetic benchmark would be:

  • Run an actual app on node 8 in prod.
  • Turn on async_hooks (require('async_hooks').enable() should be enough?).
  • Compare memory/CPU stats.

If so, I might get some data in the coming weeks (wanted to give 8.3 a spin anyhow in one of our services).
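For the "compare memory/CPU stats" step, a minimal sampler (hypothetical, not from the thread; interval and log sink are arbitrary choices) could periodically log `process.memoryUsage()` and `process.cpuUsage()` so runs with and without hooks enabled can be compared:

```javascript
// Periodically sample memory and CPU so runs with and without
// async_hooks enabled can be compared side by side.
function startSampler(intervalMs = 60000) {
  const timer = setInterval(() => {
    const { rss, heapUsed } = process.memoryUsage();
    const { user, system } = process.cpuUsage(); // microseconds since process start
    console.log(JSON.stringify({ ts: Date.now(), rss, heapUsed, user, system }));
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for sampling
  return timer;
}
```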

@jkrems You actually need to add empty hooks functions, otherwise nothing changes.

@jkrems I think that will be great!


It will be a fair exercise to suss out memory leaks.

There is the example trace snippet from nodejs.org/api/async_hooks.html (slightly simplified):

const async_hooks = require('async_hooks');
const fs = require('fs');

let indent = '';
const traceFile = 1; // file descriptor 1, i.e. stdout
async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    const eid = async_hooks.executionAsyncId();
    fs.writeSync(
      traceFile,
      `${indent}${type}(${asyncId}): trigger: ${triggerAsyncId} execution: ${eid}\n`);
  },
  before(asyncId) {
    fs.writeSync(traceFile, `${indent}before:  ${asyncId}\n`);
    indent += '  ';
  },
  after(asyncId) {
    indent = indent.slice(2);
    fs.writeSync(traceFile, `${indent}after:   ${asyncId}\n`);
  },
  destroy(asyncId) { fs.writeSync(traceFile, `${indent}destroy: ${asyncId}\n`); },
}).enable();

Been running it in production on several high traffic components since 4.2.1 and have had no major memory leaks. No performance degradation compared to the old async-listeners implementation.

One thing we did notice was that the asynchronous hooks generated by promises didn't get "destroy"ed until the promise got garbage collected. This caused a little bit of memory runaway under very high synthetic traffic, but we haven't noticed any problems with real-world traffic.

Hello,

I believe https://github.com/angular/zone.js/issues/889 might interest you!

I'm under the impression Google started using it in production in their APM. https://github.com/GoogleCloudPlatform/cloud-trace-nodejs/commit/0e15b6c95dd019787540aa31c9c889028abc708c

We are currently experimenting with async hooks, but it is still behind a flag while we evaluate it. We found it to be slightly slower than the continuation-local-storage based approach, but we still think it will be the right way forward, especially once more modules start using the embedder API to record their own async events, removing the need for some of our monkey-patching. We can update this thread once we gather more info on how async hooks is working for us.

Awesome! Thanks @matthewloring. When running with CLS, is the monkey-patching on all async methods still applied? There might be a performance boost there.

The monkey-patching done by our agent is the same under both approaches (since other modules have not yet started using the async hooks embedder API). Async hooks tracks more lifecycle events per asynchronous task than CLS does, especially for promises, many of which are redundant for the context tracking done by our agent. I suspect this to be part of the slowdown, but we need to do more performance analysis to verify.

Is it even possible to implement a robust CLS/zone (or a lower-level abstraction that zones/CLS could be built upon) as a native Node module?

I'm under the impression that some features are very hard or impossible to implement robustly using only JS (even with monkey-patching), mostly because of hacks in third-party libraries that also attempt to do low-level async stuff and implement some sort of userland scheduling, e.g. bluebird.

What are your use cases for this family of features?
I'm most interested in writing async servers in multi-process Node. I want to build a common generator-based userland coroutines/procs library that abstracts away the transport layer between coroutines and allows dynamic scheduling/monitoring/hot update.
In this model a proc is a pausable/resumable state machine capable of sending/receiving messages to/from other procs via an inbox queue (heavily inspired by Erlang).

Example[1]:

function* counterServer({ counter }) {
  yield effect("saveCounter", counter);
  try {
    const message = yield receive(10000); // pause until a message is received or a timeout elapses
    if (message.type === "add") {
      // "loop" by recursively yielding itself (or another generator function)
      return yield* counterServer({ counter: counter + 1 });
    }
    if (message.type === "query") {
      // send a message back to the sender
      yield send(message.source, { type: "counter", counter });
      return yield* counterServer({ counter });
    }
  }
  catch (error) {
    if (error instanceof ReceiveTimeout) {
      // it's okay to time out, we can just keep receiving
      return yield* counterServer({ counter });
    }
    // other errors crash the process so that its monitor can restart it in a safe state
    console.error("Error thrown in counterServer");
    throw error;
  }
}

const loop = async (initialCounter) => {
  let savedCounter = initialCounter;
  try {
    await spawn(counterServer, { counter: savedCounter }, {
      // algebraic effect handler: produce a "global" side effect (bubbled and resumable)
      saveCounter: (currentCounter) => savedCounter = currentCounter,
    });
  }
  catch (error) {
    console.error(error);
    // restart in a safe state
    return await loop(savedCounter);
  }
};

loop(0);

For this feature to be exploitable, I need to monitor asynchronous side effects, so that computations can't escape their context (unless they really, really want to, in which case the invariants would not be supported). If zones or CLS were reliable at scale, I could build this on top of them.

For this I think I only need to ensure that I can keep a readable context, and that I can run a function in an extended context.

Example[1] (could be expressed in terms of zones/CLS):

Context.current.fork((context) => {
  // extend context passed by parent
  return { ...context, foo: "bar" };
}, (context) => {
  // do stuff in a new hardly-escapable context (not strictly impossible to escape, but hard to do by mistake)
  ...
}).catch((error) => ...).then((result) => ...);

I think allowing "just" that would enable many useful paradigms, even with a more restricted feature set than Zones/cls.

Do you think this can be implemented using async_hooks only? I have a prototype, but I'm sure you guys spent a lot of time thinking about this and I'd be glad to read your thoughts.

@elierotenberg there is a plan to re-implement domains on top of async_hooks; you might want to keep an eye on this.

I will. Thanks.

@vdeturckheim I work at NR (Martin actually just moved teams). We've had mostly positive results thus far. No real-world issues have come up and it seems the performance is generally better than our existing monkey-patch based instrumentation. Our customers who are testing it have responded positively without any complaints of performance degradation after turning it on.

In our worst-case scenario benchmark (a 300-link no-op promise chain), our async-hook instrumentation is about 5x faster than our older monkey-patch based instrumentation. A no-op async hook was about 1.5x faster than our async-hook instrumentation. No instrumentation at all was about 8x faster than the async-hook instrumentation. This test is not reflective of real-world performance, but does show it is better than what we had and there is room for improvement.

For memory-leak situations, we've run some servers under extremely heavy load (~6 million promises per minute, 6000 requests per minute) for several days and noticed no leaking issues. We have, however, encountered a strange behavior around the 6-hour mark where GC scavenge events suddenly jump and CPU jumps with it. This is a one-time situation and the memory usage doesn't change after that. We're currently working on narrowing down the cause of that.

Glad to learn that NR is working on a visualization/debug tool based on async_hooks. Async work monitoring/debugging seems very painful and hard to me, and I'm convinced we need better tools with the proper abstractions.

As a side note, I've continued my little research and am now working on implementing a (minimal) actor system for easier work distribution within a single Node process or spanning multiple Node processes (it should even be able to support having actors actually run in the browser, communicating with the cluster over http/websockets). Think Erlang or Akka (but with far fewer features initially, of course).
I will probably use async_hooks in each Node process to monitor local async behavior, and to enforce/monitor the invariants underlying the actor model implementation.

@NatalieWolfe thanks a lot! looks pretty promising!

Out of curiosity, what applications are you testing the agent with? Did you build an internal app or do you rely on open source projects?

Realized my comment might not be germane for this thread, so I moved it here: https://github.com/nodejs/node/issues/14717#issuecomment-353635036

Closing as discussion seems to have run its course. Feel free to re-open if this is still something that needs more discussion.

