Node: stream performance regression 12.6.0 vs 10.16.0

Created on 7 Jul 2019 · 16 comments · Source: nodejs/node

Seeing a major performance regression, a ~50% drop in throughput, in one of our benchmarks. I will work on creating a minimal repro. In the meantime, here are some snippets of the profile outputs. It is pretty strange that 12.6.0 is spending ~70% of the time in shared-library land (maybe there's an issue with the profile too?).

```
# uname -a
Linux ledion-vm 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
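For reference, listings in the shape shown below are what V8's built-in sampling profiler produces. The exact commands were not posted here; a typical invocation (with `app.js` standing in for the actual benchmark script, which is an assumption) would be:

```
node --prof app.js
node --prof-process isolate-*-v8.log
```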

Node v10.16.0:

```
 [Shared libraries]:
   ticks  total  nonlib   name
   1021    2.5%          /lib/x86_64-linux-gnu/libc-2.23.so
    724    1.8%          /lib/x86_64-linux-gnu/libpthread-2.23.so
     17    0.0%          [vdso]
      6    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21

 [Summary]:
   ticks  total  nonlib   name
  26260   63.9%   66.8%  JavaScript
  11860   28.9%   30.2%  C++
    947    2.3%    2.4%  GC
   1768    4.3%          Shared libraries
   1178    2.9%          Unaccounted

 [Bottom up (heavy) profile]:
   ticks parent  name
   3566    8.7%  Builtin: LoadIC
    858   24.1%    LazyCompile: *onwrite _stream_writable.js:450:17
    358   41.7%      LazyCompile: *afterTransform _stream_transform.js:78:24
    349   97.5%        LazyCompile: *Transform._read _stream_transform.js:185:37
    349  100.0%          LazyCompile: *Readable.read _stream_readable.js:377:35
    348   99.7%            LazyCompile: *pipeOnDrainFunctionResult _stream_reada
```

Node v12.6.0:

```
 [Shared libraries]:
   ticks  total  nonlib   name
  28413   64.4%          /usr/local/bin/node
   1845    4.2%          /lib/x86_64-linux-gnu/libc-2.23.so
    455    1.0%          /lib/x86_64-linux-gnu/libpthread-2.23.so
      7    0.0%          [vdso]
      6    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
      4    0.0%          /lib/x86_64-linux-gnu/libm-2.23.so

 [Summary]:
   ticks  total  nonlib   name
   8697   19.7%   65.1%  JavaScript
   4515   10.2%   33.8%  C++
   1969    4.5%   14.7%  GC
  30730   69.7%          Shared libraries
    154    0.3%          Unaccounted

 [Bottom up (heavy) profile]:
   ticks parent  name
  28413   64.4%  /usr/local/bin/node
   6958   24.5%    LazyCompile: *Readable.read _stream_readable.js:394:35
   6901   99.2%      LazyCompile: *pipeOnDrainFunctionResult _stream_readable.js
   6901  100.0%        LazyCompile: *emit events.js:153:44
   6901  100.0%          LazyCompile: *EventEmitter.emit domain.js:460:39
   6901  100.0%            LazyCompile: *onwrite _stream_writable.js:445:17
   3237   11.4%    LazyCompile: *ondata _stream_readable.js:711:18
   3237  100.0%      LazyCompile: *emit events.js:153:44
   3237  100.0%        LazyCompile: *Readable.read _stream_readable.js:394:35
   3214   99.3%          LazyCompile: *pipeOnDrainFunctionResult _stream_readabl
   3214  100.0%            LazyCompile: *emit events.js:153:44
```
Labels: confirmed-bug, performance, stream

All 16 comments

I believe I've isolated the problem to streams, and 12.0.0 seems to be the first version where this appeared. See the benchmark: bench-stream.txt

Node v12.6.0
read to write x 6,804 ops/sec ±2.27% (74 runs sampled)
read to passthru to write x 3,778 ops/sec ±24.92% (76 runs sampled)

Node v12.0.0
read to write x 6,514 ops/sec ±2.36% (75 runs sampled)
read to passthru to write x 3,664 ops/sec ±27.53% (73 runs sampled)

Node v11.15.0
read to write x 8,941 ops/sec ±2.28% (75 runs sampled)
read to passthru to write x 5,870 ops/sec ±2.60% (78 runs sampled)

Node v11.0.0
read to write x 9,356 ops/sec ±2.87% (69 runs sampled)
read to passthru to write x 6,338 ops/sec ±2.36% (73 runs sampled)

Node v10.16.0
read to write x 9,621 ops/sec ±2.19% (74 runs sampled)
read to passthru to write x 6,610 ops/sec ±2.27% (76 runs sampled)

Any ideas where to look for a potential root cause here? Happy to dig through it.

/ping @nodejs/streams

Would you mind adding a reproducible benchmark? It seems you are interacting with a native resource; the change might actually be there, or it could come from somewhere else entirely (a change in V8, for example).

Check out https://clinicjs.org/flame/ to help you debug these (there is also an upload function in Clinic; it will give you a link that you can paste here, for example).
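If it helps, a typical Clinic.js flame run (assuming the package is installed globally and the benchmark script is called `bench.js`; both names are assumptions) looks like:

```
npm install -g clinic
clinic flame -- node bench.js
```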

@mcollina happy to look at Clinic.js, seems pretty cool! In the meantime, I did include a benchmark above, not sure if you saw it or not (when you said it seems we are interacting with a native resource) - the benchmark below uses just Node streams:

```js
const Benchmark = require('benchmark');
const { Writable, Readable, PassThrough } = require('stream');
const suite = new Benchmark.Suite();

// Object-mode Writable that discards everything it receives.
function devnull() {
  return new Writable({
    objectMode: true,
    write: function (event, encoding, callback) {
      callback();
    }
  });
}

// Readable that pushes 100 small string chunks, honouring backpressure,
// then ends the stream.
function readable() {
  let idx = 0, len = 100;
  return new Readable({
    read: function () {
      while (idx++ < len) {
        if (!this.push('foo' + idx))
          return;
      }
      this.push(null);
    }
  });
}

console.log(`Node ${process.version}`);

// add tests
suite
  .add('read to write', function (deferred) {
    readable()
      .pipe(devnull())
      .on('finish', () => deferred.resolve());
  }, { defer: true })
  .add('read to passthru to write', function (deferred) {
    readable()
      .pipe(new PassThrough())
      .pipe(devnull())
      .on('finish', () => deferred.resolve());
  }, { defer: true })
  // add listeners
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  // run async
  .run({ async: true });
```

I can reproduce the performance drop.

The main difference between the two is that 12 spends a lot of time in `__ZN2v88internalL47Builtin_Impl_Stats_CallSitePrototypeGetTypeNameEiPmPNS0_7IsolateE`, which is a C++ builtin in V8. @bmeurer wdyt? What could cause that?

~~I have a hunch that it can be related to https://github.com/nodejs/node/pull/27146.~~ No, it's not due to that.
I'll investigate more and report.

This regression was introduced when we updated to V8 7.3.492.25.
@nodejs/v8 would you like to chime in?

cc @hashseed @bmeurer @targos @ryzokuken

Please check out https://github.com/nodejs/node/pull/28842, it should solve the regression.

Should we replace occurrences of Object.getPrototypeOf with instanceof in other files?

Comment from @mcollina in https://github.com/nodejs/node/pull/28842#issue-300744529:

> use instanceof instead of Object.getPrototypeOf, as checking an object prototype with Object.getPrototypeOf is slower than an instanceof check.

Comment from @targos in https://github.com/nodejs/node/pull/28842#issuecomment-514661497:

> I would not add the comments about Object.getPrototypeOf because instanceof seems like the standard way to do the check to me.

> Should we replace occurrences of Object.getPrototypeOf with instanceof in other files?

I think this only applies to the one in lib/zlib.js, besides streams. And there it may or may not be faster to test against not being a Uint8Array, because the comment in the zlib code is outdated (streams do support any kind of Uint8Array).
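For readers following along, here is a minimal sketch of the two check styles being compared; it is illustrative only and not the actual diff from #28842. Note that the two are not strictly equivalent: the direct-prototype comparison rejects subclass instances, while `instanceof` accepts anything on the prototype chain.

```js
const { Readable } = require('stream');

class MyReadable extends Readable {
  _read() {}
}
const r = new MyReadable();

// Direct prototype comparison – reported as the slower pattern, and it
// only matches instances created directly from Readable:
console.log(Object.getPrototypeOf(r) === Readable.prototype); // false

// instanceof – reported as faster, and it walks the whole prototype chain:
console.log(r instanceof Readable); // true
```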

Hmm @mcollina, I'm still seeing a significant perf drop on 12.10.0:

```
workspace/bench/src# node stream.js
Node v10.16.0
read to write x 8,550 ops/sec ±5.17% (70 runs sampled)
read to passthru to write x 5,111 ops/sec ±24.00% (66 runs sampled)

workspace/bench/src# node stream.js
Node v12.10.0
read to write x 4,248 ops/sec ±34.48% (76 runs sampled)
read to passthru to write x 2,607 ops/sec ±20.69% (75 runs sampled)
```

It got worse in 12.10.0 vs 12.9.0

$ node bench.js
Node v12.6.0
read to write x 17,812 ops/sec ±2.07% (78 runs sampled)
read to passthru to write x 11,389 ops/sec ±1.55% (83 runs sampled)

$ node bench.js
Node v12.10.0
read to write x 13,615 ops/sec ±0.89% (84 runs sampled)
read to passthru to write x 7,589 ops/sec ±1.05% (86 runs sampled)

$ node bench
Node v12.9.0
read to write x 22,779 ops/sec ±2.65% (82 runs sampled)
read to passthru to write x 14,617 ops/sec ±1.14% (86 runs sampled)

@mcollina @mscdex I’ve investigated this. It looks like f4f88270e7b4acdbd8c5679f8f0c8d996456f3a8 (#25461) only works on V8 7.7+ – on 7.6, creating the object literal the way that that PR does involves multiple calls into the V8 runtime (v8::internal::Runtime_DefineDataPropertyInLiteral), whereas on 7.7 it doesn’t call into the runtime.

I think that means we either have to revert f4f88270e7b4acdbd8c5679f8f0c8d996456f3a8 on v12.x, or wait for the V8 update to 7.7 on v12.x (#29241), which should be ready next week.
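As a purely hypothetical illustration (this is not the code touched by f4f88270, and whether it matches that commit's pattern is an assumption), one case where object-literal creation has historically gone through Runtime_DefineDataPropertyInLiteral is a literal with computed property names, whereas assigning properties after creation takes the regular property-store path:

```js
// Hypothetical example only – not taken from the commit under discussion.
const kState = Symbol('kState');

// Computed key inside the literal: on older V8 this kind of definition
// was reported to fall back to the runtime (Runtime_DefineDataPropertyInLiteral).
const a = { [kState]: 0 };

// Assigning after construction goes through the ordinary store path instead.
const b = {};
b[kState] = 0;
```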

I would revert it if we decide to do a release before V8 7.7 lands

This is solved again in 12.11.0:

$ node bench.js
Node v12.11.0
read to write x 28,242 ops/sec ±1.93% (82 runs sampled)
read to passthru to write x 17,695 ops/sec ±1.92% (87 runs sampled)

$ node bench.js
Node v12.10.0
read to write x 13,527 ops/sec ±0.89% (85 runs sampled)
read to passthru to write x 7,417 ops/sec ±1.48% (85 runs sampled)

$ node bench.js
Node v12.9.0
read to write x 22,527 ops/sec ±2.30% (77 runs sampled)
read to passthru to write x 14,578 ops/sec ±0.99% (86 runs sampled)

$ node bench.js
Node v10.16.0
read to write x 19,750 ops/sec ±1.72% (85 runs sampled)
read to passthru to write x 13,717 ops/sec ±1.23% (84 runs sampled)