Seeing a major performance regression (~50% drop in throughput) in one of our benchmarks. I will work on creating a minimal repro. In the meantime, here are some snippets of the profiler output. It is pretty strange that 12.6.0 spends ~70% of its time in shared-library land (maybe there's an issue with the profile itself?).
```
# uname -a
Linux ledion-vm 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
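(For reference, and an assumption on my part since the exact commands aren't shown here: V8 tick profiles like the ones below are typically collected with `node --prof bench.js` and rendered with `node --prof-process isolate-*.log`.)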
Node v10.16.0:

```
[Shared libraries]:
   ticks  total  nonlib   name
   1021    2.5%          /lib/x86_64-linux-gnu/libc-2.23.so
    724    1.8%          /lib/x86_64-linux-gnu/libpthread-2.23.so
     17    0.0%          [vdso]
      6    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21

[Summary]:
   ticks  total  nonlib   name
  26260   63.9%   66.8%  JavaScript
  11860   28.9%   30.2%  C++
    947    2.3%    2.4%  GC
   1768    4.3%          Shared libraries
   1178    2.9%          Unaccounted

[Bottom up (heavy) profile]:
   ticks parent  name
   3566    8.7%  Builtin: LoadIC
    858   24.1%    LazyCompile: *onwrite _stream_writable.js:450:17
    358   41.7%      LazyCompile: *afterTransform _stream_transform.js:78:24
    349   97.5%        LazyCompile: *Transform._read _stream_transform.js:185:37
    349  100.0%          LazyCompile: *Readable.read _stream_readable.js:377:35
    348   99.7%            LazyCompile: *pipeOnDrainFunctionResult _stream_reada
```
Node v12.6.0:

```
[Shared libraries]:
   ticks  total  nonlib   name
  28413   64.4%          /usr/local/bin/node
   1845    4.2%          /lib/x86_64-linux-gnu/libc-2.23.so
    455    1.0%          /lib/x86_64-linux-gnu/libpthread-2.23.so
      7    0.0%          [vdso]
      6    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
      4    0.0%          /lib/x86_64-linux-gnu/libm-2.23.so

[Summary]:
   ticks  total  nonlib   name
   8697   19.7%   65.1%  JavaScript
   4515   10.2%   33.8%  C++
   1969    4.5%   14.7%  GC
  30730   69.7%          Shared libraries
    154    0.3%          Unaccounted

[Bottom up (heavy) profile]:
   ticks parent  name
  28413   64.4%  /usr/local/bin/node
   6958   24.5%    LazyCompile: *Readable.read _stream_readable.js:394:35
   6901   99.2%      LazyCompile: *pipeOnDrainFunctionResult _stream_readable.js
   6901  100.0%        LazyCompile: *emit events.js:153:44
   6901  100.0%          LazyCompile: *EventEmitter.emit domain.js:460:39
   6901  100.0%            LazyCompile: *onwrite _stream_writable.js:445:17
   3237   11.4%    LazyCompile: *ondata _stream_readable.js:711:18
   3237  100.0%      LazyCompile: *emit events.js:153:44
   3237  100.0%        LazyCompile: *Readable.read _stream_readable.js:394:35
   3214   99.3%          LazyCompile: *pipeOnDrainFunctionResult _stream_readabl
   3214  100.0%            LazyCompile: *emit events.js:153:44
```
I believe I've isolated the problem to streams; 12.0.0 seems to be the first version where this appeared. See the attached benchmark, bench-stream.txt. Results across versions:
```
Node v12.6.0
read to write x 6,804 ops/sec ±2.27% (74 runs sampled)
read to passthru to write x 3,778 ops/sec ±24.92% (76 runs sampled)

Node v12.0.0
read to write x 6,514 ops/sec ±2.36% (75 runs sampled)
read to passthru to write x 3,664 ops/sec ±27.53% (73 runs sampled)

Node v11.15.0
read to write x 8,941 ops/sec ±2.28% (75 runs sampled)
read to passthru to write x 5,870 ops/sec ±2.60% (78 runs sampled)

Node v11.0.0
read to write x 9,356 ops/sec ±2.87% (69 runs sampled)
read to passthru to write x 6,338 ops/sec ±2.36% (73 runs sampled)

Node v10.16.0
read to write x 9,621 ops/sec ±2.19% (74 runs sampled)
read to passthru to write x 6,610 ops/sec ±2.27% (76 runs sampled)
```
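Relative to v10.16.0, that is roughly a 29% drop for the plain read-to-write pipeline (6,804 vs 9,621 ops/sec) and about a 43% drop once the PassThrough is added (3,778 vs 6,610 ops/sec).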
Any ideas where to look for a potential root cause here? Happy to dig through it.
/ping @nodejs/streams
Would you mind adding a reproducible benchmark? It seems you are interacting with a native resource; the change might actually be there, or somewhere else entirely (a change in V8, for example).
Check out https://clinicjs.org/flame/ to help you debug this (there is also an upload function in clinic; it will give you a link that you can paste here, for example).
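(For example, assuming the standard Clinic.js workflow and a hypothetical script name: `npm install -g clinic`, then `clinic flame -- node bench.js`.)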
@mcollina happy to look at Clinic.js, seems pretty cool! In the meantime, I did include a benchmark above; not sure if you saw it (given your comment that I might be "interacting with a native resource"). The benchmark below uses just Node core streams:
```js
const Benchmark = require('benchmark');
const { Writable, Readable, PassThrough } = require('stream');

const suite = new Benchmark.Suite();

// Object-mode Writable sink that discards everything written to it.
function devnull() {
  return new Writable({
    objectMode: true,
    write: function (event, encoding, callback) {
      callback();
    }
  });
}

// Readable that pushes 100 short strings and then signals EOF.
function readable() {
  let idx = 0, len = 100;
  return new Readable({
    read: function () {
      while (idx++ < len) {
        if (!this.push('foo' + idx))
          return;
      }
      this.push(null);
    }
  });
}

console.log(`Node ${process.version}`);

// add tests
suite
  .add('read to write', function (deferred) {
    readable()
      .pipe(devnull())
      .on('finish', () => deferred.resolve());
  }, { defer: true })
  .add('read to passthru to write', function (deferred) {
    readable()
      .pipe(new PassThrough())
      .pipe(devnull())
      .on('finish', () => deferred.resolve());
  }, { defer: true })
  // add listeners
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  // run async
  .run({ 'async': true });
```
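For anyone reproducing this: the script assumes the `benchmark` package from npm (installed with `npm install benchmark`); save it as e.g. `bench.js` and run it under each Node version being compared.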
I can reproduce the performance drop.

The main difference between the two is that 12 spends a lot of time in `__ZN2v88internalL47Builtin_Impl_Stats_CallSitePrototypeGetTypeNameEiPmPNS0_7IsolateE`, which is a C++ builtin in V8. @bmeurer wdyt? What could cause that?
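(For reference, that mangled name demangles to `v8::internal::Builtin_Impl_Stats_CallSitePrototypeGetTypeName`, i.e. the C++ implementation behind `CallSite.prototype.getTypeName` in V8's stack-trace API.)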
~~I have a hunch that it might be related to https://github.com/nodejs/node/pull/27146.~~ No, it's not due to that.
I'll investigate more and report.
This regression was introduced when we updated to V8 7.3.492.25.
@nodejs/v8 would you like to chime in?
cc @hashseed @bmeurer @targos @ryzokuken
Please check out https://github.com/nodejs/node/pull/28842; it should solve the regression.
Should we replace occurrences of `Object.getPrototypeOf` with `instanceof` in other files?
Comment from @mcollina in https://github.com/nodejs/node/pull/28842#issue-300744529:

> use `instanceof` instead of `Object.getPrototypeOf`, as checking an object prototype with `Object.getPrototypeOf` is slower than an `instanceof` check.
Comment from @targos in https://github.com/nodejs/node/pull/28842#issuecomment-514661497:

> I would not add the comments about `Object.getPrototypeOf` because `instanceof` seems like the standard way to do the check to me.
> Should we replace occurrences of `Object.getPrototypeOf` with `instanceof` in other files?
I think this only applies to the one in lib/zlib.js, besides streams. And there it may or may not be faster to test against not being a Uint8Array, because the comment in the zlib code is outdated (streams do support any kind of Uint8Array).
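For context, here is a minimal sketch of the two check styles under discussion (the helper names and the `chunk` variable are hypothetical, not code from #28842):

```js
// Object.getPrototypeOf: matches only direct instances of Uint8Array;
// per the discussion above, this was the slower option.
function isDirectUint8Array(chunk) {
  return Object.getPrototypeOf(chunk) === Uint8Array.prototype;
}

// instanceof: walks the whole prototype chain, so it also matches
// subclasses, and was the faster option here.
function isUint8Array(chunk) {
  return chunk instanceof Uint8Array;
}
```

Note the two are not strictly equivalent: `instanceof` also accepts subclasses (e.g. `Buffer`), which is part of why the zlib comment mentioned above matters.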
hmm @mcollina, I'm still seeing a significant perf drop on 12.10.0
```
workspace/bench/src# node stream.js
Node v10.16.0
read to write x 8,550 ops/sec ±5.17% (70 runs sampled)
read to passthru to write x 5,111 ops/sec ±24.00% (66 runs sampled)
workspace/bench/src# node stream.js
Node v12.10.0
read to write x 4,248 ops/sec ±34.48% (76 runs sampled)
read to passthru to write x 2,607 ops/sec ±20.69% (75 runs sampled)
```
It got worse in 12.10.0 vs 12.9.0:
```
$ node bench.js
Node v12.6.0
read to write x 17,812 ops/sec ±2.07% (78 runs sampled)
read to passthru to write x 11,389 ops/sec ±1.55% (83 runs sampled)

$ node bench.js
Node v12.10.0
read to write x 13,615 ops/sec ±0.89% (84 runs sampled)
read to passthru to write x 7,589 ops/sec ±1.05% (86 runs sampled)

$ node bench
Node v12.9.0
read to write x 22,779 ops/sec ±2.65% (82 runs sampled)
read to passthru to write x 14,617 ops/sec ±1.14% (86 runs sampled)
```
@mcollina @mscdex I’ve investigated this. It looks like f4f88270e7b4acdbd8c5679f8f0c8d996456f3a8 (#25461) only works on V8 7.7+. On 7.6, creating the object literal the way that PR does involves multiple calls into the V8 runtime (`v8::internal::Runtime_DefineDataPropertyInLiteral`), whereas on 7.7 it doesn’t call into the runtime.

I think that means we either have to revert f4f88270e7b4acdbd8c5679f8f0c8d996456f3a8 on v12.x, or wait for the V8 update to 7.7 on v12.x (#29241), which should be ready next week.
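For illustration, a rough sketch of the kind of pattern being described; this is hypothetical, not the actual code from that commit:

```js
// Hypothetical sketch, not the diff from #25461: per the analysis above,
// on V8 7.6 constructing a state object as one large literal could call
// into the runtime (Runtime_DefineDataPropertyInLiteral) per property,
// while V8 7.7+ builds the same literal without runtime calls.
function makeState(options) {
  return {
    objectMode: Boolean(options.objectMode),
    highWaterMark: options.highWaterMark,
    buffer: [],
    length: 0,
    flowing: null,
    ended: false,
    reading: false,
  };
}
```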
I would revert it if we decide to do a release before V8 7.7 lands
This is solved again in 12.11.0:
```
$ node bench.js
Node v12.11.0
read to write x 28,242 ops/sec ±1.93% (82 runs sampled)
read to passthru to write x 17,695 ops/sec ±1.92% (87 runs sampled)

$ node bench.js
Node v12.10.0
read to write x 13,527 ops/sec ±0.89% (85 runs sampled)
read to passthru to write x 7,417 ops/sec ±1.48% (85 runs sampled)

~/temp/stream-bug
$ node bench.js
Node v12.9.0
read to write x 22,527 ops/sec ±2.30% (77 runs sampled)
read to passthru to write x 14,578 ops/sec ±0.99% (86 runs sampled)

$ node bench.js
Node v10.16.0
read to write x 19,750 ops/sec ±1.72% (85 runs sampled)
read to passthru to write x 13,717 ops/sec ±1.23% (84 runs sampled)
```