Node: buffer.write performance issue

Created on 28 Sep 2020  路  8Comments  路  Source: nodejs/node

  • Version: v12.18.4
  • Platform: Linux 5.4.0-45-generic #49~18.04.2-Ubuntu SMP Wed Aug 26 16:29:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: buffer.js

What steps will reproduce the bug?

I have not been able to create an isolated example outside of the application that reproduces the perf degradation - likely due to optimization.

When profiling our application (Cribl LogStream) we noticed that the top function call was a function called hidden - after some digging it turns out that the call trace is something like this

 Buffer.write
   validateInt32
     hideStackFrames
        hidden

image

after modifying the application to all the undocumented Buffer.utf8Write instead of Buffer.write we see about 20% overall improvement and the heavy bottom profile looks like follows - note during both times the application was profiled for same amount of time (30s).

image

I noticed the same performance improvement after updating hideStackFrames to look like this:

function hideStackFrames (fn) { return fn; } 

I have not been able to reproduce the perf degradation using a script that isolates just Buffer.write operations. I don't even see the hidden function calls at all during profiling. However, when I set a breakpoint in hideStackFrames and then start profiling I do end up seeing hidden in the profile - which make me think there's some optimization/compilation/inlinning issue at play.

UPDATE 9/28
I was able to repro the perf degradation by disabling inlining

buf$ node  buffer.js 
Buffer.write: 787.863ms
Buffer.utf8Write: 741.131ms

buf$ node --max-inlined-bytecode-size=0  buffer.js 
Buffer.write: 1233.917ms
Buffer.utf8Write: 781.519ms

here's how buffer.js looks like

const b = Buffer.alloc(1000*1000);
const str = '12345' + '67890';

console.time('Buffer.write');
let off = 0;
for(let i=0;i<1e7; i++) {
    off += b.write(str, off);
    if(off >= b.length) off = 0;
}
console.timeEnd('Buffer.write');

console.time('Buffer.utf8Write');
off = 0;
for(let i=0;i<1e7; i++) {
    off += b.utf8Write(str, off);
    if(off >= b.length) off = 0;
}
console.timeEnd('Buffer.utf8Write');

Could this mean that the default V8 inline settings are too conservative for the server side?

buffer performance

Most helpful comment

In fact, this is significant enough only for the shortest and simplest library calls, @ledbit 's example uses 8 bytes buffers, but still, why not

All 8 comments

Update:

  1. confirmed same performance issue is present in 14.13.0
  2. confirmed that using --stress-inline has no effect on perf

Any thoughts on what other options to try? For the short term, and since performance hit is so great we're calling utf8Write directly, but we'd rather not rely on a non-public API function

  • able to reproduce (in mac).
  • when I swapped the order of calls (utf8Write first), the difference comes down a bit, but still there is a gap
  • not sure if inlining is at fault yet

@nodejs/v8 is there a way to improve the inlining for cases like this one? And could the try ... finally block be the fault to prevent inlining the code?

@BridgeAR - I tried removing try ... finally unfortunately it had no effect in performance.

const hideStackFrames = function(fn) {
  return function hidden(...args) {
    return fn(...args);
  }
}

I wonder if hiding the validator frames is worth the performance cost (if I am reading the profiling results correctly)

I wrote this benchmark. Its baseline result for latest master on my machine is:

$ ./node benchmark/misc/hidestackframes.js 
misc/hidestackframes.js n=100000 type="no-error": 20,704,220.948820617
misc/hidestackframes.js n=100000 type="error": 45,215.81646655569

With the following implementation

function hideStackFrames(fn) {
  return fn;
}

it shows:

$ ./node benchmark/misc/hidestackframes.js 
misc/hidestackframes.js n=100000 type="no-error": 48,594,383.16962696
misc/hidestackframes.js n=100000 type="error": 45,900.658629285084

So, there is a certain penalty introduced by hideStackFrames. I'm currently thinking of ways to move the penalty to the exception-path instead of success-path, while keeping the same behavior.

Update. I've tried re-running the benchmark and I can no longer see any difference, so please ignore this message. Disabling inlining with --max-inlined-bytecode-size=0 also doesn't make any difference, except for reducing results for both hideStackFrames implementations.

In fact, this is significant enough only for the shortest and simplest library calls, @ledbit 's example uses 8 bytes buffers, but still, why not

Seeing this on another Buffer code-path

image

Was this page helpful?
0 / 5 - 0 ratings

Related issues

AkashaThorne picture AkashaThorne  路  207Comments

mikeal picture mikeal  路  197Comments

egoroof picture egoroof  路  90Comments

TazmanianDI picture TazmanianDI  路  127Comments

jonathanong picture jonathanong  路  91Comments