Node: stdout/stderr buffering considerations

Created on 25 Apr 2016 · 55 comments · Source: nodejs/node

_I tried to discuss this some time ago at IRC, but postponed it for quite a long time. Also I started the discussion of this in #1741, but I would like to extract the more specific discussion to a separate issue._

I could miss some details, but will try to give a quick overview here.

Several issues here:

  1. Many calls to console.log (e.g. calling it in a loop) could chew up all the memory and die — #1741, #2970, #3171, #18013.
  2. console.log has different behavior while printing to a terminal and being redirected to a file. — https://github.com/nodejs/node/issues/1741#issuecomment-105333932.
  3. Output is sometimes truncated — #6297, there were other ones as far as I remember.
  4. The behaviour seems to differ across platforms.

As I understand it — the output has an implicit write buffer (as it's non-blocking) of unlimited size.

One approach to fixing this would be to:

  1. Introduce an explicit cyclic write buffer.
  2. Make writes to that cyclic buffer blocking.
  3. Make writes from the buffer to the actual output non blocking.
  4. When the cyclic buffer reaches its maximum size (e.g. 10 MiB) — block further writes to the buffer until a corresponding part of it is freed.
  5. On (normal) exit, make sure the buffer is flushed.

For almost all cases, except for the ones that are currently broken, this would behave as a non-blocking buffer (because writes to the buffer are considerably faster than writes from the buffer to file/terminal).

For cases when the data is being piped to the output too quickly and when the output file/terminal does not manage to output it at the same rate — the write would turn into a blocking operation. It would also be blocking at the exit until all the data is written.
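
To illustrate the fully blocking end of that spectrum, here is a minimal userland sketch (assuming all output goes through this helper rather than console.log, so ordering is preserved); it only demonstrates the semantics, it is not the proposed core change:

const fs = require('fs');

// every call blocks until the bytes have been written to fd 1 (stdout),
// so output can no longer pile up inside the process
function logBlocking(line) {
  fs.writeSync(1, line + '\n');
}

for (let i = 0; i < 1e6; i++) {
  logBlocking('line ' + i); // memory stays flat; speed is bounded by the consumer
}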

Another approach would be to monitor (and limit) the size of data that is contained in the implicit buffer coming from the async queue, and make the operations block when that limit is reached.

Labels: libuv, net, process, stdio

All 55 comments

process.stdout._handle.setBlocking(true); as proposed in https://github.com/nodejs/node/issues/1741#issuecomment-190649817 and https://github.com/nodejs/node/issues/6297#issuecomment-214432099 would considerably hurt the performance. Exposing that isn't a good solution to any of this.

process.stdout._handle.setBlocking(true);
would considerably hurt the performance

Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes. The default could still remain non-blocking. There's no harm in giving people the option. Having CLI applications block on writes for stdout/stderr is quite desirable. This is not unlike all the Sync calls in the fs module. In a high-performance server scenario you don't have to use them, but they are there if you need them.
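
For reference, the pattern under discussion (from the linked issues) relies on an undocumented internal handle method, so any use of it today has to be guarded; a minimal sketch:

// _handle and setBlocking are undocumented internals (see #1741 / #6297);
// they may be absent for some stream types or platforms, hence the checks
for (const stream of [process.stdout, process.stderr]) {
  if (stream._handle && typeof stream._handle.setBlocking === 'function') {
    stream._handle.setBlocking(true);
  }
}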

There's no harm in giving people the option.

There is, every option comes at a cost from both sides. That option would need to be supported, and it would actually just toggle the behaviour between two states that are far from being ideal in many cases.

As for the option here, the said cyclic buffer maximum size could perhaps be configured using a command-line argument, with buffer size 0 turning it into a fully blocking API.

toggle the behaviour between two states that are far from being ideal in many cases

The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.

On UNIX it is a trivial system call on a fd. It's pretty much the same thing on Windows as far as I recall.

Edit: See https://github.com/nodejs/node/issues/6297#issuecomment-212486389

The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.

We should not be documenting underscore-prefixed properties. This would need to be a separate, true _public_ API.

In CLI tools, I don't want stdout/stderr to be blocking though; rather, I want to be able to explicitly exit the process and have stdout/stderr flushed before exiting. Has there been any discussion about introducing a way to explicitly exit the process gracefully, meaning flushing stdout/stderr and then exiting? Maybe something like a process.gracefulExit() method? In ES2015 modules, we can no longer return in the top scope, so short-circuiting will then be effectively impossible without a nesting mess.
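
As an illustration, a minimal userland sketch of such a graceful exit (a hypothetical helper, not an existing core API); it assumes the write() callback of an empty chunk fires only after previously queued data has been flushed:

// hypothetical helper: process.gracefulExit() does not exist in core
function gracefulExit(code) {
  const streams = [process.stdout, process.stderr];
  let pending = streams.length;
  for (const stream of streams) {
    // the callback runs once this (empty) chunk, and everything queued
    // before it, has been handed to the underlying handle
    stream.write('', () => {
      if (--pending === 0) process.exit(code);
    });
  }
}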

The exit module is a drop-in replacement for process.exit() that effectively waits for stdout/stderr to be drained. Other streams can optionally be drained as well. Something like it really ought to be rolled into node core with a different name.

Edit: upon further analysis, the exit module is no longer recommended.

I think there's a slight discrepancy in interpretation here.

Are we talking about console blocking, or console buffering? Albeit similar, these are two different considerations.

Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not grokking what it is that's _broken_ here.


One thing to note is that on most systems (usually UNIX systems) stdout is buffered to begin with, whereas stderr is usually not (or sometimes just line buffered).

However, an explicit fflush(stdout) or fflush(stderr) is not necessary upon a process closing, because these buffers are handled by the C library and are flushed when main() returns (or exit() is called). If the buffer fills up during execution, it is flushed automatically.

This is why the following program still outputs while the loop is running, and on most terminal emulators the same reason why it looks like it's 'jittery'.

#include <stdio.h>

int main(void) {
    for (int i = 10000000; i > 0; i--) {
        printf("This is a relatively not-short line of text. :) %d\n", i);
    }
    return 0;
}

_Blocking_, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a _terrible_ idea.

Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes.

I respectfully disagree. There is absolutely no point in blocking on stdout/stderr. It's very rarely done in the native world and when it is it's usually for concurrency problems (two FILE* handles being written to by two threads that cannot otherwise be locked; i.e. TTY descriptors). Let the system handle it.

The above program _without_ fflush(stdout) on every iteration produced the following average:

real 6.32s
sys  0.54s
405 involuntary context switches 

and _with_ fflush(stdout) on each iteration:

real 7.48s
sys  0.56s
1607  involuntary context switches

That's an 18.3% time increase on this microbenchmark alone. I'm in the _'time is relative and isn't a great indicator of actual performance'_ camp, so I included the involuntary context switch count (voluntary context switches would be the same between the two runs, obviously).

The results show that fflush(stdout) upon every message encouraged the system to switch out of the process 1202 more times - that's 296% higher than ye olde system buffering.

Further, the feature that was requested by @sindresorhus is part of how executables already behave - at least on UNIX systems, all buffered file descriptors are flushed prior to/during close()-ing or fclose()-ing, so flushing prior to a process exiting is a needless step and exposing a new function for it would be completely unnecessary.


I'm still lost as to _what_ is trying to be fixed here... I wouldn't mind seeing a process.stdout.flush() method, but it'd be abused in upstream dependencies and I'd find myself writing some pretty annoyed issues in that case. I've never run into a problem with Node's use of standard I/O...

Also, just to reiterate - please don't even start with a .setBlocking() method. That's asking for trouble. The second a dependency makes standard I/O blocking then _everything else_ is affected and then I have to answer to my technical lead about why my simple little program is switching out of context a bazillion times when it didn't before.

@Qix-

Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not grokking what it is that's broken here.

See the issues I linked above.

Blocking, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a terrible idea.

No, of course I don't. console.log is not the same as printf in your example.
It's an async operation. I don't propose fflush; I propose somehow making sure that we don't fill up the async queue with console.log calls.

@ChALkeR are we actually seeing this happen?

var str = new Array(8000).join('.');

while (true) {
    console.log(str);
}

works just fine. In what _realistic_ scenario are we seeing the async queue fill up? The three linked issues are toy programs that are _meant_ to break Node, and would break in most languages, in just about all situations, on all systems (resources are finite, obviously).

The way I _personally_ see it, there are tradeoffs to having coroutine-like execution paradigms and having slightly higher memory usage is a perfectly acceptable tradeoff for the asynchronicity Node provides. It doesn't handle extreme cases because those cases are well outside any _normal_ or even _above normal_ scenario.

The first case, which mentions a break in execution to flush the async buffer, is perfectly reasonable and expected.


_Side note:_ thanks for clarifying :) that makes more sense. I was hoping it wasn't seriously proposed we fflush() all the time. Just had to make sure to kill that idea before anyone ran with it.

@Qix-

  1. What's your platform?
  2. Are you sure that you are not redirecting the output anywhere?

@ChALkeR

  1. OS X and linux (tested on both)
  2. Positive.

@Qix- Are you saying that memory usage eventually stabilizes when you run this? How much memory does it consume for you?

@ChALkeR ah no, memory usage goes off the charts. Again, though, is this happening in practical environments outside of stress tests?

@Qix- Yes, it does. Users have reported it several times already, see links above.

@ChALkeR The three links you mentioned (#1741, #2970, #3171) are not _practical environments outside of stress tests_.

Truncated output (#6297 _et al_) is unrelated to the ones mentioned above.

@Qix- #3171, for example, doesn't look like just a stress test to me.

@ChALkeR

// excerpt from #3171: value and nextPrime() are defined elsewhere in that issue
for (var i = 0; i < 1000000000; i++) {
    value = nextPrime(value);
    console.log(value);
}

This is definitely a stress test. This isn't a _practical_ or _realistic_ use of Node.js - an async scripting wrapper for IO-heavy applications.


For instance,

function loggit() {
    console.log('hello');
    setImmediate(loggit);
}

loggit()

tops out at 15M memory consumption. I've tweaked #3171's code to reflect the changes here, which now hovers around 45M (instead of inflating to over a gig in about a minute).

Of course a stupidly huge while/for loop with a ton of non-blocking IO (not to mention a function that is solving an NP-hard problem) is going to cause problems. _Don't do it._ It's a simple misunderstanding of how Node works, and if it were up to me I'd brush it off as an error on the developer's part.

The second we start baby-sitting bad design like _all three_ of the examples you mentioned and start ad-hoc adding methods like .setBlocking(true) is when we start stepping on the toes of dependents down-stream, which begins to break _everything_.

Are we talking about console blocking, or console buffering? Albeit similar, these are two different considerations.

Either blocking tty writes or graceful drain of stdout/stderr upon process.exit() would fix the problem at hand with truncated tty output.

I think both APIs should be part of node core. Devs should have the option to have stdout/stderr block if they so choose, or have stdout/stderr gracefully flushed if they find that more appropriate.

Logging to the console 10000000 times in a tight loop is not real life behavior. In reality performance oriented servers rarely log to the console - and when they do it's important that the information is output. But if you did want to log to the console 1e7 times (and run out of memory due to another known node issue) then you would not enable blocking on stdout - so it's a red herring.

stdout/stderr should flush on program exit anyway. It's the fact node uses the exit() call that messes quite a few things up. That's actually an easy enough fix with Node specifically.

My problem with .setBlocking() is that it affects the process globally, not just specific pieces of code or files. That means a dependency could mess up how my code functions without really understanding what .setBlocking() does.

Comment just posted in other thread: https://github.com/nodejs/node/issues/6297#issuecomment-215068674

process.stdout.flush() and process.stderr.flush()

It's not like C where the data is buffered in libc and _can_ be flushed. The unflushed data is in some Javascript data structure in V8 and the event loop has been halted.

@kzc yes, that is also a problem. exit() in and of itself is the cause of your specific problem, as has been mentioned quite a few times.

@ChALkeR I dragged over the contents from the issue to #6456 and would be closing this, since I think it's highly related. Please feel free to reopen if you see the need. Thx.

@eljefedelrodeodeljefe #6456 is only about process.exit(), and it doesn't cover the issues and the proposal outlined here. I don't see a reason for burying this until we have a proper discussion.

alright

The three linked issues are toy programs that are meant to break Node, and would break in most languages, in just about all situations, on all systems (resources are finite, obviously).

I wanted to evaluate this statement. So I built programs like the one in #2970 for C#, Java, Python, and PHP (on Windows). None of these languages exhibited the memory consumption behavior that Node.js does.

@ben-page ok? Those languages are synchronous and blocking by default. Apples to oranges comparison - they have completely different use cases.

@Qix- I agree. That's my point, mostly. One can't really argue these examples are "toy programs that are meant to break Node" and that they "would break in most languages, in just about all situations, on all systems". I imagine there is very little overlap in those two sets. In fact, these examples would break in very few languages, because most languages block when writing to the console (as you said).

P.S. Erlang does not have this problem.

Moreover, I don't think it's fair to dismiss these issues as "toy programs". An MCVE is inherently a toy program. But there's usually a real world reason that led to the issue being reported.

Additionally, I think it's worth considering that it's not necessarily obvious that console.log() is asynchronous. Very few languages operate that way. And it's unlike other asynchronous functions in Node.js, which support callbacks or events. This behavior can mimic a memory leak and has clearly caused multiple people trouble.

I think it's worth considering that it's not necessarily obvious that console.log() is asynchronous

Node.js is an async I/O environment. If you're using node and not understanding how it works at least at that very basic level, you're doing yourself a disservice. ;) Asynchronicity can manifest itself in many ways. Callbacks aren't even necessarily 'async' by nature (they can be synchronous).

I agree with you completely, but that's not really a response to my point.

I did not say that people should not have to understand the language or that callbacks are necessarily asynchronous. I said that console.log is different from other asynchronous functions in that it does not support any method of flow control or error handling (such as callbacks).
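
To make that difference concrete:

// process.stdout.write() exposes backpressure; console.log() does not
const ok = process.stdout.write('some data\n'); // false means "wait for 'drain'"
console.log('some data');                       // returns undefined, no signal either way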

@ben-page not really Node's fault, though. console.log exists in Node because it exists in browsers (I believe console is a basic global within V8 itself but I could be wrong).

@Qix- Node.js has its own console.log, and that uses node’s own process.stdout under the hood.

@addaleax Ah okay. That makes sense since that's pretty implementation specific anyway.

This causes problems when you want the process to compute something and return a result then exit. If that result is large data (e.g. encoded binary data, or a large json string) then it's currently difficult to do. Here's a simple example:

var largeData = "SomeData".repeat(1024*1024) + ' ..... END!';


console.log(largeData); //using process.stdout.write here makes no difference either

process.exit();

You'll see that ...END! never gets printed to the console (At least on Node 6.3 on linux/bash). The data gets truncated.

All that's really needed is a simple callback for when the stdout write has finished:

process.stdout.write(largeData, function() { process.exit(); });

I've just wasted time debugging an issue where the bug was in some other part of the program, due to STDERR getting truncated on crash and sending me the wrong way... it would be nice if the _error_ stream behaved the same way as it has in virtually every other language I've ever used (many of which buffer-by-default for other streams). Even for a server environment, which is more important? Allowing many millions of crashes to be generated per second, or being able to log their details completely?

@ChALkeR ... does this need to remain open?

I did some experiments on the performance of process.stdout, and here are my findings (along with some issues of Node.js process.stdout). I believe it's related, so I'm posting it here...

I was originally inspired by this post How is GNU's yes so fast, and I wanted to write it in Node.js.

Take No. 1

while (true) {
  console.log('y');
}

It came out with this:

$ node yes-1.js | pv -i 10 > /dev/null
^C03MiB 0:00:10 [ 822KiB/s] [ <=>    

Compared to gnu yes performance on my computer:

$ yes | pv -i 10 > /dev/null
^C.9GiB 0:00:10 [7.79GiB/s] [ <=>      

Note that using process.stdout.write() is no better, since IO is much, much slower than a few extra instructions.

How could we catch up with gnu yes? Buffering is the answer...

Take No. 2

var SIZE = 8192; // Same buffer size as gnu yes
var buf = Buffer.alloc(SIZE, 'y\n');
while (true) {
  process.stdout.write(buf);
}

Give it a shot:

$ node yes-2.js | pv > /dev/null
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory          ]
 118MiB 0:00:08 [14.3MiB/s] [   <=>   

Wait... what happened to the heap? It's actually because of the slow IO... so we have to wait for IO to finish when there is too much of it ongoing. It's like closing the tap when the bathtub is full, or you'll get an "overflow".

Take No. 3

var SIZE = 8192;
var buf = Buffer.alloc(SIZE, 'y\n');

function yes3() {
  while (true) {
    if (process.stdout.write(buf)) {
      // OK, we're fine
    } else {
      process.stdout.once('drain', yes3);
      break;
    }
  }
}

yes3();

And the result is:

$ node yes-3.js | pv -i 10 > /dev/null
^C.9GiB 0:00:10 [2.65GiB/s] [ <=>     

2.65GiB/s is tons better than 822KiB/s, but there is still room for improvement considering the 7.79GiB/s that gnu yes does. I figured out that on my machine every call to a native C++ binding function takes ~3 microseconds no matter how little work the C++ function does. So let's do the math:

1 / 0.000003 * 8192 / 1024 / 1024 / 1024 = 2.54 GiB/s

The above is the top speed I can get with buffer size === 8192, so let me make the buffer bigger ...

Take No. 4

var SIZE = 65536;
var buf = Buffer.alloc(SIZE, 'y\n');

function yes4() {
  while (true) {
    if (process.stdout.write(buf)) {
      // OK, we're fine
    } else {
      process.stdout.once('drain', yes4);
      break;
    }
  }
}

yes4();

$ node yes-4.js | pv -i 10 > /dev/null
^C.6GiB 0:00:10 [6.16GiB/s] [ <=>  

Buffer size 65536 is the best I've got using this approach. A bigger buffer slows it down (is this because of some memory operation happening in Node.js core?)

[bar chart: throughput for each buffer size tested]

The last (highest) bar is gnu yes.

So what can we conclude so far? I'm afraid there is nothing other than the same old rule of IO: use a buffer and get faster by several orders of magnitude...

And then consider a few things that are Node.js specific:

Unlike process.stdout.write(), console.log() doesn't return anything, so there is no way the user can know that they should wait. This can happen when (1) there is too much logging, (2) there is too much IO in an application, or (3) the IO is slow. Real scenarios can be one or any combination of the above three, e.g.

  • Node.js test suites often crash due to the heavy use of console.log() when the CI server is slow;
  • A stress test in a simulated environment with logging turned on often consumes a very large amount of memory or even crashes; everything is OK when logging is turned off.
  • Almost nobody notices the false return value of process.stdout.write() and waits for the drain event (see the sketch after this list); those using console.log() wouldn't be able to do so anyway.
  • I can go on and on but you know what I mean.
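
A minimal sketch of what "waiting for the drain event" looks like in practice (a hypothetical helper, not an existing API):

// resolves once the chunk was accepted without exceeding the stream's
// internal buffer, otherwise waits for the 'drain' event
function writeWithBackpressure(stream, chunk) {
  return new Promise((resolve) => {
    if (stream.write(chunk)) {
      resolve();
    } else {
      stream.once('drain', resolve);
    }
  });
}

// usage: await writeWithBackpressure(process.stdout, 'log line\n');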

Even if there is no memory issue or crash like the ones listed above, the slow IO will still cause issues, e.g.

  • The application won't be fast enough (IO throughput) for users who don't manage a buffer themselves.
  • No current logging framework is buffered; they're all slow, because they make many calls to C++ binding functions, each of which takes at least ~3 microseconds plus the cost of JS/C++ context switching; think of the accumulation of thousands of them. Note that a pure JavaScript string concat only costs about 0.05 microseconds (and involves no JS/C++ context switch); that is the cost of "writing a single log line" when buffering is in use.
  • Current logging frameworks have a very low throughput ceiling: if the average log line length is 80 chars, then the top speed is 1 / 0.000003 * 80 / 1024 / 1024 = 25 MiB/s, and usually an application is far below this top speed. This could force users to log less in both development and production environments. Even if someone writes a super-fast logging framework, no application could benefit from it unless all npm module owners are willing to depend on the new one. So I think this leads to the question: can Node.js core do something about it? E.g. provide a new set of APIs for buffered IO, and/or slightly modify the behavior of the current console.log()?

I'd like to discuss more and I'd love to contribute PRs. I'm doing more experiments right now, trying to get a solid solution with an obvious advantage. And I think I should raise the question here for discussion.

I've done a prototype on process.stdout.write() and it yields a 20x performance gain. Here are the details.

(20x is not a big deal when applying a proper buffering mechanism to IO; we can expect more, see the analysis below)

  • The principle is to use a buffer, say 64 KiB (a good value in my previous experiment), to avoid a lot of actual writes.
  • Reduce the number of calls from JS to C++, each of which costs a couple of microseconds (1e-6 sec).
  • Avoid the use of Node Buffer, which calls into C++; use a JavaScript string as the buffering mechanism. Concat strings into a bigger one before writing (let's call this bigger one a "string buffer").
  • The modified process.stdout.write() on the happy path (just concatenating the string onto the current "string buffer") costs <100 ns. Compared to the original microsecond-level version, this makes it 10+ times faster. The time cost of the actual write is amortized across all the preceding calls, and it's still fast because the string buffer is pretty big (e.g. a typical 100-char log line vs. a 64 KiB string buffer, amortized over 600+ calls).
  • But there is a drawback: a bottleneck was found when writing strings to process.stdout.write(). Here are some details on this bottleneck (should I file a new issue for it?):

    • process.stdout.write() always converts JavaScript strings into a Node Buffer, which slows IO throughput down to 1/5 or less. Typical data: when writing a Node Buffer, the overall throughput is 8.x GiB/s; when writing strings, it is 1.6x GiB/s.

    • I've found that there is a series of C++ binding functions writeUtf8String, writeAsciiString, ... and I believe if we add some if-else logic to detect the (default) encoding argument in writeOrBuffer() (in _stream_writable.js), instead of always converting strings into a Node Buffer, we can eliminate this bottleneck in most cases.

    • If we can address this bottleneck, this prototype can be 5 times faster. Adding the two together, a 100x performance gain is possible.

  • Here is the pseudo code of the prototype (the real one is harder to read); simple but powerful.
let buf = '';
function cacheWriteString(str) {
  if (typeof str === 'string') {
    // accumulate small writes into one big JavaScript string
    buf += str;

    // once ~64 KiB has piled up, hand it to the real stream in a single call
    if (buf.length >= 65536) {
      process.stdout.write(buf);
      buf = '';
    }
  }

  // Optional: flush buf in a certain interval
  //           e.g. setInterval(() => { ... }, 33);
}
  • Maybe changing the behavior of the current process.stdout.write() is not a good idea since it will change the behavior of existing apps... if this is the case, a new API could be introduced and developers encouraged to use it for much better performance.

I think the point of this approach is to increase the performance of Node.js apps that push large amounts of data through process.stdout, e.g. logging, text processing (sed/grep-alikes), piping, ...

  • When the performance is greatly increased, a large volume of log lines becomes feasible, which was not possible in the past because it would lead to either a crash of the app or the use of a large amount of memory.
  • Even if a fair number of apps won't benefit much from this approach (since they don't do much on standard output), Node.js, as a runtime, should still pursue the goal of higher throughput via its IO APIs.

Note: I posted this comment last weekend and it's gone, not sure what happened, so I'm posting it again (it's slightly different since I've lost the original).

@kenny-y iteratively concatenating a large number of small strings had a significant memory impact last time I checked, due to the way strings are stored in V8 (which also makes them faster and more memory efficient for regular use cases that do not involve iteratively concatenating a large number of short strings).

What is the memory impact of your patch?

@ChALkeR Yes, I did some experiments to measure the memory impact; here are the details:

mem-1.js finds out the overall memory footprint of a 64 KiB string cache:

// mem-1.js
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n').toString();

let s = '';
while (s.length < 65536) {
  s += buf;
}

setInterval(() => {
  process.stdout.write(s);
}, 1000);

mem-2.js is for comparison with the above one:

// mem-2.js
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n').toString();

let s = 0;
while (s < 65536) {
  s += 128;
}

setInterval(() => {
  process.stdout.write(buf);
}, 1000);

Overall memory footprint of mem-1.js is 7.3 MiB.
Overall memory footprint of mem-2.js is 7.2 MiB.
If there are many strings allocated and then gc-ed on the fly, the memory impact of both should be identical on the JavaScript side.

There is a related memory leak (or is it?) found in various Node.js versions (including 8.10, 9.8, master, ...) when I tried to reach the maximum throughput with a small buffer or string. The program below can reproduce the leak.

Therefore I can't compare the overall memory footprint in that scenario, because my prototype only takes a constant amount of memory, say 37 MiB, while the program below consumes 100+ MiB ~ 1.x GiB of memory (usually sawtooth-shaped, but sometimes it stays at 1.x GiB):

// yes-7.1.js
// Memory consumption 100+ ~ 1000+ MiB
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n'); // memory consumption is the same when adding .toString()

function yes7() {
  while (true) {
    if (process.stdout.write(buf)) {
      // OK, we're fine
    } else {
      process.stdout.once('drain', () => {
        setTimeout(yes7, 0);
        gc(); // requires the --expose-gc command line option
      });
      break;
    }
  }
}

yes7();

Run node yes-7.1.js | pv > /dev/null, or just node yes-7.1.js if you're patient.

@ChALkeR What do you say on this approach? Do you think it's a good idea to change existing behavior or stdout/stderr, or introduce new APIs with proper names suggesting that a buffer is in use?

I think I've found a memory leak when constantly writing a string to process.stdout where the string size is 8 KiB (or less). I'm investigating what's really going on... I will open a new issue if it's not related.

[chart: "V8 Heap Used" growing over time while writing 8 KiB strings]

P.S. It's not the case for large strings (e.g. 64KiB) -- the "V8 Heap Used" is a constant value.
[chart: "V8 Heap Used" staying constant while writing 64 KiB strings]

@ChALkeR Is there any short term plan to introduce a buffer mechanism in stdout/stderr? I'm interested in this and I'd love to help.

I am running into this issue, and I am not sure how to solve it. As of now, I am load testing my application: I kick off a series of API calls and slowly scale that up to ~100,000 concurrent connections. I am doing some simple logging inside my function, and that logging is causing my application to crash at ~5,000 connections.

I have tried --max-old-space-size=7168, --optimize_for_size, --stack_size=4096, and an npm package called 'log-buffer'. None of it helps to address the issue. When I use process.stdout._handle.setBlocking(true), it works up until ~5000 connections, and then it deletes my entire log file except for a single chunk at a time, then the application crashes.

When I log the memory usage I am nowhere near the limit for my server.

If I remove all of my logging, everything works (presumably as expected, I don't know for sure as I can't inspect the logs), the application runs to completion without crashing.

Is there anything that I can do to get around this issue?

  • deleting the log file and crashing may not be related to memory pileup as described in this issue, but some other problem.
  • I can investigate, but how are you simulating 10M concurrent connections? are you using multiple systems for clients?
  • do you have a core file that I can have a look at?

Hi guys,

I just tried both of those programs on node 8.11.0 and 10.1.0; the programs ran to the end with no apparent blocking.

// example 1
for (let i = 0; i < 1000000; i++) {
  console.log('abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc' + i);
}

For this, the only GC traces I have are these:

[65723:0x103800400]      150 ms: Scavenge 7.0 (11.8) -> 5.7 (12.3) MB, 1.7 / 0.0 ms  allocation failure 
[65723:0x103800400]      158 ms: Scavenge 7.4 (12.3) -> 6.4 (16.8) MB, 1.4 / 0.0 ms  allocation failure 
[65723:0x103800400]      168 ms: Scavenge 10.0 (16.8) -> 7.8 (17.3) MB, 1.9 / 0.0 ms  allocation failure 
[65723:0x103800400]      178 ms: Scavenge 10.7 (17.8) -> 8.9 (19.3) MB, 2.6 / 0.0 ms  allocation failure 
[65723:0x103800400]      187 ms: Scavenge 12.1 (19.3) -> 10.1 (28.3) MB, 2.6 / 0.0 ms  allocation failure 
[65723:0x103800400]      210 ms: Scavenge 17.4 (28.3) -> 13.0 (29.3) MB, 3.8 / 0.0 ms  allocation failure 
[65723:0x103800400]      234 ms: Scavenge 18.8 (29.8) -> 15.1 (32.3) MB, 6.4 / 0.0 ms  allocation failure 
[65723:0x103800400]      257 ms: Scavenge 21.5 (32.8) -> 17.5 (50.3) MB, 5.5 / 0.0 ms  allocation failure 
[65723:0x103800400]      308 ms: Scavenge 32.2 (51.3) -> 23.2 (53.8) MB, 8.8 / 0.0 ms  allocation failure 
[65723:0x103800400]      373 ms: Scavenge 36.8 (56.3) -> 28.2 (60.8) MB, 14.0 / 0.0 ms  allocation failure 
[65723:0x103800400]      431 ms: Scavenge 44.6 (64.3) -> 34.4 (66.8) MB, 5.2 / 0.0 ms  allocation failure 
[65723:0x103800400]      453 ms: Mark-sweep 40.6 (67.8) -> 37.4 (70.8) MB, 3.0 / 0.0 ms  (+ 27.4 ms in 408 steps since start of marking, biggest step 3.3 ms, walltime since start of marking 266 ms) finalize incremental marking via stack guard GC in old space requested
[65723:0x103800400]      503 ms: Scavenge 54.4 (74.3) -> 42.8 (76.3) MB, 3.7 / 0.0 ms  allocation failure 
[65723:0x103800400]      542 ms: Scavenge 58.7 (78.8) -> 48.8 (83.3) MB, 4.3 / 0.0 ms  allocation failure 
[65723:0x103800400]      580 ms: Scavenge 64.9 (84.8) -> 54.9 (88.3) MB, 4.4 / 0.0 ms  allocation failure 
[65723:0x103800400]      622 ms: Scavenge 71.0 (91.3) -> 60.9 (95.8) MB, 4.0 / 0.0 ms  allocation failure 
[65723:0x103800400]      661 ms: Scavenge 77.1 (97.3) -> 67.1 (100.8) MB, 4.1 / 0.0 ms  allocation failure 
[65723:0x103800400]      705 ms: Scavenge 83.1 (103.3) -> 73.1 (106.8) MB, 4.5 / 0.0 ms  allocation failure 
[65723:0x103800400]      750 ms: Scavenge 89.2 (109.8) -> 79.2 (113.3) MB, 5.1 / 0.0 ms  allocation failure 
[65723:0x103800400]      795 ms: Scavenge 95.3 (115.8) -> 85.3 (119.3) MB, 4.8 / 0.0 ms  allocation failure 
[65723:0x103800400]      859 ms: Scavenge 101.4 (121.8) -> 91.3 (125.8) MB, 5.9 / 0.0 ms  allocation failure 
[65723:0x103800400]      940 ms: Mark-sweep 106.4 (127.8) -> 99.8 (130.8) MB, 4.1 / 0.0 ms  (+ 61.8 ms in 504 steps since start of marking, biggest step 1.7 ms, walltime since start of marking 179 ms) finalize incremental marking via stack guard GC in old space requested
[65723:0x103800400]      975 ms: Scavenge 112.7 (133.3) -> 102.0 (137.3) MB, 4.9 / 0.0 ms  allocation failure 
[65723:0x103800400]     1021 ms: Scavenge 119.3 (140.3) -> 109.5 (143.3) MB, 5.1 / 0.0 ms  allocation failure 
[65723:0x103800400]     1062 ms: Scavenge 125.3 (146.3) -> 116.2 (150.8) MB, 5.8 / 0.0 ms  allocation failure 
[65723:0x103800400]     1103 ms: Scavenge 132.5 (153.8) -> 123.2 (158.8) MB, 4.5 / 0.0 ms  allocation failure 
[65723:0x103800400]     1146 ms: Scavenge 139.4 (160.8) -> 130.1 (164.8) MB, 6.1 / 0.0 ms  allocation failure 
[65723:0x103800400]     1186 ms: Scavenge 146.3 (167.8) -> 137.1 (171.8) MB, 4.7 / 0.0 ms  allocation failure 
[65723:0x103800400]     1227 ms: Scavenge 153.2 (174.8) -> 144.0 (178.8) MB, 5.5 / 0.0 ms  allocation failure 
[65723:0x103800400]     1268 ms: Scavenge 160.2 (181.8) -> 150.9 (186.3) MB, 5.1 / 0.0 ms  allocation failure 
[65723:0x103800400]     1309 ms: Scavenge 167.1 (188.8) -> 157.9 (193.3) MB, 4.8 / 0.0 ms  allocation failure 
[65723:0x103800400]     1354 ms: Scavenge 174.0 (195.8) -> 164.8 (199.8) MB, 5.9 / 0.0 ms  allocation failure 
[65723:0x103800400]     1401 ms: Scavenge 181.0 (202.8) -> 171.7 (206.8) MB, 5.7 / 0.0 ms  allocation failure 
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc4725
[65723:0x103800400]     2512 ms: Mark-sweep 275.4 (309.6) -> 273.2 (312.6) MB, 156.9 / 0.0 ms  (+ 148.6 ms in 1266 steps since start of marking, biggest step 2.1 ms, walltime since start of marking 1066 ms) allocation failure GC in old space requested

// example 2
var largeData = "SomeData".repeat(1024*1024) + ' ..... END!';


console.log(largeData); //using process.stdout.write here makes no difference either

process.exit();

Does this mean that the issue has been resolved on those versions or am I just getting tricked?

I launched those programs with the --trace_gc flag and didn't see much related to the gc appear.

p.s. : I'm pretty new to memory analysis so I'm sorry if this is silly :\

I'm sorry for this extra question as well:

  • MarkSweep is a blocking kind of GC
  • Scavenge is not blocking

Is that correct?

@gireeshpunathil To clarify, I am scaling up to ~100,000 connections, not 10M. What I am doing is creating a promise that continues to cycle through a set of API calls until it is stopped. I continue to scale up the number of promises being fired until I reach 100,000.

If I remove logging, this method works, it is only when I add logging that it crashes. Does that clarify anything for you or do you still need a code example?

ok, sorry, 10M vs. 0.1M was my mistake.

Your statement "If I remove logging, this method works" brings the problem back to non-blocking stdio again.

"creating a promise that continues to cycle through a set of API calls until it is stopped" - is it possible to show me skeleton code for this so that I understand it fully?

that way we can confirm we are addressing the same issue as above or not. thanks!

I would like to discuss one of the problems described in this thread that is relatively more pronounced than the others from a usage perspective: truncated console logs on exit.

Can we make the write descriptors blocking? No, it will block the main thread, breaking the asynchronous event-driven sequences in other parts of the code.

Can we make the write descriptors blocking at exit sites? No, it will potentially delay or deny the exit sequences: e.g. the descriptor is a socket or pipe that is full, and the recipient is not reading.

Is this behavior unique to Node.js? I don't think so. An _exit or even an exit call in the middle of a write to a non-blocking fd is expected to behave the same way for any runtime (managed or unmanaged), the only difference being that pending data in the buffer is flushed in some cases; that does not help flush data still pending in the app in the case of non-blocking writes.

So I propose that it is fair to say the current behavior of console.log at a process.exit site is expected: just as a listening socket is torn down in the middle of a connection, or a scheduled function invocation through setTimeout() is abandoned, the truncation of console.log output makes sense to me.

But there are genuine use cases which cannot function well with this.

So to bring this to a logical and long term solution, I suggest this:

  • leave the non-blocking behavior of the standard streams [1, 2] intact.
  • introduce a new API that uses blocking descriptors by design [say console.logSync], well documented as to its behavior so that it is used with caution (see the userland sketch below).
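
For a sense of the semantics, a rough userland approximation of such a console.logSync (hypothetical, built on the existing fs.writeSync and util.format):

const fs = require('fs');
const util = require('util');

// hypothetical approximation: formats like console.log, but blocks until
// the bytes have been written to fd 1 (stdout)
function logSync(...args) {
  fs.writeSync(1, util.format(...args) + '\n');
}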

Please share your views.

I see two issues:

  1. Minor: console.logSync is non-standard.
  2. Major: console.log('a'); console.logSync('b') indeterminately prints either a\nb or b\na. I expect that will turn into an endless source of surprise for users.

I don't think there is a general solution, just one-off hacks.

One such hack: add a new uv_flush_handle_sync() function to libuv that tells it to write out the pending data of a set of handles (or just one handle) in blocking fashion without otherwise involving the event loop.

sure - so if I follow you correctly, do the following at process.exit():

  • capture the live stream handles (only tcp|pipe|tty) in the event loop.
  • make them half-duplex (stop receiving data)
  • capture their associated uv_write_t object, and obtain the data buffer, and descriptor.
  • tear down the event loop ( without closing the aforesaid fds - how? )
  • invoke flush method
  • proceed with the real exit sequences.

Is that how you see it? With that, the proposed method will be a trivial one as below, but the above sequence that Node has to perform may become tricky.

/* sketch: make the fd blocking again, then write out the pending data */
int uv_flush_handle_sync(int fd, char* data, size_t len) {
  int r;
  ssize_t n;
  r = uv__nonblock(fd, 0);    /* switch the fd back to blocking mode */
  assert(r == 0);
  n = write(fd, data, len);   /* note: partial writes are not handled here */
  assert(n == (ssize_t) len);
  return 0;
}

Please let me know.

@gireeshpunathil Yes, that's the basic idea. The real implementation would be more involved: take a list of handles, a timeout, possible some flags.

Seems like this ran its course and can be closed. If I'm wrong about that, don't hesitate to re-open.

@Trott, really? Nothing has been done, this is not at all resolved.
