nodejs sometimes leaves stdout/stderr in non-blocking mode

Created on 10 Aug 2017 · 10 comments · Source: nodejs/node

  • Version: v8.3.0
  • Platform: Linux 19083a5259e3 4.4.0-1022-aws #31-Ubuntu SMP Tue Jun 27 11:27:55 UTC 2017 x86_64 GNU/Linux
  • Subsystem: Unknown

node appears to leave stdout/stderr in non-blocking mode in certain cases, which can cause errors in other programs that assume these streams are in blocking mode. I'm not certain whether this is a bug, but I'm trying to understand the behavior better.

I'm having trouble finding authoritative information about non-blocking mode and whether it's "correct" for programs to assume that they'll be given stdout/stderr in blocking mode, but it does appear that a lot of programs aren't able to handle having stdout/stderr in non-blocking mode (Python being a notable case). It's also strange that node only appears to do this in specific cases (e.g. in Docker without a TTY).
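The failure mode Python hits can be shown without Docker or node at all. The sketch below flips a pipe (standing in for stdout) to non-blocking mode and writes past its buffer, producing the same EAGAIN (errno 11) that Python reports:

```python
import errno
import fcntl
import os

# Simulate a program that assumes blocking stdio but has been handed a
# non-blocking descriptor. A pipe stands in for stdout; its buffer is
# what fills up and triggers the error.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The first write fills the pipe buffer and returns a partial count
# instead of blocking until the reader drains it.
written = os.write(w, b"x" * 1000000)

got_eagain = False
try:
    os.write(w, b"x")  # buffer full: blocking I/O would wait here
except BlockingIOError as e:
    got_eagain = (e.errno == errno.EAGAIN)  # errno 11 on Linux
print("partial write of", written, "bytes; EAGAIN on next write:", got_eagain)
```

This is exactly the situation Python's buffered writer cannot hide from the user: a partial or refused write surfaces as IOError/BlockingIOError.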

Reproduction

I haven't yet been able to find a way to reproduce this without Docker. I've created a simple test repo that prints the status of the blocking flag before and after running nodejs at https://github.com/chriskuehl/nodejs-nonblocking-repro. It shows that the blocking flag on stdout/stderr changes after nodejs is run.

Alternatively, a reproduction using Docker looks like this:

  1. docker run -i node:latest bash to enter a shell in the container (don't use -t; the issue doesn't reproduce when a TTY is allocated)
  2. python -c 'print "x" * 100000' inside the container to show that Python can print a lot of text without failing
  3. nodejs -e 'console.log(1+1)' inside the container. After running this command, stdout/stderr have been changed to non-blocking mode.
  4. python -c 'print "x" * 100000' inside the container, which should crash after printing some text (you should see a traceback somewhere; it may be at the top or interleaved in the middle of the printed text)

In Python 2, the traceback error is IOError: [Errno 11] Resource temporarily unavailable. In Python 3, it's a little more helpful: BlockingIOError: [Errno 11] write could not complete without blocking.

Label: tty

All 10 comments

@chriskuehl Does this still reproduce on v9.2.0?

@maclover7 yes it does, same reproduction steps. Here's output based on the test repro I linked above:

ckuehl@raziel:~/proj/nodejs-nonblocking-repro$ docker build -t ckuehl-test .
Sending build context to Docker daemon  56.32kB
Step 1/4 : FROM node:9.2.0
 ---> c1d02ac1d9b4
[...]
ckuehl@raziel:~/proj/nodejs-nonblocking-repro$ docker run ckuehl-test
stdout is: blocking
stderr is: blocking
2
stdout is: nonblocking
stderr is: nonblocking
Now printing lots of data in Python causes an error:
9999999999[...]
  File "<string>", line 1, in <module>
BlockingIOError: [Errno 11] write could not complete without blocking

I ran into this today as well and I have a reproducer without docker: https://gist.github.com/ivan/e1eafebc7e5273b6f80d2cc9bdb96a38 (tested on a recent Linux 4.4 and 4.14, nodejs 8.9.2 and 9.2.0)

Potential fix in #17737. Can anyone here confirm that it fixes the issue for them?

By the way, a python2 version of the test case:

import fcntl, os, sys, subprocess

# Report the O_NONBLOCK status flag on stdout (fd 1) and stderr (fd 2),
# then spawn node; the second iteration shows the flags after node ran.
for _ in range(2):
    nb = fcntl.fcntl(1, fcntl.F_GETFL) & os.O_NONBLOCK
    print("stdout blocking? {}".format(nb == 0))
    nb = fcntl.fcntl(2, fcntl.F_GETFL) & os.O_NONBLOCK
    print("stderr blocking? {}".format(nb == 0))
    sys.stdout.flush()
    subprocess.check_output(['out/Release/node', '-e', '2+2'])

I did not test the fix, but I wonder if it solves the problem when the parent process doesn't wait for the node process to exit? (Is there something that prevents stdout from going nonblocking in this case?)

I wonder if it solves the problem when the parent process doesn't wait for the node process to exit?

Depends on what you mean. Whether there's someone waiting doesn't matter to node.js, but if another process concurrently tries to read from or write to the same file description (note: not _file descriptor_), it might observe non-blocking I/O.

That's hard to remedy in the UNIX model, though: you can't always reopen stdio to get fresh file descriptions.
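The descriptor/description distinction can be made concrete from Python (a sketch using a pipe and dup(), not anything node-specific): duplicated descriptors share one open file description, so a status flag like O_NONBLOCK set through one descriptor is visible through the other.

```python
import fcntl
import os

r, w = os.pipe()
w_dup = os.dup(w)  # new descriptor, same underlying file description

# Set O_NONBLOCK through the original descriptor only.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The duplicate observes the flag too, because O_NONBLOCK lives on the
# shared file description. This is why node flipping the terminal's
# stdout affects every other process sharing that terminal.
dup_nonblocking = bool(fcntl.fcntl(w_dup, fcntl.F_GETFL) & os.O_NONBLOCK)
print("duplicate is non-blocking:", dup_nonblocking)
```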

Potential fix in #20592. @chriskuehl, are you in a position to apply the patch, validate, and confirm? Or do you want to wait for a binary?

I guess we have to reopen this :(

(For context, the fix was reverted in #21257)

Quick question:

Given that the merged fix uses atexit(), I assume the fix does nothing when the node process is SIGKILLed, crashes, or is OOM-killed?

In that case programs that call node as a subprocess must still take extra precautions like Firefox does here by never giving node the actual terminal stdout/stderr, is that right?

@nh2 Yes, if node is hard-killed then it won't have a chance to reset stdio. That caveat applies to SIGKILL (because it can't be caught) but not to e.g. SIGINT and SIGTERM; their signal handlers restore stdio.

(SIGSEGV and SIGBUS currently don't but that could be added. Then again, those signals mean the program is in an undefined state and trying to clean up might make it worse.)
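A sketch of that restore-on-signal pattern in Python (illustrative only, not node's actual handler), exercised on a pipe rather than a real terminal:

```python
import fcntl
import os
import signal

# A pipe stands in for the shared terminal descriptor; flip it to
# non-blocking the way node does while running.
r, w = os.pipe()
fcntl.fcntl(w, fcntl.F_SETFL, fcntl.fcntl(w, fcntl.F_GETFL) | os.O_NONBLOCK)

def restore_blocking(signum, frame):
    # Clear O_NONBLOCK so whoever shares the file description gets
    # blocking I/O back; a real handler would then re-raise the signal
    # to preserve the default termination behavior.
    flags = fcntl.fcntl(w, fcntl.F_GETFL)
    fcntl.fcntl(w, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

signal.signal(signal.SIGTERM, restore_blocking)
os.kill(os.getpid(), signal.SIGTERM)  # handler runs; process survives here

restored = not (fcntl.fcntl(w, fcntl.F_GETFL) & os.O_NONBLOCK)
print("blocking restored by handler:", restored)
```

As noted above, no handler of this kind can help against SIGKILL, which is delivered without giving the process any chance to run cleanup code.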

by never giving node the actual terminal stdout/stderr, is that right?

You can; just reset their file descriptors to blocking I/O and cooked mode afterwards.
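A minimal version of that reset from the calling side might look like this (a sketch; restoring cooked mode on a real TTY would additionally need termios.tcsetattr or `stty sane`, omitted here since it only applies to terminals):

```python
import fcntl
import os

def make_blocking(fd):
    """Clear O_NONBLOCK on fd, returning it to blocking I/O."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

# Demo on a pipe standing in for stdout/stderr: flip it to non-blocking
# (as a misbehaving child might leave it), then restore after the child
# would have exited.
r, w = os.pipe()
fcntl.fcntl(w, fcntl.F_SETFL, fcntl.fcntl(w, fcntl.F_GETFL) | os.O_NONBLOCK)
make_blocking(w)
back_to_blocking = not (fcntl.fcntl(w, fcntl.F_GETFL) & os.O_NONBLOCK)
print("restored to blocking:", back_to_blocking)
```

A parent would call make_blocking(1) and make_blocking(2) after every node invocation it doesn't fully trust.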
