Ref #18016, ping @elibarzilay and @gireeshpunathil
It seems that Node 11's behavior changed compared to Node 10, and pipes are now automatically closed after being used as the output stream of a process. I think this is a bug, because it makes it impossible to use the same pipe as output for two different processes (which would be the case if I were to implement `(foo; bar) | cat` - both `foo` and `bar` would write into the `cat` process).
Additionally, it causes previously working code to "randomly" throw internal exceptions. The random part is likely caused by a race condition, since `perl` throws consistently while `rev` doesn't cause problems. Here is the repro, and the exception it produces:
```js
const {spawn} = require(`child_process`);

const p2 = spawn(`perl`, [`-ne`, `print uc`], {
  stdio: [`pipe`, process.stdout, process.stderr],
});

const p1 = spawn(`node`, [`-p`, `"hello world"`], {
  stdio: [process.stdin, p2.stdin, process.stderr],
});

p1.on(`exit`, code => {
  p2.stdin.end();
});
```
```
$ node test
HELLO WORLD
internal/validators.js:130
    throw new ERR_INVALID_ARG_TYPE(name, 'number', value);
    ^

TypeError [ERR_INVALID_ARG_TYPE]: The "err" argument must be of type number. Received type undefined
    at validateNumber (internal/validators.js:130:11)
    at Object.getSystemErrorName (util.js:231:3)
    at errnoException (internal/errors.js:383:21)
    at Socket._final (net.js:371:25)
    at callFinal (_stream_writable.js:617:10)
    at processTicksAndRejections (internal/process/task_queues.js:81:17)
```
As you can see the code executed fine (`HELLO WORLD` got printed), but during cleanup an internal assertion failed and Node crashed.
@arcanis - thanks for reporting this, I will have a look
I just confirmed that b1f82e4342f8a630b1ef83cd33781a725428f569 (#21209) is NOT the reason for this behavior; with and without it, the behavior is observed to be the same.
will dig further tomorrow!
Fwiw from my experiments the problems appeared in 11.10 - my tests pass in 11.9. Looking at the diff between the two, the only change in `child_process` is #21209.
To me it seems to either be streams or libuv related.
Git bisect between `v11.9.0` (good) and `v11.10.0` (bad) points to 197efb7f846e14bdb3002f5ff7528d4070880328:
```
197efb7f846e14bdb3002f5ff7528d4070880328 is the first bad commit
commit 197efb7f846e14bdb3002f5ff7528d4070880328
Author: Gireesh Punathil <[email protected]>
Date:   Fri Jun 8 08:33:37 2018 -0400

    child_process: close pipe ends that are re-piped

    when t0 and t1 are spawned with t0's outputstream [1, 2] is piped into
    t1's input, a new pipe is created which uses a copy of the t0's fd.
    This leaves the original copy in Node parent, unattended. Net result is
    that when t0 produces data, it gets bifurcated into both the copies

    Detect the passed handle to be of 'wrap' type and close after the
    native spawn invocation by which time piping would have been over.

    Fixes: https://github.com/nodejs/node/issues/9413
    Fixes: https://github.com/nodejs/node/issues/18016

    PR-URL: https://github.com/nodejs/node/pull/21209
    Reviewed-By: Matteo Collina <[email protected]>
    Reviewed-By: James M Snell <[email protected]>
    Reviewed-By: Anna Henningsen <[email protected]>
```
```
-bash-4.2$ git bisect log
git bisect start
# good: [8e24451439a9abbc1cd99b42066eec65473e781c] 2019-01-30, Version 11.9.0 (Current)
git bisect good 8e24451439a9abbc1cd99b42066eec65473e781c
# bad: [6e56771f2a9707ddf769358a4338224296a6b5fe] 2018-02-14, Version 11.10.0 (Current)
git bisect bad 6e56771f2a9707ddf769358a4338224296a6b5fe
# bad: [2b1f88185fca2bc73ef8f6d09422626e9e237636] benchmark: remove unreachable return
git bisect bad 2b1f88185fca2bc73ef8f6d09422626e9e237636
# good: [e28d891788069732d6e37a00459812bee02910fc] src: fix race condition in `~NodeTraceBuffer`
git bisect good e28d891788069732d6e37a00459812bee02910fc
# bad: [353de0f7520bce50b490f367048ee1e58894a784] doc: fix err_synthetic issue on v11.x
git bisect bad 353de0f7520bce50b490f367048ee1e58894a784
# good: [b5a8376ffe2f380103ebe7f7314d5c7a12500084] src: organize TLSWrap declarations by parent
git bisect good b5a8376ffe2f380103ebe7f7314d5c7a12500084
# good: [1c6fadea3135e5f95f3042cf06cd71de49a33e8f] meta: clarify EoL platform support
git bisect good 1c6fadea3135e5f95f3042cf06cd71de49a33e8f
# good: [2c737a89d59a0c3fa471d36031c64550607fc367] repl: remove obsolete buffer clearing
git bisect good 2c737a89d59a0c3fa471d36031c64550607fc367
# bad: [197efb7f846e14bdb3002f5ff7528d4070880328] child_process: close pipe ends that are re-piped
git bisect bad 197efb7f846e14bdb3002f5ff7528d4070880328
# good: [c866b52942b326674e9ab2adf52467d5f27cdf53] test: remove obsolete code
git bisect good c866b52942b326674e9ab2adf52467d5f27cdf53
# first bad commit: [197efb7f846e14bdb3002f5ff7528d4070880328] child_process: close pipe ends that are re-piped
-bash-4.2$
```
thanks @richardlau - I don't know what mistake I was making yesterday, but now I confirm that the behavior difference is caused by 197efb7f846e14bdb3002f5ff7528d4070880328.
I will analyze the test and the failure to see what this means.
ok, here is a deadlock situation:

```
$ cat true.js
const {spawn} = require(`child_process`);

const p3 = spawn('cat', {stdio: ['pipe', process.stdout, process.stderr]});
const p1 = spawn('./foo.sh', {stdio: ['pipe', p3.stdin, process.stderr]});
const p2 = spawn('./bar.sh', {stdio: ['pipe', p3.stdin, process.stderr]});

$ cat false.js
const { spawn } = require("child_process");

let p3 = spawn('wc', ['-l'], {stdio: ['pipe', process.stdout, process.stderr]})
let p2 = spawn('grep', ['spawn'], {stdio: ['pipe', p3.stdin, process.stderr]})
let p1 = spawn('cat', [__filename], {stdio: ['pipe', p2.stdin, process.stderr]})
```
without 197efb7f846e14bdb3002f5ff7528d4070880328, true.js passes while false.js fails
with 197efb7f846e14bdb3002f5ff7528d4070880328, true.js fails while false.js passes
the rationale for 197efb7f846e14bdb3002f5ff7528d4070880328 is that pipe ends, when re-piped, were losing data at either of the destinations because of the bifurcation, making re-piping a meaningless use case. At the same time, it looks like that is breaking the accumulation scenario.
I am not sure what is wrong with 197efb7f846e14bdb3002f5ff7528d4070880328. The piping is good, as the data is correctly delivered (in true.js), but it looks like something went wrong that led to the error.
pinging @nodejs/child_process to get advice, while investigating further.
> the rationale for 197efb7 is that pipe ends, when re-piped, were losing data at either of the destinations because of the bifurcation, making re-piping a meaningless use case. At the same time, it looks like that is breaking the accumulation scenario.
Have you checked my comment regarding `(foo; bar) | cat`? Isn't this change preventing us from sending the same process' stdin to multiple processes' stdout, since the stdin would get closed after being sent to the first process?
yes, that is exactly what true.js is doing, right? (treat foo as 'echo hello' and bar as 'echo world')
if your question is whether the said commit has considered this use case, no, this was not considered.
Ah indeed, I didn't check correctly!
> if your question is whether the said commit has considered this use case, no, this was not considered.
Yep, no worries, I just wanted to be sure it was accounted for now :) Given that, do you think there's a way to make both the original case and this new one work? Or should 197efb7 be reverted?
If I understand correctly, the original issue (#18016) described in option 3 what seemed to be a satisfying behavior: require the pipes to be manually closed by the user. Any improvement can then be implemented in userland via tools like `execa`, without compromising `child_process`' capabilities.
@arcanis, are you saying that the problem in this comment should be considered satisfying? Which pipes should have been closed to make it work?
I believe the problem in your post is due to an incorrect execution order (I mentioned it in my comment from the other thread but the exact ramifications didn't occur to me until I got to experience the bug first hand). Here is your example with two alternative fixes (I've put comments to detail my thoughts for each of them):
```js
"use strict";

switch (process.argv[2]) {
  default: {
    console.log("Usage: ./x.js <0|1|2>");
  } break;

  case "0": {
    // The case that doesn't work; the pipes are left open in the current process, so some
    // data can be lost between the time "p1" spawns and the time "p2" spawns
    const { spawn } = require("child_process");
    let p1 = spawn("./b", ["10000"], {stdio: [process.stdin, "pipe", process.stderr]});
    let p2 = spawn("grep", ["7"], {stdio: [p1.stdio[1], "pipe", process.stdout]});
    let p3 = spawn("wc", ["-l"], {stdio: [p2.stdio[1], process.stdout, process.stdout]});
  } break;

  case "1": {
    // This is the same thing, except that we now close the pipes on our end after spawning
    // the processes. It seems that calling "p.stdout.end()" doesn't work for some reason
    // (it throws an ENOTCONN exception), so we need to close the handle itself (basically
    // manually do what 197efb7 automatically does). That makes me think that we're not
    // supposed to do this.
    const { spawn } = require("child_process");
    let p1 = spawn("./b", ["10000"], {stdio: [process.stdin, "pipe", process.stderr]});
    let p2 = spawn("grep", ["7"], {stdio: [p1.stdio[1], "pipe", process.stdout]});
    p1.stdout._handle.close();
    let p3 = spawn("wc", ["-l"], {stdio: [p2.stdio[1], process.stdout, process.stdout]});
    p2.stdout._handle.close();
  } break;

  case "2": {
    // This is imo the best solution. Instead of spawning the data producers first, we
    // start by spawning the consumers (right-to-left). After we've spawned a process, we
    // close our side of the pipe so that it becomes clear that no data is to be expected
    // from us. Curiously this time we can use "p.stdin.end()", even though the previous
    // example showed that we couldn't call "p.stdout.end()".
    //
    // This approach also makes it possible to configure multiple subprocesses to write
    // into the same pipe, something which isn't possible with left-to-right execution
    // (since you don't know what pipe the left process should write into until after
    // you've spawned the right process).
    const { spawn } = require("child_process");
    let p3 = spawn("wc", ["-l"], {stdio: ["pipe", process.stdout, process.stdout]});
    let p2 = spawn("grep", ["7"], {stdio: ["pipe", p3.stdin, process.stdout]});
    p3.stdin.end();
    let p1 = spawn("./b", ["10000"], {stdio: [process.stdin, p2.stdin, process.stderr]});
    p2.stdin.end();
  } break;
}
```
Thanks for the code, which I'll use if the commit is reverted, but I still think that something is wrong.
First, as I also replied in the other thread, the order should not matter, and I had tried it in both directions (but see below).
Second, there is the suspicious fact that the problem didn't happen on Windows.
Now, looking at your two solutions I get a vague feeling that the problem is that a pipe is generated in any case, but when the output of one process is used as the input of another (regardless of order), the connection is done on the underlying FD, and therefore the pipe is now redundant, and having it around might lead to the race condition that led to the problem I've had. But this is all guessing, based on the fact that both of your solutions work via closing the pipe object.
One more comment is that things are still broken (ping @gireeshpunathil). This variant of the code on Windows (with binaries that are found on cygwin):

```js
let p3 = spawn("wc.exe", ["-l"], {stdio: ["pipe", process.stdout, process.stdout]});
let p2 = spawn("grep.exe", ["7"], {stdio: ["pipe", p3.stdin, process.stdout]});
// p3.stdin.end();
let p1 = spawn("bash.exe", ["./b", "10000"], {stdio: [process.stdin, p2.stdin, process.stderr]});
// p2.stdin.end();
```
fails now, and works on linux. It looks like a read is initiated which causes the failure. Uncommenting the two `end` lines makes it work on windows and fail on linux (probably because they're already closed?).
If my guess is correct, then I see three conclusions:

1. It is indeed better to stick to starting processes right-to-left, but for a different reason -- because otherwise some data might be lost into the pipe's buffer before it is closed.
2. But the flip side of that is the problem that I've seen above without calling the `end`s: if for whatever reason a read is initiated, then things get broken. (No data is lost, but redundant data can be read.)
3. A proper way to resolve this is to avoid creating a pipe in the first place, via a new kind of value to be used in `options.stdio`. Again, I'm not sure about all of this since I didn't read the code, but if it's close to true, then there should be some way to avoid creating a pipe when one is not needed (when two processes are piped), since otherwise one of the two problems above is likely.
(Regardless, something should be documented, especially if the commit is reverted, since this closing thing is not something that one could guess from the docs.)
My test suite passes on Windows; the only different thing I'm doing is that I actually call the `end` method from within the `exit` handler of the producer:

```js
p2.on('exit', () => {
  p3.stdin.end();
});
```
When I tried your example without the exit listener it worked on my machine, so I didn't look further, but maybe the listener is required. In any case I agree there's a behavior that should be documented, since it's not clear what works by design and what's undefined behavior.
My main concern at the moment is that Node 12 is meant to be cut on April 26, and I hope this BC-breaking bug won't be part of it... otherwise reverting it would become BC-breaking in its own right :(
@arcanis - thanks for the detailed explanation!
agreeing that case 2 is the best among all, there are still a few concerns here: `end`ing `p3.stdin` is not intuitive - one could think that, having passed `p3.stdin` as `p2`'s output stream, closing `p3.stdin` would close the pipe as well. I believe what we are missing here is a specification around piping; specifically, answers to these questions:

- should ordering be enforced or recommended in `a | b` use cases? (I guess the answer is no)
- are `(a; b) | c`, `a | (b; c)` and `a | b | c` use cases covered in the original design and expected to behave the way one thinks they should? (I guess the answer is no)
- when a child process is created and existing streams passed as inputs, what should be the specified outcome?
- what is the specified behavior when streams are piped post creation?
I think it depends on whether the `pipe` streams are created in a synchronous or asynchronous way. If Node guarantees that the streams will exist when `spawn` returns and that the spawned process won't exist before nextTick (so that the calling code has time to bind it to something else), then I guess it can be closed. Otherwise it seems unsafe.
Overall I tend to think that the `child_process` overlay tries too much to be magical already, and in return fails at both (too magic to be used as a low-level interface, not enough to cover portability issues, prompting the existence of `cross-spawn` and `execa`).
In my perfect world Node would pretty much expose `dup2` instead, and let us deal with the intricacies of piping (some magic would still be required to circumvent the lack of `fork` and `execve`, but that would be an acceptable cost).
> should ordering be enforced or recommended in `a | b` use cases? (I guess the answer is no)
I'm not a shell expert (my knowledge mostly comes from an old high school project where we had to implement a shell from scratch) so I can't say whether left-to-right execution is used in some of them; what I can say however is that right-to-left is common (and makes more sense to me personally).
> are `(a; b) | c`, `a | (b; c)` and `a | b | c` use cases covered in the original design and expected to behave the way one thinks they should? (I guess the answer is no)
I hope so! To give some context about my use case: I'm implementing a portable shell for Yarn 2+. It will make it possible to use a common shell language (posix-inspired) across all systems, fixing one of the major portability pain points we currently have. But that requires pipes to actually work :)
(Prelude: I have a lot of experience with shells, and I have implemented much of this functionality for a different language together with its core developer, in a way that works on both unix and windows. As a side note, I ran into this writing some code that makes shell-like functionality available in node.)
@gireeshpunathil: I agree with your points, and specifically

> it is not intuitive - one could think [...]

is a particular problem. Not only is it unintuitive, it's also hard to explain, and I believe that as such it is not a good functionality.
> when a child process is created and existing streams passed as inputs, what should be the specified outcome?
I also agree that too much magic is problematic, and exactly because of that I suggested adding an option that avoids creating a pipe altogether. It's not as low-level as exposing any form of FD duplication (which IMO would be overkill and an over-complication), but it's low-level enough to ensure that no surprises happen and to allow cross-platform code. Specifically, any explanation that talks about doing stuff (starting processes, closing pipes) in the same tick or not is fundamentally complicated, since there will be cases where people won't be able to guarantee that, or will be surprised when things break.
> what is the specified behavior when streams are piped post creation?
The original problem that surprised me is that I ended up with code where two things were competing for data, leading to loss. Any such behavior is extremely dangerous and I strongly believe that it should not be allowed to happen. (I.e., fixing the documentation would be necessary, but leaving the ability for people to shoot themselves in the foot is very bad.) Again, I believe that a new mode that ensures that no pipe is created is therefore needed -- the alternative would be some special treatment once a stream is connected to a process, which leads to the above problems.
> should ordering be enforced or recommended
I very strongly believe that a design that requires an order is similarly flawed. I have never heard of any such requirement, and I don't think that there's any "popular" choice. Searching the web brings up nothing, so I resorted to just trying it out:
```sh
ps | sort -n | grep -wE '(ps|sort|grep|sed)' | sed -e 's/^ *[0-9][0-9]* *//'
```
I tried that using `bash`, `zsh`, `dash`, and `sh` on both a Fedora and an Ubuntu, and ran the test 1000 times each: the results show that in all of these cases the processes are started left to right.
(Again, see my earlier explanations: it is better now to start processes right-to-left to avoid potential data loss, only because of the design, and because data loss is generally more dangerous than redundant reading. A good design would not have any potential loss anyway, leaving things to the OS, which deals fine with blocking a process on either side when needed.)
> are `(a; b) | c`, `a | (b; c)` and `a | b | c` use cases covered in the original design and expected to behave the way one thinks they should? (I guess the answer is no)
It was certainly surprising to me (even in the `a | b` case). `(a; b) | c` is only different in re-using the same stream connection for two processes, and I would expect interleaved data unless you start `b` only after `a` is done. IOW, I believe that it is useful to connect two different processes to the same I/O stream, and then it is on me to run them in sequence if needed.
@arcanis , @elibarzilay - thanks for the detailed analysis and explanation! I guess all are valid points.
At this point I want to restate this - guess we all know, but just restating for check-pointing this discussion:
- internal piping is absolutely necessary, and a well-thought-out design decision, for Node to interact with individual child processes in an asynchronous manner. Without internal pipes, data flow cannot be managed in any direction.
- when siblings (node's children) want to communicate with each other directly, the interplay between internal pipes and external pipes (or pre-created streams or end-points or whatever we call those) is not well understood, nor well specified and documented.
Hope you all are with me up to this point?
Right now the internal pipes are created unconditionally in a much earlier phase in `spawn`, and re-wired when we encounter external pipes in the `option` set. I guess that is where the issue is.
Pipe creation that considers the option set at the process creation site would solve all the said issues IMO; though I'm not sure how much effort / how many design changes would be required.
I will study the code around a bit more before coming up with anything concrete; feel free to share opinions!
@gireeshpunathil: yes, I agree with that, including the point about internal piping needed.
The reason I mentioned the other language I worked on is that it is in a very similar situation, where you need some piping to interact with the stream from the language. So in addition to plain in-language streams, there are "primitive streams" that can be used as an optimization when you want to spawn two subprocesses and let them talk directly to each other -- and the choice there is to explicitly ask for a pipe whenever you need to interact with the process from the language.
It sounds to me that node is doing the pipe regardless, and the problems could be resolved if you always use `.pipe` -- but that comes at the cost of the parent process doing a lot of redundant work shuffling the data around instead of letting the kernel do it. For this reason I suggested adding an option that would lead to a similar situation, where the `stdio` will end up with a value that can be used with another process.
If this is done well, then there would also be some facility to wrap such values in a way that creates a pipe that makes it possible to talk to the process (I'm using "pipe" in the OS sense here, not `.pipe`), and with that the existing `"pipe"` option would be viewed as a convenience that does that automatically.
What's the plan going forward, especially regarding Node 12?
So no revert is planned, even if this diff clearly breaks existing usages? :(
I'm all for a better option, but I'm concerned it will take a lot of time. In the meantime the bug will stay, will start to be expected by some users, and changing this will be even more painful than it currently is.
@arcanis - thanks. I haven't made up my mind on reverting the change. As stated earlier, there is one class of issues this fix resolves, while it looks like it introduces another class, with the two being mutually exclusive.
As I haven't heard any opinions from @nodejs/child_process , I will add it to the tsc-agenda to get things moving.
Sounds good - to explain a bit why I'm pushing for a revert: when faced with two equally imperfect options, I believe preserving the status quo should take precedence.
/ping @nodejs/tsc
Well, it wasn't April 26 after all... :)
https://medium.com/@nodejs/introducing-node-js-12-76c41a1b3f3f
Given some other parts of core, I think all stdio should never be closed _at all_. Otherwise `console.log` and other things would stop working.
@gireeshpunathil @addaleax what do you think?
I was out of office for a while, will try to come up with something solid by tomorrow.
> Given some other parts of core, I think all stdio should never be closed _at all_. Otherwise `console.log` and other things would stop working.
@mcollina This issue is about handling child process stdio streams in the parent process, so I think that's an unrelated concern.
@arcanis I think the lack of movement here is partially due to the issue being very specific, and the length of the existing discussion here - I have a hard time catching up with everything, tbh, although the issue in itself seems limited in scope. It sounds like the gist of the issue fixed by 197efb7f846e14bdb3002f5ff7528d4070880328 is that the Node.js parent process might attempt to read data from an end of the same pipe that the child process received, which is obviously bad.
It seems like closing the stream is, however, not a good solution, because that prevents us from letting other child processes also write to that stream. Intuitively, what I'd suggest as an alternative to 197efb7f846e14bdb3002f5ff7528d4070880328 is to disable reading from the pipe in the parent process, either completely or until reading is re-initiated explicitly. But that also seems like it's a very simple solution, and a simple solution just seems inherently unlikely after 15 screen pages of discussion and multiple previous issues? (I'll try it out and report back, just in case.)
(The error shown in the original issue comment here seems to stem from the fact that the script is trying to shut down the writable side of an already-closed stream, and that the native method that does that returns `undefined` when the associated C++ object no longer exists, which the `net` code is not prepared to handle currently. That's fixable, although it makes it seem a lot like 197efb7f846e14bdb3002f5ff7528d4070880328 is incomplete, in that it should also have destroyed the pipe's `stream` state in addition to the underlying resource - but that's assuming that destroying the underlying resource is actually the way to go.)
@elibarzilay's suggestion of a second kind of way to represent os-level streams also sounds reasonable in a way, but if that's what we want to do, it's also something that we should have done from the beginning, and it doesn't seem like an answer to this problem in the sense that everyone, for now, is stuck with the current system.
@arcanis Since you sound (rightfully) concerned about the process around this: If you don't see a real solution to an issue coming up in a foreseeable and acceptable time frame, I think you can generally feel free to open a PR with a revert yourself or ask somebody else to, since this seems like a real regression and that's what we usually do for regressions. The Node.js 12 release shouldn't be a huge issue when it comes to a revert; we've landed reverts of breaking changes in the next-release-after-a-semver-major release before.
It looks like the "simple" solution, i.e. just stopping the readable side, might just work? I've opened https://github.com/nodejs/node/pull/27373 with that.
@addaleax - thanks. Looking back, closing the stream was not the optimal solution, and I agree that your fix is comprehensive, addressing both use cases without side effects.
just wondering about the `tsc-agenda` label - is it necessary anymore? I added it for 2 reasons: i) I wasn't able to come to a conclusion or a way to progress, ii) node 12 was looming and I thought it prudent to get this resolved. Now that neither of these reasons is relevant anymore (we have a direction now, and node 12 is out already, so we could target this for `12.0.1` or so), I am removing it; feel free to add it back if anyone thinks otherwise!
I updated to 12.3.1 from node 10, and my code which detects the stdin 'end' event on a child process stopped working. Is this related?
@ninja- - can't say for sure, do you have a simple test case that shows up the issue?