Node: 'close' event for spawn() isn't emitted under unknown conditions

Created on 5 Jun 2016  Â·  21Comments  Â·  Source: nodejs/node

  • Version: v4.4.5
  • Platform: OS X 10.11.5 (15F34) [Darwin 15.5.0 Darwin Kernel Version 15.5.0: Tue Apr 19 18:36:36 PDT 2016; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64]
  • Subsystem: child_process

I wasn't able to create a code example for this issue, but is happens consistently in our integration tests. We spawn a python process (2.7.10) with binary encoding - eventually we get stderr 'end' & process 'exit' events, but no stdout 'end' or process 'close' events - it seems like it's specific to some of the tests, but not for all. I tested on v4.4.4 (and some other earlier versions) with no similar issues at all - so you might be able to pin point the issue without a proper code example.

a sketch of the code

const process = spawn(pythonPath, command, { encoding: 'binary' } );
process
.on('exit', (code) => { console.log('exit') })
.on('close', (code) => { console.log('close') }) // for some cases this event doesn't arrive
child_process stream

Most helpful comment

@addaleax thanks for all the insight... will keep you in the loop about what the decision is

All 21 comments

There is nothing in the diff between v4.4.4 and v4.4.5 that stands out. I don't think we'll be able to do much without a test case.

@bnoordhuis - I think it might be connected to gunzip, the following test case demonstrates the issue (at least on my machine)

example.sh:

#!/bin/bash
cat ./example.gz

example.gz

javascript:

const spawn = require("child_process").spawn;
const zlib = require('zlib');
const child = spawn("./example.sh")
child.on('exit', () => {
  console.log("exit");
}).on('close', () => {
  console.log("close");
})
const gunzip = zlib.createGunzip();
gunzip.on('end', () => {
  console.log("gunzip ended");
})
child.stdout.pipe(gunzip).on('data', function() {})

output on 4.4.5 (on my machine)


on 4.4.4:

close
gunzip ended

48684af is the only zlib-related change in v4.4.5, and even though it touches the zlib close() logic, it does not seem like it could have this effect. I also can’t reproduce any change in behaviour from v4.4.4 to v4.4.5 using the above example on Linux.

There’s also my 51c0808b55, which touches the piping login for streams, but I wouldn’t say that that’s a likely candidate here, either.

@iamdoron Does anything change from Node v4.1.2 to v4.2.0 for you?

I was able to reproduce it on ubuntu Linux 3.13.0-71-generic #114-Ubuntu SMP Tue Dec 1 02:34:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

just gunzip the example.gz file and append it to a new file until you have ~96MB of unzipped file (maybe you'll need more, I guess it depends on the machine), then gzip it and try running the example again

@addaleax I plan to take a look into this on Monday. Taking advantage of githubs new ability to assign multiple people!

Feel free to unassign

Can confirm that 51c0808 is actually responsible for this… sigh.

/cc @nodejs/streams I guess

Meta data from 51c0808

Fixes: #5820
Fixes: #5257
PR-URL: #6023

Sufficiently large example.gz for reproduction: example.gz

PR: #7160

We have had to peg to 4.4.4 because of a issue like this. In our case we are piping from sox through ffmpeg both of which we are spawn()ing. We get the exit event on sox but not for ffmpeg.

@TheAlphaNerd Is there a reasonable chance the fix will make it into v4.5.0?

For future reference, bisecting using @petrosagg’s test says that this specific problem with child processes was introduced in 34b535f4caba366789d949211e4369b1e1d01152 (#4071).

@addaleax #7160 fixes @petrosagg’s problem too?

@TheAlphaNerd Yes, I think we pretty much worked out that it is a duplicate of this one here.

so this was not introduced in v4.4.5?

Yes and no – It was introduced in v4.2.5, but had no chance to affect anything because it was covered by another regression that was present from v4.2.0 to v4.4.5 (The one fixed by #6023 and reported in #5820/#5257, as you mentioned above).

Ok so... the regression that #6023 fixed made this other regression possible? TLDR we need the new fix and everything should be "better"

Yes, exactly. If you should decide that you want to see live #7160 for a longer time in v6, then it’s probably better to revert the #6023 fix temporarily because I’d consider this bug more serious… I’m pretty sure that #7160 is safe to land in v4 and won’t introduce any more regressions, but I get that there are reasons for the rules around this process.

@addaleax thanks for all the insight... will keep you in the loop about what the decision is

Was this page helpful?
0 / 5 - 0 ratings