Node: Race condition in shutdown hook (process.on('exit'))

Created on 2 Oct 2017  路  17Comments  路  Source: nodejs/node

On Node.js versions 6 and 8 (I assume 7 acts the same way)

I have a process.on('exit') hook like so:

process.once('exit', function (code: number) {

  fs.appendFileSync(file,'data');
  process.exit(code);

});

I have a handful of calls to fs.appendFileSync in this shutdown hook.

4/5 times, there is no delay, nothing...1/5 of the time, there is a huge delay, of about 2-3 seconds. So there is some form of race condition going on. I don't think it's my code. If I comment out all the fs.appendFileSync calls (there are about 5), then the race condition doesn't seem to occur, as frequently.

Does this sound fishy? idk

process question

Most helpful comment

Perhaps it is possible that the libuv file threads are already occupied and so the sync file write blocks?

fs.appendFileSync() doesn't use the thread pool.

All 17 comments

Did you mean to post this to the help repo? As a bug report it's lacking.

yeah it's one of those things, I don't have an error trace, just seems to be a race condition, I am fairly certain it's not my code.

Any way to reproduce? If not, there's not much we can do.

yeah let me try to get a repro for you tomorrow, if I can't, I will close tomorrow

Perhaps it is possible that the libuv file threads are already occupied and so the sync file write blocks?

@Fishrock123 sounds reasonable, but the 2-3 seconds wait is what is really weird. Everything is sync though when it's blocking for 2-3 seconds, so that's what's so weird. It's blocking after reaching the process.on('exit') handler, so it must be sync.

No, it doesn't have to be caused by a sync operation.

This is a bit hypothetical, but if you are running large async file operations (that are large at the libuv level), they can block other operations that may run on different event loop turns. This because libuv has to emulate some async-ness on file operations by using a threadpool.

I have no evidence to say this is what is happening, but it is possible the threadpool is already backed up with async operations when the exit handler is invoked, thus backing up the sync fs call still.

(I think my understanding of libuv is correct about this.)

_Aside: there is no point in doing process.exit(code); from an exit handler. You can manually set process.exitCode if you need to change it._

@Fishrock123 thanks, I will set process.exitCode manually, because it is a little weird to call process.exit(code) within an exit handler, even though it does work. I think you might be right about the libuv threadpool, I am going to do more investigation.

Perhaps it is possible that the libuv file threads are already occupied and so the sync file write blocks?

fs.appendFileSync() doesn't use the thread pool.

Alright I am going to get you a minimal reproducible example, or I will send you the actual code if I can't get the former today.

Is there a way t debug v8/libuv from JS to see what might be going on after process.exit is called?

Your best bet may be to set a breakpoint in the exit handler and run --inspect connected to chrome devtools.

Ok so I have a hypothesis. This whole thing is happening only in the context of a parent process-child process relationship, and the child Node.js process is the one that is not shutting down in a timely manner. To reiterate, about 1/3rd of the time, the child is taking 2-3 seconds to shutdown after reaching the process.exit() handler, and I know this via logging statements. In the process.exit() handler it's only issuing fs.appendFileSync calls, nothing that should be crazy-slow like network calls.

So here is what I am guessing - I am streaming stdout/stderr to parent from child process. In the child process' process.exit() hook, when I call a sync I/O routine (fs.appendFileSync), it seems to interrupt the stdout/stderr that is getting sent to the parent process in a weird way that prevents the child process from shutting down in a proper fashion.

Ok so I did a test, when I launch the child process and use this for stdio:

stdio: [ 'ignore', 'ignore' 'ignore' ],

the problem / race condition still persists, with no difference in behavior.

I have an Ubuntu machine which is significantly faster than my Mac...the problem does not seem to happen on the Ubuntu machine, if that's any hint at all.

@ORESoftware Any news? If not, I move to close.

@bnoordhuis yes please close, the problem is still occurring, but I will reopen if I can find a culprit other than my own code, thanks

Was this page helpful?
0 / 5 - 0 ratings

Related issues

willnwhite picture willnwhite  路  3Comments

filipesilvaa picture filipesilvaa  路  3Comments

cong88 picture cong88  路  3Comments

dfahlander picture dfahlander  路  3Comments

stevenvachon picture stevenvachon  路  3Comments