Node: Cannot properly close/cleanup character device ReadStream (process hangs)

Created on 16 Sep 2017  路  9Comments  路  Source: nodejs/node

Issue observed on two distinct systems:

  • System 1. Linux laptop

    • Version: 8.4.0
    • Platform: Arch Linux 4.11.9 (x86_64)
  • System 2. Beaglebone Black

    • Version: 6.10.3
    • Platform: Linux 4.9.41 (armv71)

I'm using fs.ReadStream to continuously read from a _character device_, e.g. /dev/input/event0. Receiving events works great, but on attempting to shut down the process does not exit as expected. Googling has turned up similar-sounding issues all the way back to node-0.x, but I have been unable to find a solution.

Code to reproduce (behaves the same on both aforementioned systems):

const fs = require('fs')
const stream = fs.createReadStream('/dev/input/event0')

stream
  .on('end', () => { console.log('End') })
  .on('open', fd => { console.log(`Opened (fd=${fd})`) })
  .on('close', () => { console.log('Closed') })
  .on('error', err => { console.log(`Error: ${err}`) })
  .on('data', data => { console.log(data) })

setTimeout(() => {
  console.log('Attempting to close stream')

  stream.on('close', () => {
    console.log(`Stream fd is now ${stream.fd}`)
    process.exit(0) // Hangs
  })

  stream.close()
}, 1000)

Output:

Opened (fd=9)
Attempting to close stream
Closed
Stream fd is now null
(Hung here, ^C or kill process to exit)

A few interesting (?) observations:

Explicitly obtaining the file descriptor we get identical results, _unless_ if the .on('data', ...) listener is _not_ attached, in which case we can exit as expected. Removing the listener in the shutdown process does not work, however, nor do we get this behavior if the fd is opened by fs.createReadStream (i.e. simply commenting out the .on('data', ...) from the first sample. Adjusted sample:

const fs = require('fs')

const fd = fs.openSync('/dev/input/event0', 'r')
const stream = fs.createReadStream('ignored', { fd: fd })

stream
  .on('end', () => { console.log('End') })
  //.on('open', fd => { console.log(`Opened (fd=${fd})`) }) // No event, as expected
  .on('close', () => { console.log('Closed') })
  .on('error', err => { console.log(`Error: ${err}`) })
  //.on('data', data => { console.log(data) }) // Exit works if commented out

setTimeout(() => {
  console.log('Attempting to close stream')

  //stream.removeAllListeners('data') // No result

  stream.on('close', () => {
    console.log(`Stream fd is now ${stream.fd}`)
    process.exit(0)
  })

  stream.close()
  //fs.closeSync(fd) // Does not trigger the `close` event
}, 1000)

This is probably not related to the problem, but I get the same result if I "trigger" an end-event by doing the following in my shutdown routine as such:

// `.push(null)` and `.read(0)` together trigger an `end` event
stream.push(null)
stream.read(0)

Is this a bug, or am I simply missing something? Cheers!

doc fs good first issue mentor-available

Most helpful comment

This is not something libuv would (or could) fix but documenting it is a good way forward. Labels updated.

Aside: fs.createReadStream() is the wrong tool for this. Unless it's a /dev device that doesn't support polling you should use new net.Socket({ fd: ... }).

All 9 comments

Some additional information in case somebody else is having similar problems.

I'm fairly confident that this happens because the data event handler remains in the node event loop (even after calling .destroy() or even explicitly unsetting all listeners). I'm by no means an expert on internals though, so please take that statement with a grain of salt.

Output from a simple test using Why Is Node Running?:

There are 3 handle(s) keeping the process running

# FSREQWRAP
(unknown stack trace)

# FSREQWRAP
/path/to/test.js:27 - stream.destroy()

# TickObject
(unknown stack trace)

Potential workaround

If an event is sent to the stream _after_ close/destroy has been called, the process will end successfully without any errors. Thus, if you are able to simultaneously read and write from/to your stream source, you could ensure shutdown by artificially injecting some event in the .on('close')-handler. Might be a workaround for some.

A simple way to test the above is to use your keyboard or mouse as an input source, e.g. /dev/input/by-id/usb-my-mouse:

Opened (fd=9)
Attempting to close stream
Closed
Stream fd is now null
(Process hangs here)
(Move mouse)
(Process exits with code=0)

unable to reproduce this in macos, and with some other character devices in alpine linux too so looks like platform specific. Unable to access the said device in my Linux box, as it looks like that needs root permission.

Can you run with strace -f and see the last few lines? that will give some clues as to why the process hangs.

Certainly. Here's a log as requested, starting just where we do the stream.on('close') logging:

[pid   423] write(9, "Stream fd is now null\n", 22Stream fd is now null) = 22
[pid   423] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1) = 1
[pid   432] <... futex resumed> )       = 0
[pid   423] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] futex(0x21d16c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid   432] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1) = 1
[pid   429] <... futex resumed> )       = 0
[pid   423] <... futex resumed> )       = 0
[pid   432] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   423] futex(0x7f3e456ce9d0, FUTEX_WAIT, 429, NULL <unfinished ...>
[pid   432] <... futex resumed> )       = 0
[pid   429] futex(0x21d172c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] madvise(0x7f3e36ffe000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   431] <... futex resumed> )       = 0
[pid   429] <... futex resumed> )       = 1
[pid   432] <... madvise resumed> )     = 0
[pid   431] futex(0x21d16c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   432] exit(0 <unfinished ...>
[pid   431] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid   432] <... exit resumed>)         = ?
[pid   431] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   429] futex(0x21d16c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   432] +++ exited with 0 +++
[pid   431] <... futex resumed> )       = 0
[pid   429] <... futex resumed> )       = 0
[pid   431] madvise(0x7f3e377ff000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   429] madvise(0x7f3e44ece000, 8368128, MADV_DONTNEED <unfinished ...>
[pid   431] <... madvise resumed> )     = 0
[pid   429] <... madvise resumed> )     = 0
[pid   431] exit(0 <unfinished ...>
[pid   429] exit(0 <unfinished ...>
[pid   431] <... exit resumed>)         = ?
[pid   429] <... exit resumed>)         = ?
[pid   431] +++ exited with 0 +++
[pid   429] +++ exited with 0 +++
[pid   423] <... futex resumed> )       = 0
[pid   423] futex(0x7f3e44ecd9d0, FUTEX_WAIT, 430, NULL

// Point where it hangs until ^C
^C) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
strace: Process n detached (7 processes detatched)

Thanks for looking into this!

thanks @lfk for the data, and sorry! actually I missed your updates among the github notification flood.

minimal reproduce:

$ cat 15439.js

const fs = require('fs')
const stream = fs.createReadStream('/dev/input/event0')
setTimeout(() => {
  stream.close()
}, process.argv[2] / 1)

I have tried with /dev/random , as I don't have access to the said device in my ststem.
with /dev/random, the code comes out after random intervals (apparently when the device reeives some random data at random intervals)

My theory: As the file system work is carried out by internal worker thread, and it has initiated a blocking read from the device, and the data won't come, the main thread is unable to move ahead - it is held up in the polling for an async event that has to come from the said worker.

If I adjust (reduce) the timeout I observe race condition - process exits sometimes, else hangs.

I believe the race is between the read thread and the close thread. If the close happens first, read is abandoned. If the read is initiated first, nothing else matters.

/cc @nodejs/fs @nodejs/libuv

A C reproduce, I wasn't successful with a libuv recreate.

$ cat 15439.c

#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

int fd;

void *worker(void *data) {
  char buf[65536];
  read(fd, buf, 65536);
  read(fd, buf, 65536);
  return NULL;
}

int main() {
  pthread_t tid;
  fd = open("/dev/random", O_RDONLY);
  pthread_create(&tid, NULL, worker, NULL);
  sleep(1);
  close(fd);
  pthread_join(tid, NULL);
  return 0;
}
  • the multiple reads is to accommodate the flaky-ness of /dev/random. If you replace it with /dev/input/event0 then only one read is required.
  • /dev/input/event0 is operable only by root.
  • sleep(1) holds the key. If you adjust that, race will be seen.

The behavior is obvious, as the reader thread has entered the kernel and is oblivious to the closing of fd in main thread.

  • tcp streams do not have this issue as they are always non blocking.
  • regular fs streams do not have this issue as the disk read would always yield something, though with a latency that is addressed by internal worker threads.
  • this is a hybrid stream that uses a blocking device underneath but want to expose itself as an asynchronous stream abstraction.

Fixing this at its core will be bit involving, as it would need detecting such cases and interrupting worker threads.

Given that it is a corner case, I recommend to document this as a limitation.

This is not something libuv would (or could) fix but documenting it is a good way forward. Labels updated.

Aside: fs.createReadStream() is the wrong tool for this. Unless it's a /dev device that doesn't support polling you should use new net.Socket({ fd: ... }).

Thank you again @gireeshpunathil, really appreciate that you took the time to look into this and provide such a detailed breakdown.

Just for the sake of completeness, in my case the net.Socket({ fd: ... }) approach doesn't work, it yields a TypeError: Unsupported fd type: FILE. But as mentioned earlier, I've got a viable workaround.

@lfk - sure. Do you want to attempt a documentation change to this effect and make it a PR? I guess you can best explain the limitation being the user who experienced it. Let me know!

The workaround in the docs did not help me, but I managed to get it to work with:

this.stream = spawn("cat", [character_device_path]).stdout;
Was this page helpful?
0 / 5 - 0 ratings

Related issues

Icemic picture Icemic  路  3Comments

dfahlander picture dfahlander  路  3Comments

cong88 picture cong88  路  3Comments

Brekmister picture Brekmister  路  3Comments

danielstaleiny picture danielstaleiny  路  3Comments