Node: fs readdir with buffer and file types problem

Created on 11 May 2020  Â·  9Comments  Â·  Source: nodejs/node

  • Version: 12.16.3
  • Platform: linux64
  • Subsystem: gentoo

What steps will reproduce the bug?


reading directory with files names using buffer as path does not work.

const fs = require('fs');
fs.readdir( Buffer.from( "."),{withFileTypes:true,encoding:"buffer"},(e,d)=>console.log("dir",d));

It works correct in version 12.14.0 but after upgrade it stopped to work.
using it without files types works:
// fs.readdir( Buffer.from( "."),{withFileTypes:false,encoding:"buffer"},(e,d)=>console.log("dir",d));

How often does it reproduce? Is there a required condition?

every time

What is the expected behavior?

get directory entries with file types i.e. array of Dirent objects

What do you see instead?

error message:

Uncaught:
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received an instance of Buffer
    at validateString (internal/validators.js:117:11)
    at Object.join (path.js:1039:7)
    at getDirents (internal/fs/utils.js:159:39)
    at FSReqCallback.req.oncomplete (fs.js:858:7) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Additional information

This bug is degradation because it doesn't exists in version 12.14.0 and present in versions 12.16.1 and 12.16.3 at least.
I use buffer instead of string because working with my old archives with non utf-8 names.

confirmed-bug fs good first issue mentor-available

Most helpful comment

Well, as I mentioned, it seems like a real bug for sure – but it would have been nice to have a reproduction available as a basis for a regression test. :+1:

Basically, the issue is that this line:

https://github.com/nodejs/node/blob/8a6fab02adab2de05f6e864847f96b0924be0840/lib/internal/fs/utils.js#L188

uses path.join, which can only handle strings, but path and/or name can be Buffer instances as well (so 4 combinations in total – string, string + Buffer, string + string, Buffer + Buffer, Buffer). This happens a few times in that file, all of which probably need to be updated – it might make sense to put that in a separate join function that takes any of the 4 argument combinations.

@vasyanewlifehere If you like, you are welcome to work on a PR?

All 9 comments

This seems like a real bug, although it would be helpful to know what kind of directory entries there are in your directory since that will determine whether an error occurs or not.

Sorry.
After you comment I've checked once again.
It doesn't work just on my UDF blue ray mounted file systems.
It work correct in other cases.
than it is possible not a degradation.
but in any case the code throws "uncaught" exception and my program fails.

Well, as I mentioned, it seems like a real bug for sure – but it would have been nice to have a reproduction available as a basis for a regression test. :+1:

Basically, the issue is that this line:

https://github.com/nodejs/node/blob/8a6fab02adab2de05f6e864847f96b0924be0840/lib/internal/fs/utils.js#L188

uses path.join, which can only handle strings, but path and/or name can be Buffer instances as well (so 4 combinations in total – string, string + Buffer, string + string, Buffer + Buffer, Buffer). This happens a few times in that file, all of which probably need to be updated – it might make sense to put that in a separate join function that takes any of the 4 argument combinations.

@vasyanewlifehere If you like, you are welcome to work on a PR?

I could help with this issue.

(UPD): will start working on it tommorow if there will be no answer from @vasyanewlifehere

@vasyanewlifehere are you planning to work on PR?

it would have been nice to have a reproduction available as a basis for a regression test

The following lines of code work if a dirent type is UV_DIRENT_UNKNOWN (which is crossplatform equivalent of DT_UNKNOWN).
https://github.com/nodejs/node/blob/8a6fab02adab2de05f6e864847f96b0924be0840/lib/internal/fs/utils.js#L188

https://github.com/nodejs/node/blob/8a6fab02adab2de05f6e864847f96b0924be0840/lib/internal/fs/utils.js#L217

As to create a reproduction we need to force OS to return DT_UNKNOWN. I spent an hour on researching, and still don't know how to achive this on macOS.

Currently, only some filesystems (among them: Btrfs, ext2, ext3, and ext4) have full support for returning the file type in d_type. All applications must properly handle a return of DT_UNKNOWN.

The paragraph above is from this manpage. It got me thinking, that for reporoducing the bug on CI we will need to mount such a FS that will cause OS to return UV_DIRENT_UNKNOWN.

Alternatively, according to this and that, we could find such a system where DT_UNKNOWN if undefined, so libuv will always return UV_DIRENT_UNKNOWN

All that sounds really tough). @addaleax Are there other ways to force OS to return DT_UNKNOWN?

Instead of having a reproduction, we could create unit tests for getDirents and getDirent. In the tests we might mock OS calls.

In code it will look something like

function getDirent(path, name, type, callback, loadFs = lazyLoadFs) {
  if (typeof callback === 'function') {
    if (type === UV_DIRENT_UNKNOWN) {
      loadFs().lstat(pathModule.join(path, name), (err, stats) => {
      // rest of the code
}

Tests

   function callback(err) {
      assert.strictlyEqual(err, null);
   }
   function lazyLoadFsMock() {
      // return something
   }
   getDirent(Buffer('.'), 'foo', 'bar', callback, lazyLoadFsMock)

UPD: I created a reproduction on macOS. I definetely won't work on Windows. The test doesn't fail on Ubuntu.

'use strict';
const tmpdir = require('../common/tmpdir');
const fs = require('fs');
const common = require('../common');
const assert = require('assert');

fs.readdir(
    Buffer.from("/dev"),
    {withFileTypes: true, encoding: "buffer"},
    common.mustCall((error,d) => {
        assert.strictEqual(error, null)
    })
)

@addaleax will it be OK if add unit tests only?

@addaleax Can we assume that name and path args are correct and buffers are in 'utf-8' when we get string + Buffer or Buffer + string or Buffer + Buffer combination?
If yes, then the join function will look like that

path: string,               -> path
path: string, name: Buffer  -> Buffer.concat([Buffer.from(path + path.sep)), name])
path: string, name: string  -> path.join(string, string)
path: Buffer, name: Buffer  -> Buffet.concat([path,  Buffer.from(path.sep), name])
path: Buffer, name: string  -> Buffer.concat([path, Buffer.from(path.sep + name)])
path: Buffer                -> path

If not, it seems that we need to implement something similar to path.join for buffers.

> path.join('123/////', '////123')
'123/123'
> path.join('123/////', '////123/////')
'123/123/'

Considereing the fact that a buffer may contain data in different encodings, the algorythm will be quite complex. Also, it seems to me that we can't stringify buffers and use path.join since we don't know a Buffer's encoding.

I'd like to work on this issue. Has it been fixed?

@kkz13250 It's not been fixed. I started to work on this issue and created a reproduction here.

I think I know how to do it. We could use options.encoding field as to perform convertions from Buffer to string and vice versa. will continue working on PR ASAP.

@shackjjj Thanks for the ping – no, unfortunately we can’t assume that Buffers contain UTF-8, otherwise we wouldn’t need to have a Buffer variant in the fs API. We can, however, assume that / is the path separator, so joining two Buffer paths in this case here boils down to Buffer.concat([buf1, slashBuffer, buf2]).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

loretoparisi picture loretoparisi  Â·  3Comments

mcollina picture mcollina  Â·  3Comments

cong88 picture cong88  Â·  3Comments

Icemic picture Icemic  Â·  3Comments

dfahlander picture dfahlander  Â·  3Comments