Node: path.join('C:') = 'C:.' only in windows node LTS version

Created on 21 Jul 2017  ·  20Comments  ·  Source: nodejs/node

  • 6.11.1:
  • Window 10 x64:

In windows, path.join('C:') return 'C:.' not 'C:'
In other system, such as linux,macos always return 'C:'

In windows, 'path.join('C:\')' is corrrect, return 'C:\'

But you give path.join more than one args, it's correct.
such as path.join('C:', 'a') == 'C:\a'

here is a interesting thing, if you run path.format(path.parse(path.join('C:'))) in windows,
returns 'C:C:.'

I don't know why...

path windows

Most helpful comment

@tniessen It might be helpful to look at what other platforms are doing. Python 3.5.3 does:

>>> import ntpath
>>> ntpath.normpath('C:')
'C:'
>>> ntpath.normpath('C:.')
'C:'
>>> ntpath.normpath('C:\\')
'C:\\'
>>> ntpath.normpath('C:\\abc')
'C:\\abc'
>>> ntpath.normpath('C:abc')
'C:abc'
>>>
>>> ntpath.join('C:')
'C:'
>>> ntpath.join('C:\\')
'C:\\'
>>> ntpath.join('C:', '.')
'C:.'
>>> ntpath.join('C:', 'abc')
'C:abc'
>>> ntpath.join('C:\\', 'abc')
'C:\\abc'
>>> ntpath.join('C:\\abc', '.')
'C:\\abc\\.'

From what I'm reading in this thread, the C:. -> C: normalization should be expected and correct, and it is IMO what we should do as well. On the other hand, Python's ntpath.join doesn't normalize the path after returning, so joining 'C:' and '.' returns 'C:.'.

About parse, I guess now I would still expect C:. to parse to

    { root: 'C:', dir: '', base: '.', ext: '', name: '.' }
not { root: 'C:', dir: '', base: 'C:.', ext: '', name: 'C:.' } (current)

The root is C: alright, while the actual path is dir + path.sep + name so ..

By the same logic C:abc should be

    { root: 'C:', dir: '', base: 'abc', ext: '', name: 'abc' }
not { root: 'C:', dir: '', base: 'C:abc', ext: '', name: 'C:abc' } (current)

All 20 comments

In general, the path normalization behavior is platform-dependent. You might find path.win32.* and path.posix.* to be helpful for testing on different platforms.

Now, the issue itself is actually two issues as far as I can tell.

The path.win32.join('C:') === 'C:.' issue is actually coming from path.win32.normalize('C:') === 'C:.'. It should normalize to 'C:\\' IMO.

The join->parse->format issue is more complex:

> path.win32.parse('C:')
{ root: 'C:', dir: 'C:', base: '', ext: '', name: '' }
> path.win32.parse('C:.')
{ root: 'C:', dir: '', base: 'C:.', ext: '', name: 'C:.' } // <-- ISSUE

I would expect 'C:.' to parse to

{ root: 'C:', dir: 'C:', base: '.', ext: '', name: '.' }

Similarly:

> path.win32.parse('C:abcd')
{ root: 'C:', dir: '', base: 'C:abcd', ext: '', name: 'C:abcd' }

seems to be faulty as well.

These issues are also reproducible on master.

Well, what can I say... Paths on Windows are a mess, and node really isn't dealing with that mess too well yet.

Okay, the colon has three separate meanings under Windows, and it is not trivial to dinstinguish them:

  1. The colon can be used after a drive letter and before a backslash as part of a so called "disk designator", resulting in an absolute path:
    cmd C:\>echo A > F:\test\foo C:\>type F:\test\foo A
  2. The colon can be used after a drive letter but without a following backslash. Windows will then resolve the path relative to the current working directory on that drive:
C:\>cd F:\test
C:\>echo foo > F:something
C:\>type F:\test\something
foo
  1. The colon can also be used to access NTFS alternate data streams, but only under certain circumstances:
F:\test>echo foo > test
F:\test>echo bar > test:alt
F:\test>streams test

streams v1.60 - Reveal NTFS alternate streams.
Copyright (C) 2005-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

F:\test\test:
             :alt:$DATA 6

This document describes some of these aspects. Consider this paragraph:

A file name is relative to the current directory if it does not begin with one of the following:

  • A UNC name of any format, which always start with two backslash characters ("\"). For more information, see the next section.
  • A disk designator with a backslash, for example "C:\" or "d:\".
  • A single backslash, for example, "directory" or "file.txt". This is also referred to as an absolute path.

Note that C: is not an absolute path according to Microsoft. It is quite the opposite:

If a file name begins with only a disk designator but not the backslash after the colon, it is interpreted as a relative path to the current directory on the drive with the specified letter. Note that the current directory may or may not be the root directory depending on what it was set to during the most recent "change directory" operation on that disk. Examples of this format are as follows:

  • "C:tmp.txt" refers to a file named "tmp.txt" in the current directory on drive C.
  • "C:tempdirtmp.txt" refers to a file in a subdirectory to the current directory on drive C.

We always tried to keep the path module as platform-independent as possible. POSIX paths are easy to work with, they follow simple rules, but Windows paths do not.

Note that the path API has some other platform-specific problems, mostly related to path.xxx.resolve, see #1999, #13738 and #14107. I was hoping to have some time to fix some of the problems soon.


I will try to explain why some of your examples @XGHeaven and @TimothyGu are not as unambiguous as you might think:

path.win32.join('C:')

should either throw, return C: or the current working directory on drive C:.

path.win32.join('C:\\')

should return C:\.

path.win32.join('C:', 'a')

I am not sure whether C:\a is actually correct. Windows will return C:a in this case, even though most users would probably expect C:\a, which Windows will return if you pass in C:\ and a.

path.win32.parse('C:')

Again, this is not an absolute path:

C:\>cd F:\test
C:\>dir F:
 Verzeichnis von F:\test

22.07.2017  17:21    <DIR>          .
22.07.2017  17:21    <DIR>          ..
22.07.2017  17:17                16 foo
22.07.2017  17:20                 6 something
22.07.2017  17:21                 6 test
path.win32.parse('C:abcd')

I am not sure how we would represent this.


tl;dr:
It will be impossible to solve these problems without a very tight connection to the host platform, therefore making it impossible to use the same API on other platforms. Windows paths follow highly complex patterns and cannot be parsed as easily as POSIX paths.

cc @refack

path.win32.join('C:')

Whatever it return, I think C:. is absolute wrong. fix this first.

path.win32.join('C:', 'a')

I prefer to return C:a. But if you want to get C:\a, best to use

path.win32.join('C:\\', 'a')
// or
path.win32.join('C:/', 'a') // don't care about using '/' or '\\' when you write cross-platform app

And colon usage case 3 in windows, it never used in development or cmd for myself.
So I think case 3 has a low priority.

tl;dr:
It will be impossible to solve these problems without a very tight connection to the host platform, therefore making it impossible to use the same API on other platforms. Windows paths follow highly complex patterns and cannot be parsed as easily as POSIX paths.

cc @refack

Sometimes I wish GitHub would add 😭 as a reaction.

Actually, beside being ugly C:. is correct (as in c: is for the drive, . is for current directory)

d:\code\node-cur$ c:

C:\$ cd windows

C:\WINDOWS$ d:

d:\code\node-cur$ pushd c:.

C:\WINDOWS$ popd

d:\code\node-cur$ pushd c:\

c:\$

But I agree that path.format(path.parse(path.join('C:'))) === 'C:C:.' is a bug.

We always tried to keep the path module as platform-independent as possible. POSIX paths are easy to work with, they follow simple rules, but Windows paths do not.

path should not only be platform independent, it _should_ also be stateless, performing pure string manipulation with no system calls. So we have two issues here resolving C: is state dependant, and resolving . is state dependant... 🤷‍♂️

@tniessen It might be helpful to look at what other platforms are doing. Python 3.5.3 does:

>>> import ntpath
>>> ntpath.normpath('C:')
'C:'
>>> ntpath.normpath('C:.')
'C:'
>>> ntpath.normpath('C:\\')
'C:\\'
>>> ntpath.normpath('C:\\abc')
'C:\\abc'
>>> ntpath.normpath('C:abc')
'C:abc'
>>>
>>> ntpath.join('C:')
'C:'
>>> ntpath.join('C:\\')
'C:\\'
>>> ntpath.join('C:', '.')
'C:.'
>>> ntpath.join('C:', 'abc')
'C:abc'
>>> ntpath.join('C:\\', 'abc')
'C:\\abc'
>>> ntpath.join('C:\\abc', '.')
'C:\\abc\\.'

From what I'm reading in this thread, the C:. -> C: normalization should be expected and correct, and it is IMO what we should do as well. On the other hand, Python's ntpath.join doesn't normalize the path after returning, so joining 'C:' and '.' returns 'C:.'.

About parse, I guess now I would still expect C:. to parse to

    { root: 'C:', dir: '', base: '.', ext: '', name: '.' }
not { root: 'C:', dir: '', base: 'C:.', ext: '', name: 'C:.' } (current)

The root is C: alright, while the actual path is dir + path.sep + name so ..

By the same logic C:abc should be

    { root: 'C:', dir: '', base: 'abc', ext: '', name: 'abc' }
not { root: 'C:', dir: '', base: 'C:abc', ext: '', name: 'C:abc' } (current)

Here's the Python equivalent for parse:

>>> from pathlib import PureWindowsPath
>>> def PrintPath(p):
...   print({ 'drive': p.drive, 'root': p.root, 'anchor': p.anchor, 'parent': p.parent, 'name': p.name })
... 
>>> PrintPath(PureWindowsPath('C:abc'))
{'drive': 'C:', 'anchor': 'C:', 'parent': PureWindowsPath('C:'), 'name': 'abc', 'root': ''}
>>> PrintPath(PureWindowsPath('C:\\abc'))
{'drive': 'C:', 'anchor': 'C:\\', 'parent': PureWindowsPath('C:/'), 'name': 'abc', 'root': '\\'}
>>> PrintPath(PureWindowsPath('C:.'))
{'drive': 'C:', 'anchor': 'C:', 'parent': PureWindowsPath('C:'), 'name': '', 'root': ''}

PureWindowsPath normalizes the path (as seen in the last pair of lines), but otherwise I think it's pretty consistent with what I proposed above. Our root is basically Python's anchor BTW.

Actually, beside being ugly C:. is correct (as in c: is for the drive, . is for current directory)

Yes, it is correct, but I am not entirely sure we should prefer that over C: without the dot (see python ↑). Apparently, C:. comes as a surprise to many people. The main question is whether C: behaves the same as C:. when given as input to other API functions.

@TimothyGu Sounds good. Your interpretation of parse seems to be reasonable. We are still stuck with the problem of implementing resolve to deal with these paths...

Currently, the only platform-dependent functions are the resolve implementations, and we won't be able to change that. In retrospect, resolve might have been better suited for os to distinguish it from its platform-independent siblings, but we would still need to use it within path.relative etc., or change the semantics of those functions. My best approach to solving some platform-dependent problems was this variant of resolve (I was about to implement it for #13738, but did not have time yet):

  1. If the API platform is the same as the host platform, use the old implementation.
  2. If cwd is not required to compute the result, use the old implementation.
  3. If the result may be relative (e.g. to use the result in path.relative), let cwd be . and return a relative path.
  4. Otherwise, throw.

In my opinion, this is our best shot at a compromise between correctness and platform-independentness. This behavior will need to be documented precisely. For the vast majority of users, this won't change anything. I hope to get some time within the next days to propose a solution to some of these problems.

In my opinion, this is our best shot at a compromise between correctness and platform-independentness. This behavior will need to be documented precisely. For the vast majority of users, this won't change anything. I hope to get some time within the next days to propose a solution to some of these problems.

There's also my suggested https://github.com/nodejs/node/issues/13887#issuecomment-310919341 opt-in way, either:

const path = require('path');
path.strict.posix.resolve('.\\gaga') // throws
const { strict: spath } = path;
spath.posix.resolve('.\\gaga') // throws

or

const { Pure } = require('path');
const purePath1 = new Pure('C:\\fakepath')
purePath1.resolve('gaga') === 'C:\\fakepath\\gaga';
const purePath2 = new Pure();
purePath1.resolve('gaga') === Pure.resolve('gaga') // throws or returns just 'gaga'

@tniessen This issue is about path normalization (which doesn't depend on the local file system), not resolution.

@TimothyGu I know, I just thought the other problems were related enough to bring them up here. Sure, this issue can be solved independently.

@TimothyGu According to our documentation, root should be a part of dir in your last examples in https://github.com/nodejs/node/issues/14405#issuecomment-317225659.

@tniessen Cool. I'll implement this in an upcoming PR.

I just opened https://github.com/nodejs/node/pull/14440 after auditing all path functions, which fixes parse() and adds tests to make sure all of the pure functions work with volume-relative paths. I intentionally did not change path.normalize('C:') === 'C:.' however, since path.normalize('') === '.' under the current Node.js implementation and the current behavior just seems to be consistent with it.

@XGHeaven Thank you for your report, @TimothyGu fixed the behavior of the path API to correctly work with volume-relative paths.

@TimothyGu Thanks. I have a question about if this bug fixed in LTS? I remember this bug just fixed in master.

Backporting to release lines can take some time. cc @nodejs/lts

@XGHeaven This will be considered in the next v6.x release. If 2791761 cherry-picks cleanly to v6.x, and the risk of the change is deemed to be low, it should go into the next release. If it doesn't cherry-pick cleanly it'll have to be backported (process here).

v4.x is in maintenance, so most fixes aren't backported.

I suggest you subscribe to https://github.com/nodejs/node/pull/14440, any discussion about backporting to v6.x will happen on there.

Was this page helpful?
0 / 5 - 0 ratings