PowerShell: Should we do path normalization on Unix as well as on Windows?

Created on 7 May 2018 · 15 Comments · Source: PowerShell/PowerShell

Currently we aggressively normalize paths by replacing '/' with '\' on Windows and '\' with '/' on Unix (see NormalizePath()).
This is acceptable on Windows but has side effects on Unix, because '\' is a valid character in directory and file names.

.NET Core does the same on Windows but does not normalize on Unix:
IsPathRooted on Unix (IsDirectorySeparator)
IsPathRooted on Windows (IsDirectorySeparator)

NormalizeDirectorySeparators on Unix
NormalizeDirectorySeparators on Windows

As you can see, Unix takes only '/' into account, while Windows accepts both '\' and '/'.
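The difference can be observed from PowerShell through .NET's separator fields. The following is a minimal sketch: `Test-IsDirectorySeparator` is a hypothetical helper written for illustration (it is not part of PowerShell or .NET), and it only approximates the per-platform IsDirectorySeparator checks referenced above.

```powershell
# Hypothetical helper: treats a character as a directory separator only if
# .NET reports it as one on the current platform.
function Test-IsDirectorySeparator {
    param([char] $Char)
    # On Unix both fields are '/'; on Windows they are '\' and '/'.
    return ($Char -eq [System.IO.Path]::DirectorySeparatorChar) -or
           ($Char -eq [System.IO.Path]::AltDirectorySeparatorChar)
}

Test-IsDirectorySeparator '/'   # True on every platform
Test-IsDirectorySeparator '\'   # True on Windows, False on Unix
```

Following .NET Core would mean that on Unix a '\' is left alone as an ordinary file-name character.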

Should we follow .NET Core in path normalization?

Related issues:

  • Set-Location preserves case instead of matching filesystem on case-insensitive/preserving system #5678
  • Location completion should be case-insensitive #1273
  • Bug: can not handle "/" correctly when reading registry item #5536
  • Replace string-compare-based test for copying to same file with more #3441
  • File mode in Linux should reflect Linux modes #1817
  • History #1115
  • On Unix, PowerShell will not find files with backslashes in their names #3666
  • New-Item -ItemType SymbolicLink should support creating symlinks with relative paths #3500.

A simple repro of the side effect on Unix:

Steps to reproduce

mkdir /\

cd \
$pwd

Expected behavior

\

Actual behavior

/
Committee-Reviewed Issue-Discussion Resolution-Duplicate WG-Engine-Providers

All 15 comments

My vote is to follow .NET Core's lead.

In general we should avoid hiding native capabilities, unless absolutely necessary.

Just to add another example:

On Unix, New-Item -Directory Path a\b does not create a single directory literally named a\b, which is what the native mkdir a\b does. Instead it creates _two_ directories: a directory a with a subdirectory b. This is again a consequence of automatically translating a\b into a/b.

Conversely, Set-Location won't let you change to a directory literally named a\b.

I found the same problem in the registry provider (#5536).

It seems we have to move the normalization of paths and possibly (partially) globbing into providers.

Worth noting that changing the behaviour so that only forward slashes work on UNIX(-like) would potentially break a lot of otherwise cross-platform scripts, and would mean that PowerShell 6 on Windows supports UNIX-style paths, but PowerShell 6 on UNIX does not support Windows-style paths (which seems sort of the wrong way round).

cross-platform scripts

They didn't exist before 6.0, and neither did cross-platform paths. We can't break what does not yet exist. Currently we can get cross-platform paths only by means of Join-Path. A script must be written this way to be cross-platform (if we ignore invalid characters in paths). But this is not a complete solution, because some paths are masked by the normalization.

In addition to Join-Path, we could consider an accelerator like [portablepathinfo]@($Home, "etc", "app.cfg") to get $Home/etc/app.cfg on Unix and $Home\etc\app.cfg on Windows. Both should take invalid characters in paths into account. To allow non-portable characters, we could use a special parameter in Join-Path and maybe a [pathinfo] accelerator. With this in mind, we could make the normalization smarter.
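For comparison, here is a rough sketch of how the same path can be built today without the proposed accelerator and without hard-coding a separator. It uses only Join-Path and .NET's Path.Combine; the segment names 'etc' and 'app.cfg' are the example names from this comment, and neither call validates non-portable characters.

```powershell
# Chain Join-Path through the pipeline; each call appends one segment using
# the platform's separator.
$viaJoinPath = Join-Path -Path $HOME -ChildPath 'etc' | Join-Path -ChildPath 'app.cfg'

# .NET alternative: Path.Combine accepts several segments in one call.
$viaCombine = [System.IO.Path]::Combine($HOME, 'etc', 'app.cfg')

$viaJoinPath
$viaCombine
```

Both variables should end up holding $HOME/etc/app.cfg on Unix and $HOME\etc\app.cfg on Windows.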

Note that you've always been able to use \ and / interchangeably on Windows - while there may be individual external utilities that don't support / (also, support in cmd.exe is patchy), all the major APIs support it (WinAPI, COM, .NET).

Thus - aside from ruling out illegal-in-a-filename characters - literal use of / works as a cross-platform filesystem-path separator.

Also note that we currently have few abstractions for well-known locations: it's currently just $HOME and $PSHOME; $env:PSModulePath _contains_ well-known locations, but not in an individually identifiable manner.

Even a platform-abstracted temporary-files location is currently not implemented - remember #4216?
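Until PowerShell exposes such an abstraction itself, a script can fall back on .NET for the temp location. A minimal sketch, assuming a hypothetical file name 'myscript.log' chosen purely for illustration:

```powershell
# Path.GetTempPath() returns the current platform's temporary directory,
# so the script does not need to branch on $env:TEMP, $env:TMPDIR or /tmp.
$tempDir = [System.IO.Path]::GetTempPath()

# 'myscript.log' is a made-up name; Join-Path supplies the separator.
$logFile = Join-Path -Path $tempDir -ChildPath 'myscript.log'
$logFile
```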

all the major APIs support it (WinAPI, COM, .NET).

literal use of / works as a cross-platform filesystem-path separator.

I wonder: PowerShell does normalization, .NET does normalization, and Win32 does too. Why do we need this on three levels? And after that someone says that PowerShell (Windows) is slow.
Also, we do repeated path parsing and rebuilding. To be more resource-efficient and fast, we need to delegate common methods to providers, .NET and kernel APIs.

I wonder: PowerShell does normalization, .NET does normalization, and Win32 does too. Why do we need this on three levels? And after that someone says that PowerShell (Windows) is slow.
Also, we do repeated path parsing and rebuilding. To be more resource-efficient and fast, we need to delegate common methods to providers, .NET and kernel APIs.

Even in PowerShell, I think we're doing it multiple times.

The other problem, like I mentioned in https://github.com/PowerShell/PowerShell/issues/5536#issuecomment-387476149, is that we already promise to support multiple container-enabled providers that use conflicting path separators and legal name characters. So PowerShell has to do some abstract, provider-based path handling I imagine. But once we know it's a filesystem path, I agree that we should do as little as possible on top of .NET.

Also, by "breaking change" above, I mean that PSCore 6 has already shipped as GA and people are already writing scripts with backslashes in their paths to be used cross-platform.

I fervently agree that just using / would be better, especially since all the major APIs support it, it was designed into Windows from the start, and backslashes are just the bad legacy of CMD/DOS.

But I think there are scripts already being written as cross-platform that would break, and scripts from older PowerShell versions that should work cross-platform with PS6 if not for the path-separator changing.

@rjmholt

Also, by "breaking change" above, I mean that PSCore 6 has already shipped as GA and people are already writing scripts with backslashes in their paths to be used cross-platform.

That's definitely possible (and perhaps you have already found examples), but do note that the current lack of platform-abstracted locations limits the scenarios in which \-based path literals are potentially useful in cross-platform code:

  • constructing _literal_ paths _relative_ to $HOME
  • constructing _literal_ paths _relative_ to manually determined platform-specific locations, such as relative to $env:TEMP (Windows), $env:TMPDIR (macOS), /tmp (Linux).
  • constructing _literal_ paths based on an explicitly established PS drive (with a platform-appropriate root location).

I personally have no sense of how much that has happened already.

P.S.: My _guess_ is that someone savvy enough to have implemented their own platform abstractions, which requires some knowledge of the Unix world, is likely to have used / as the path separator.

It seems we haven't mentioned that when we send paths through the pipeline, we again parse, normalize and so on.

Add #3441,#1817 in PR description.

@PowerShell/powershell-committee reviewed this. We agree that the utility of supporting both forward and backslashes as a directory separator is valuable for cross platform scripts as well as being existing behavior. The fundamental issue seems to be that escaped characters in paths are not propagating through providers and this is a bug that should be fixed.

It seems I don't understand the conclusion. :confused:
The fundamental issue is that PowerShell currently _encourages_ the creation of non-portable scripts and thus produces a huge number of auxiliary operations that consume a lot of resources.
Portable scripts shouldn't contain literal paths. We should use Join-Path and something like [portablepath]@().
PowerShell should work with native paths in dir | copy-item without reparsing if we want to get anywhere close to the performance of cmd/bash shells.
PowerShell should work with native paths internally and at the top level (-Path/-LiteralPath parameters), especially in interactive mode. Here we have some issues and should address them. Why should we have to escape characters in a Unix path on Unix? I suppose it's very annoying. Why don't we support '\' in Unix paths?
I'm pretty sure that work in this direction can keep backward compatibility with Windows PowerShell.

Maybe @mklement0 could do a more in-depth review of the provider path issues.
