Currently there is no direct way to normalize a path or to remove the relative segments from a path string. We can use `GetFullPath` to normalize the path, but it resolves the path relative to the root directory or the working directory, which is not always desired.

For example:
- Working with relative paths in archives, where we may require normalized paths that are "full" paths in the archive but not on the system.
- Building and working with relative URLs.
``` C#
namespace System.IO
{
public class Path
{
public static string RemoveRelativeSegments(string path);
public static string RemoveRelativeSegments(ReadOnlySpan
public static bool TryRemoveRelativeSegments(ReadOnlySpan
}
}
```

The `rootLength` is the number of leading characters of the path that will never be trimmed while removing the relative segments.
# Behavior
- `Path.DirectorySeparatorChar` and `Path.AltDirectorySeparatorChar` are considered path "segment" separators (`\` and `/` on Windows, `/` on Unix)
- Sequential separators will be collapsed to a single separator
- Any `Path.AltDirectorySeparatorChar` characters will be changed to `Path.DirectorySeparatorChar` (only relevant on Windows).
- Segments of one period only `.` will be removed.
- Segments of two periods only `..` will be removed along with the parent segment, if any.
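
The rules above can be sketched roughly as follows. This is only an illustration, not the internal implementation: `PathSketch`, its `separator`/`altSeparator` parameters, and the default `rootLength` of 0 are hypothetical names introduced here so the example behaves the same on every platform. As a simplification, it also drops leading and trailing separators in the non-root portion.

```cs
using System;
using System.Collections.Generic;

public static class PathSketch
{
    // Hypothetical helper, NOT the proposed System.IO.Path API.
    // The first rootLength characters are copied through untouched.
    public static string RemoveRelativeSegments(
        string path, int rootLength = 0,
        char separator = '\\', char altSeparator = '/')
    {
        if (string.IsNullOrEmpty(path)) return path;
        if (rootLength < 0 || rootLength > path.Length)
            throw new ArgumentOutOfRangeException(nameof(rootLength));

        string root = path.Substring(0, rootLength);
        string rest = path.Substring(rootLength);

        var kept = new List<string>();
        foreach (string segment in rest.Split(new[] { separator, altSeparator }))
        {
            // Empty segments come from sequential separators; "." is a no-op.
            if (segment.Length == 0 || segment == ".") continue;

            if (segment == "..")
            {
                // Remove the parent segment, if any; never reach into the root.
                if (kept.Count > 0) kept.RemoveAt(kept.Count - 1);
                continue;
            }

            kept.Add(segment);
        }

        // Joining with the primary separator also normalizes alt separators.
        return root + string.Join(separator.ToString(), kept);
    }
}
```

With a root length of 3, `C:\..\..\temp` keeps its `C:\` prefix because the `..` segments have no parent segment left to remove.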
# Implementation
This API is already implemented as an internal API and is used by `GetFullPath` to remove relative segments in the case of device paths.
https://github.com/dotnet/runtime/blob/96d9a47228b0412945d151a79af7bf66463b053a/src/libraries/System.Private.CoreLib/src/System/IO/PathInternal.cs#L129
# Usage
```cs
// Verbatim strings (@"...") keep the backslashes from being read as escape sequences.

// Removing the relative segment
Assert.Equal(@"C:\temp", PathInternal.RemoveRelativeSegments(@"C:\git\..\temp"));
// Removing the relative segment but not eating the root
Assert.Equal(@"C:\temp", PathInternal.RemoveRelativeSegments(@"C:\..\..\temp"));
// Input is a relative path
Assert.Equal("temp", PathInternal.RemoveRelativeSegments(@"git\..\temp"));
// Multiple types of relative segments together
Assert.Equal("temp", PathInternal.RemoveRelativeSegments(@"git\.\..\temp"));
// Sequential separators are collapsed
Assert.Equal(@"git\temp", PathInternal.RemoveRelativeSegments(@"git\\\\temp"));
// Normalizing the separators
Assert.Equal(@"git\temp\src\corefx", PathInternal.RemoveRelativeSegments(@"git\temp/src\corefx"));
```
More usages can be found here:
https://github.com/dotnet/corefx/pull/37225
I have adapted the internal tests to this API proposal.

This resolves https://github.com/dotnet/corefx/issues/4208, where callers can directly call `RemoveRelativeSegments(RelativePath);`
cc @danmosemsft @JeremyKuhne
related implementation prs https://github.com/dotnet/coreclr/pull/24273 https://github.com/dotnet/corefx/pull/37225
Do 1, 2, 3 above also apply to forward slash exactly the same? On all platforms?
> Do 1, 2, 3 above also apply to forward slash exactly the same? On all platforms?
yes, the same will apply to forward slash and on all platforms
This does not make sense to me:

    // Not removing the relative segments if it occurs in the skip length
    Assert.Equal("git\..\temp", PathInternal.RemoveRelativeSegments("git\..\temp\.\", 7));

What is the skip length? When does it remove `\..\`? This test does not exist in PathInternal.Tests.cs.
For your clarifications, please update the top post.
Otherwise it seems reasonable to me. Jeremy can decide whether to flip the label to ready for review.
> What is the skip length? When does it remove ..\ ?

I renamed skipLength to rootLength in the post. The API starts removing relative segments from the path string only after the first rootLength characters.
Can you add a Span version of this, please? `TryRemoveRelativeSegments()` will never return a longer output than its input, so it is easy to pass a sufficient output buffer.
For ease of use I'd also add string RemoveRelativeSegments(ReadOnlySpan<char>). It should be a pretty common scenario to want the "normalized" result on the heap. You can take spans of an unnormalized path with our other APIs (GetDirectoryName for example). It would be nice to be able to call a chain of these with only one output string allocation.
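
To illustrate why an input-sized buffer is always enough, here is a rough sketch of a span-based Try method. It is not the proposed implementation: `SpanPathSketch` and the explicit `separator` parameter are hypothetical, there is no root handling, and only a single separator character is recognized, so a leading separator is dropped.

```cs
using System;

public static class SpanPathSketch
{
    // Hypothetical sketch, NOT the proposed System.IO.Path API.
    public static bool TryRemoveRelativeSegments(
        ReadOnlySpan<char> path, Span<char> destination, out int charsWritten,
        char separator = '/')
    {
        charsWritten = 0;

        // The output is never longer than the input, so this check suffices.
        if (destination.Length < path.Length) return false;

        int written = 0;
        int i = 0;
        while (i < path.Length)
        {
            // Find the end of the current segment.
            int end = i;
            while (end < path.Length && path[end] != separator) end++;
            ReadOnlySpan<char> segment = path.Slice(i, end - i);

            if (segment.Length == 0 || segment.SequenceEqual(".".AsSpan()))
            {
                // Sequential separator or "." segment: nothing to write.
            }
            else if (segment.SequenceEqual("..".AsSpan()))
            {
                // Back up over the previously written segment, if any.
                int last = destination.Slice(0, written).LastIndexOf(separator);
                written = last < 0 ? 0 : last;
            }
            else
            {
                if (written > 0) destination[written++] = separator;
                segment.CopyTo(destination.Slice(written));
                written += segment.Length;
            }

            i = end + 1;
        }

        charsWritten = written;
        return true;
    }
}
```

A caller could size the buffer from the input, for example `Span<char> buffer = stackalloc char[input.Length];` for short paths, and then create the single output string from `buffer.Slice(0, charsWritten)`.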
You need to be very explicit about what normalizing the separator means. Path.AltDirectorySeparatorChar -> Path.DirectorySeparatorChar. MSBuild turns \ into / on Unix. My gut feel is that we don't need to provide that option to start, but we should call it out as a potential future feature should we get enough demand.
Please call out error handling for this as well. I presume given what we've done for Path.Join(), that we'd want empty/null to return empty/null. We should explicitly discuss in API review.
@JeremyKuhne I'd like to add a vote for normalizing the separator. This is something I'm using string.Replace for today.
> The `rootlength` is the length of the path which will never be trimmed while removing the relativeSegments.
Do I understand it correctly that it's length in characters, not in path segments? Why?
In any case, I think this should be clarified.
> Do I understand it correctly that it's length in characters, not in path segments? Why?

Because we can't always understand the relevance of separators. Notably with UNCs for paths and device paths on Windows (`\\server\share`, `\\?\UNC`), but one can imagine others. We could try to use GetPathRoot as the first segment, but that assumes the path is well formatted. Users can make the two-step call, and we'll give that example for when they have a "normal" path. This is, in fact, what we do with GetFullPath(path, basePath) currently.
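
A minimal sketch of the first step of that two-step call, assuming a well-formed path: `Path.GetPathRoot` is the existing API, while `RootLengthSketch` and `SplitAtRoot` are hypothetical names used only for illustration. The resulting root length is what a caller would then hand to the `rootLength` parameter described in the proposal.

```cs
using System;
using System.IO;

public static class RootLengthSketch
{
    // Hypothetical helper: splits a path into the part that must be
    // protected (the root) and the part that may be normalized.
    public static (string Root, string Rest) SplitAtRoot(string path)
    {
        // GetPathRoot returns null for a null path, so fall back to empty.
        string root = Path.GetPathRoot(path) ?? string.Empty;

        // Only the length is used; the characters come from the input itself.
        int rootLength = root.Length;
        return (path.Substring(0, rootLength), path.Substring(rootLength));
    }
}
```

For a relative path such as `git/../temp` the root is empty, so the whole string is eligible for normalization.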
@JeremyKuhne anything else that is required here ?
@anipik I think it's fine to mark ready for review whenever you're ready.
Just a quick note on using this for URLs: what would be the result of RemoveRelativeSegments("/foo/?IAmActuallyAQueryStringComponent/../../")?
@Anipik you mentioned you don't want to use GetFullPath() because you want to preserve the relativeness. What's the scenario for this API then? Functionally, it seems GetFullPath() or keeping the path with the dots seems both OK.
> @Anipik you mentioned you don't want to use GetFullPath() because you want to preserve the relativeness. What's the scenario for this API then? Functionally, it seems GetFullPath() or keeping the path with the dots seems both OK.

What you want to do is normalize a path segment without it being combined with the current working directory. If you call `Path.GetFullPath()` on `foo\..\bar` you're going to get a path with the current working directory added to the front. This API will give you back `bar`. This is particularly useful for working with archives as described in dotnet/corefx#4208.
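
The existing behavior can be checked with a small example that uses only current APIs; `FullPathDemo` is a hypothetical wrapper name added for illustration.

```cs
using System;
using System.IO;

public static class FullPathDemo
{
    // Resolves foo/../bar with the existing API. The relative segments are
    // removed, but the result is anchored at the current working directory.
    public static string ResolveWithGetFullPath()
    {
        string relative = Path.Combine("foo", "..", "bar");
        return Path.GetFullPath(relative);
    }

    // What GetFullPath is expected to produce: <current directory>/bar.
    public static string ExpectedResult()
    {
        return Path.Combine(Directory.GetCurrentDirectory(), "bar");
    }
}
```

An archive entry name normalized this way would pick up the host machine's working directory, which is the behavior the proposal wants to avoid.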
Similar to dotnet/corefx#24685, this API will be defective on Linux in the presence of symbolic links:

- On Windows, if `C:\Users\All Users` is a symbolic link to `C:\ProgramData` then the canonical form of `C:\Users\All Users\..\foo` is `C:\Users\foo`.
- On Linux, if `/var/lock` is a symbolic link to `/run/lock` then the canonical form of `/var/lock/../foo` is `/run/foo`.

Fire up WSL and give it a try. :)

    user@roxy:~$ head -n 1 /var/lock/../resolvconf/resolv.conf /run/resolvconf/resolv.conf /var/resolvconf/resolv.conf
    ==> /var/lock/../resolvconf/resolv.conf <==
    # This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.
    ==> /run/resolvconf/resolv.conf <==
    # This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.
    head: cannot open '/var/resolvconf/resolv.conf' for reading: No such file or directory
    user@roxy:~$
> this API will be defective on Linux in the presence of symbolic links
This API will deliberately not follow links or touch the file system in any way.
I'm updating the original description to be more precise about what the heuristics are.
I guess since GetFullPath has the same behavior, it's not really out of line. And there's probably an argument to be made that nobody wants or expects Linux's actual behavior. Even bash's tab completion doesn't match Linux's actual behavior.
Alright, here is the shape we'd like to see:
```C#
namespace System.IO
{
    public class Path
    {
        public static string RemoveRedundantSegments(string path);
        public static string RemoveRedundantSegments(ReadOnlySpan<char> path);
        public static bool TryRemoveRedundantSegments(ReadOnlySpan<char> path, Span<char> destination, out int charsWritten);
    }
}
```
I'll work on this.
> `public static string RemoveRedundantSegments(ReadOnlySpan<char> path);`
Should this return Span, like existing Path.TrimEndingDirectorySeparator? Returning string means that this API will allocate unnecessarily when the path does not contain any redundant segments.
@jkotas I see what you mean. I am working on this right now and as I was implementing that method, I started wondering about that. You were faster than me in asking about it. 😸
I did a quick replay of the video, but I couldn't find any mention of the return value, so it could've been a small oversight (I could've also missed it).
Question to the reviewers in the video: Should we update the second signature to return a span instead of a string? What's the process to alter an already approved API?
@terrajobst @bartonjs @GrabYourPitchforks @tannergooding @KrzysztofCwalina @scalablecory @JeremyKuhne
> Should we update the second signature to return a span instead of a string?
No, the guidance is to only return a span that is a slice (including the trivially-all slice or trivially-none slice) of an input parameter.
Someone wanting the non-allocating one will just have to use the Try method. (And/or make this one return null if the input is unchanged, but that might make it harder to use. The one that accepts a string and returns a string can return the input argument when there is no redundant segment.)
I'm late to the game here, but I'm struggling with the word "redundant" in the API name, as I don't think we're matching the definition of "redundant". The path /foo/bar/../baz does not have any _redundant_ segments as nothing is repeated or excessive--instead .. _negates_ bar; it cancels it out.
Node has a Path.normalize function that is strikingly similar to what is being proposed here. I always liked the "normalize" verb for this function. To make this specific to relative path segments, I recommend NormalizeRelativeSegments.