In Python, os.path.normpath performs a simple lexical normalization of a path, removing redundant dir separators and cur-dir ('.') parts, and resolving up-dir references ('..'). In addition, on Windows, any forward slashes are converted to backslashes.
It seems that Rust's std::fs::canonicalize is functionally similar to Python's os.path.realpath (with the addition of a check to see if the target exists), and is a superset of normpath. However, I have a case where I only want to normalize the path, and not resolve symlinks.
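To make the difference concrete, here is a minimal sketch. The helper name `can_canonicalize` is made up for illustration, and the example path is assumed not to exist. Because canonicalize has to consult the filesystem, it fails for a missing target, which a purely lexical pass never would.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: reports whether `fs::canonicalize` succeeds for `p`.
/// canonicalize resolves symlinks and requires the target to exist,
/// so this call performs I/O and can fail.
fn can_canonicalize(p: &Path) -> bool {
    fs::canonicalize(p).is_ok()
}
```

Calling `can_canonicalize(Path::new("/no/such/dir/../file"))` should return false on a machine where that directory is absent, even though the path could be normalized lexically without any trouble.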
I'm quite new to Rust, but I'd be happy to help contribute a new method to do this to the library if I can.
Heavens, yes! I can't believe Rust doesn't have something like this!
I think it ought to be a method on Path (since this is path functionality and not filesystem functionality), and should return just a plain old PathBuf--no io::Result--not just because it isn't needed, but also to better communicate its nature as a pure function.
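As a rough sketch of that shape (an extension trait is used here only because outside code cannot add inherent methods to Path; the names `LexicalNormalize` and `normalized` are hypothetical, not a std API):

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical extension trait, used for illustration only.
trait LexicalNormalize {
    /// Pure function: no filesystem access, hence no `io::Result`.
    fn normalized(&self) -> PathBuf;
}

impl LexicalNormalize for Path {
    fn normalized(&self) -> PathBuf {
        let mut stack: Vec<Component> = Vec::new();
        for component in self.components() {
            match component {
                // `.` never contributes anything.
                Component::CurDir => {}
                Component::ParentDir => {
                    let top = stack.last().copied();
                    match top {
                        // A normal name cancels against the `..`.
                        Some(Component::Normal(_)) => {
                            stack.pop();
                        }
                        // The parent of the root is the root itself.
                        Some(Component::RootDir) => {}
                        // Empty stack, a prefix, or earlier `..`s: keep it.
                        _ => stack.push(component),
                    }
                }
                other => stack.push(other),
            }
        }
        if stack.is_empty() {
            return PathBuf::from(".");
        }
        // Rebuild the path from the surviving components.
        stack.iter().map(|c| c.as_os_str()).collect()
    }
}
```

With the trait in scope, `Path::new("a/./b/../c").normalized()` yields `a/c` and the call site cannot fail, which is the point of returning a plain PathBuf.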
cc @rust-lang/libs
I think I've successfully written a good first draft of this method. My aim was to replicate the version in Python (linked above) and Go. The documentation for the Go version in turn links to the Plan 9 document detailing the high-level algorithm of this sort of normalization, handy!
Right now, it's a free function, not yet part of the std::path module. As such, it takes an explicit &Path as opposed to a &self, but that can easily be fixed.
use std::path::Path;
use std::path::PathBuf;
use std::path::Component;
fn normalize(p: &Path) -> PathBuf {
    let mut stack: Vec<Component> = vec![];
    // We assume .components() removes redundant consecutive path separators.
    // Note that .components() also does some normalization of '.' on its own anyways.
    // This '.' normalization happens to be compatible with the approach below.
    for component in p.components() {
        match component {
            // Drop CurDir components, do not even push onto the stack.
            Component::CurDir => {},
            // For ParentDir components, we need to use the contents of the stack.
            Component::ParentDir => {
                // Look at the top element of stack, if any.
                let top = stack.last().cloned();
                match top {
                    // A component is on the stack, need more pattern matching.
                    Some(c) => {
                        match c {
                            // Push the ParentDir on the stack.
                            Component::Prefix(_) => { stack.push(component); },
                            // The parent of a RootDir is itself, so drop the ParentDir (no-op).
                            Component::RootDir => {},
                            // A CurDir should never be found on the stack, since they are dropped when seen.
                            Component::CurDir => { unreachable!(); },
                            // If a ParentDir is found, it must be due to it piling up at the start of a path.
                            // Push the new ParentDir onto the stack.
                            Component::ParentDir => { stack.push(component); },
                            // If a Normal is found, pop it off.
                            Component::Normal(_) => { let _ = stack.pop(); }
                        }
                    },
                    // Stack is empty, so path is empty, just push.
                    None => { stack.push(component); }
                }
            },
            // All others, simply push onto the stack.
            _ => { stack.push(component); },
        }
    }
    // If an empty PathBuf would be returned, instead return CurDir ('.').
    if stack.is_empty() {
        // as_os_str() avoids the inference ambiguity of as_ref() on Component.
        return PathBuf::from(Component::CurDir.as_os_str());
    }
    let mut norm_path = PathBuf::new();
    for item in &stack {
        norm_path.push(item.as_os_str());
    }
    norm_path
}
Some sample (Unix) file path tests:
fn main() {
    let mut paths = vec![];
    paths.push(Path::new("../../home/thatsgobbles/././music/../code/.."));
    paths.push(Path::new("/home//thatsgobbles/music/"));
    paths.push(Path::new("/../../home/thatsgobbles/././code/../music/.."));
    paths.push(Path::new(".."));
    paths.push(Path::new("/.."));
    paths.push(Path::new("../"));
    paths.push(Path::new("/"));
    paths.push(Path::new(""));
    // More tests for Windows (especially with drive letters and UNC paths) needed.
    for p in &paths {
        let np = normalize(&p);
        println!("{:?} ==> {:?}", &p, &np);
    }
}
Brute force comparison:
https://gist.github.com/ExpHP/3f7d8c03be1a45ebe5abd3ad5a517d73
fn gen_strings(alphabet: &[&str], s: String, depth: u8, emit: fn(&str)) {
    emit(&s);
    if let Some(deeper) = depth.checked_sub(1) {
        for &ch in alphabet {
            gen_strings(alphabet, s.clone() + ch, deeper, emit);
        }
    }
}
fn main() {
    gen_strings(&["a", ".", "/"], Default::default(), 12, |s| {
        let p = Path::new(&s);
        println!("{:>15} -> {:>15}", p.display(), normalize(&p).display());
    });
}
import os
def gen_strings(alphabet, s, depth):
    yield s
    if depth > 0:
        for ch in alphabet:
            yield from gen_strings(alphabet, s + ch, depth - 1)
for s in gen_strings("a./", '', 12):
    print("{:>15} -> {:>15}".format(s, os.path.normpath(s)))
Generating all strings of "a", ".", and "/" up to length 12 on Arch, I found one behavioral difference between your implementation and normpath: if a path begins with // but not with ///, normpath keeps the leading //. Interestingly, this comes from POSIX, which leaves the meaning of a pathname starting with exactly two slashes implementation-defined. (Makes me wonder if Python still does that on Windows?)
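One way to see where the difference comes from, as a small sketch that assumes a Unix target (both helper names are made up here): .components() already folds any run of leading slashes into a single RootDir, so a normalizer built on top of it never sees the double slash. Keeping it would mean inspecting the raw string first.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: whether the first component is RootDir, plus the
/// total number of components that `std` reports for `s`.
fn root_and_len(s: &str) -> (bool, usize) {
    let comps: Vec<Component> = Path::new(s).components().collect();
    (matches!(comps.first(), Some(Component::RootDir)), comps.len())
}

/// Hypothetical check for the case normpath preserves:
/// exactly two leading slashes, not three or more.
fn has_exactly_two_leading_slashes(s: &str) -> bool {
    s.starts_with("//") && !s.starts_with("///")
}
```

On Unix, `root_and_len` returns the same result for "/a", "//a" and "///a", so matching normpath would need something like the second helper applied before the path is split into components.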
It'd be nice to survey more languages and know what decisions and tradeoffs are on the table.
@ExpHP Thank you for the extended tests! I don't think I would have found that corner case without your help.
The double forward slash case is quite strange, I admit. I'm surprised that it ends up resolving to just /. I'm even newer at Go than Rust, but I can see if I can whip up a comparable test using the previously linked .Clean() method. I believe there's also such a routine in Node.js.
I'm going to look into installing Rust on my Windows machine tonight, and trying out some tests on there.
Closing in favor of https://github.com/rust-lang/rust/issues/47402.
@Centril is this correct to close this issue, given that the https://github.com/rust-lang/rust/issues/47402 is (just?) a "tracking issue", which actually points back to this very issue? I'm really sorry if this is something normal, but as a Rust newbie, I really don't understand what's the status of this in such a situation :( Is it open? closed? worked on? not worked on? what could I do if I wanted to, say, submit a solution proposal?
I recall that this was associated with a PR that was never finished after a long period of back-and-forth. Looking at the linkbacks, this is it:
https://github.com/rust-lang/rust/pull/47363
Since we haven't heard from the author for a long time, I'd reckon it's fair game for anybody who wants to rebase that PR and address the outstanding feedback.
Issues on the RFC repository generally aren't regarded as actionable items (people for some reason have been using them as feature requests or like a discussion forum), so their open/close state is largely immaterial beyond the potential to confuse people. I'd always look at the linkbacks.
Original author here, apologies for the radio silence. It's been a long time since I've had time to work on this, not to mention my inexperience with contributing to open source in general.
Aside from that, the one issue I encountered with this was that Windows paths weren't all getting normalized as expected, as per a recommended Windows path test suite (seen in the discussion on the PR rust-lang/rust#47363).
My hope was that using Path.components() would be sufficient to do the normalization, but it seems that .components() isn't working as expected for my case.
I'm going to reopen this instead of https://github.com/rust-lang/rust/issues/47402, as the associated PR never landed, so unfortunately we don't actually have anything to track at this point (beyond the desire for the feature itself, which is better suited to the rfcs repo).
Is this still looking for someone to implement it? I'm happy to give it a go, as long as someone familiar with Windows paths can give me some edge cases to test against.
Also, given that this won't access the filesystem (that's canonicalize's job), how do we want to handle paths like foo/../../bar which go "out of bounds"? Error? Ignore extra .. (as happens when you reach / in *nix)? Something else?
On Windows foo/../../bar is equivalent to ..\bar so that is how I would handle it there.
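The options raised above can be sketched side by side. This is only an illustration under stated assumptions: the `Excess` enum and `normalize_relative` are hypothetical names, and the function deliberately handles relative paths only (RootDir and Prefix components are ignored).

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical policies for a `..` that has nothing left to cancel.
#[derive(Clone, Copy)]
enum Excess {
    Keep,   // keep it, so foo/../../bar becomes ../bar
    Drop,   // ignore it, as happens at `/` on *nix
    Reject, // treat it as an error
}

/// Lexically normalizes a relative path. Returns None only under
/// `Excess::Reject`. Assumes `p` is relative.
fn normalize_relative(p: &Path, policy: Excess) -> Option<PathBuf> {
    let mut names: Vec<&OsStr> = Vec::new();
    let mut ups = 0usize; // leading `..` components that were kept
    for component in p.components() {
        match component {
            Component::Normal(name) => names.push(name),
            Component::ParentDir => {
                // Nothing to pop means the path is going "out of bounds".
                if names.pop().is_none() {
                    match policy {
                        Excess::Keep => ups += 1,
                        Excess::Drop => {}
                        Excess::Reject => return None,
                    }
                }
            }
            // CurDir is dropped; RootDir/Prefix are out of scope here.
            _ => {}
        }
    }
    let mut out = PathBuf::new();
    for _ in 0..ups {
        out.push("..");
    }
    for name in names {
        out.push(name);
    }
    if out.as_os_str().is_empty() {
        out.push(".");
    }
    Some(out)
}
```

For foo/../../bar, `Excess::Keep` gives ../bar (the behaviour described for Windows above), `Excess::Drop` gives bar, and `Excess::Reject` gives None.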
I wrote a port of some functions I needed from Go's path/filepath module, and some discussions even led to a pre-RFC.