go version)?$ go version go version go1.12.7 windows/amd64
Yes.
go env)?Applies to all OSes
The same results as calling GetFinalPathNameByHandle: a UNC path.
A path using the drive letter instead of the UNC path.
This affects any attempt to canonicalize paths using the output of Git in such a situation. Git produces some paths as absolute and some paths as relative, and uses GetFinalPathNameByHandle for canonicalizing absolute paths. However, Go lacks a function to canonicalize paths in a standard way, so it isn't possible to produce results equivalent to a C program and still write code that works portably across systems.
Go should add a function that is explicitly defined to canonicalize paths in a way equivalent to the underlying operating system, since using filepath.Abs and filepath.EvalSymlinks doesn't work correctly on Windows. It does work fine on Unix, but Unix paths are much simpler and easier to reason about.
It was determined in #17084 that filepath.Abs and filepath.EvalSymlinks were sufficient in this case, but that doesn't appear to be true. I expect there are other cases in which those don't work on Windows, but I am insufficiently versed in Windows paths to know what those are.
This was originally reported to the Git LFS project in git-lfs/git-lfs#4012.
/cc @robpike @rsc per owners.
What did you expect to see?
The same results as calling GetFinalPathNameByHandle: a UNC path.
I did not try it, but, I suspect, UNC paths wouldn't work in some situations. For example, can you pass UNC path to os.Chdir?
Alex
I don't know for certain, but judging by a quick Google search, it appears to be possible in Ruby, so I assume one can do that in C-based languages.
I'm not a Windows developer, so I'm not a good person to ask about the capabilities of Windows. I'm just a Unix developer trying to make general-purpose software not be terrible on Windows.
I don't know for certain, but judging by a quick Google search, it appears to be possible in Ruby, so I assume one can do that in C-based languages.
You are correct. I was wrong. os.Chdir does work with UNC paths.
Alex
It is also the case that filepath.EvalSymlinks fails to work when canonicalizing paths where there's a junction to a volume that lacks a drive letter (a OneDrive mount is a good example of this). For example, if C:\Users\User\OneDrive\Vault is a junction pointing to a OneDrive mount and we try to call filepath.EvalSymlinks("C:/Users/Users/OneDrive/Vault/home.git"), that will fail with readlink C:\Users\User\OneDrive\Vault: The system cannot find the path specified.
This also works with C-based programs.
Hey,
Is there any interest in fixing this? Right now, there is no cross-platform way to canonicalize a path in Go. We keep running up against additional cases where the existing behavior doesn't canonicalize paths properly, leading to incompatibility with other programs on the system (notably Git). This necessarily limits the portability of using Go as a cross-platform language.
By "function to canonicalize paths" do you mean a variation of EvalSymlinks that works on Windows? If so, note that EvalSymlinks is not recommended: #40180 (and probably can't be fixed).
Go on Windows has a variety of long-standing filesystem bugs. I suggest using x/sys/windows to call the WinAPI if that solves your problem.
I mean a function, when given a path, that returns a canonicalized version of that path. In other words, the equivalent to realpath(3) on Unix or GetFinalPathNameByHandle on Windows, and the equivalent to Rust's std::path::canonicalize.
It isn't helpful to me to call the Windows API because (a) I'm not a Windows programmer and have no clue how to use it, (b) it isn't cross-platform, and (c) this is a function that is generally provided by the standard library.
Go has gaps on Windows; I plug them in my code. You've seen the interest this issue evoked :-p
What you need isn't hard. Create a file named yourpkg_windows.go, import "golang.org/x/sys/windows", define GetCanonicalPath() to call CreateFile("yourfile") (to get a handle) then GetFinalPathNameByHandle.
Create a file yourpkg_unix.go with a // +build directive for your unix platforms. Define GetCanonicalPath() with the solution for unix you already know.
First of all, I appreciate that you're trying to help. However, I do feel firmly that this functionality should be in the standard library, since it is in almost every other language, and it is in POSIX. I don't want to carry a lot of platform-specific code in a program because it's difficult to maintain and test, especially when I don't typically develop on Windows.
If Go is known to have known defects on Windows, those should be promptly fixed or clearly documented. For many purposes, it's fine if code doesn't run or run well on Windows, but there are some cases where it does. The documentation should clearly and prominently list any limitations with using Go on Windows so that folks can make an informed decision. Last I checked, the filepath documentation didn't indicate such limitations, and hasn't for some time.
Normally, when I find a bug or missing feature, I would send a patch to implement that functionality. However, Go has a CLA, and I don't sign CLAs, so any patch I might submit wouldn't be accepted. If that changes, I'm happy to send a patch to implement this properly if nobody gets to it before me.
On Unix systems I think the proposed function is the same as filepath.EvalSymlinks.
Yes, I believe that they are identical. filepath.EvalSymlinks is, as far as I'm aware, equivalent to realpath(3) on Unix and has the semantics I'm looking for.
This proposal should probably also deprecate EvalSymlinks, which is seriously broken on Windows, see https://github.com/golang/go/issues/40180#issuecomment-661350111
What does "canonical" mean, precisely?
If there are multiple ways to refer to a filename, the canonical path is the absolute filename which uses no indirections and uses the canonical case (that is, the path component as written to the file system) if the system permits case folding. On Unix, that's the one that contains no symlinks (and, on macOS, uses canonical case and composition). On Windows, there are many ways to have indirection in a path: symlinks, junctions, SUBST, etc. (I don't actually know all of the possible ways, since I almost never use Windows). The canonical form uses none of those indirections and uses the canonical case.
Another way to say this is that assuming no hardlinks exist, a file on Unix should have exactly one canonical name whose components are either directories or non-symlink, non-directory (but possibly special) files.
@networkimprov, you make assertions without being specific about them. I am confused about three of the things you've said related to this issue.
You linked to #40180 which is making a general software engineering argument along the lines of "you should never actually replace all the symlinks, that's violating the abstractions that have been set up". I have some sympathy for that, but if it were true, it would apply not just to EvalSymlinks but also this issue as well. If so, then we should just close this very issue (#37113) as a terrible idea.
I see that you mentioned this issue in #40966, which is about some problems with path lengths in EvalSymlinks on Windows. We've had path length problems elsewhere on Windows. Path length issues are usually pretty straightforward to fix. Why would we want to gate a fix to #40966 on a larger design discussion on this issue?
Finally, you said, with no links at all, "This proposal should probably also deprecate EvalSymlinks, which is seriously broken on Windows." How is it broken? That comment would be a good place for an issue link.
Thanks.
I mentioned #40180 in https://github.com/golang/go/issues/37113#issuecomment-689088220 to suggest that the issue author reconsider canonicalization of paths. I didn't link it again later, but it documents a long list of problems with EvalSymlinks on Windows (which I've now linked).
Re path length bugs, other instances of those have been left alone, see #21782 & #36375. And here's a list of Windows bugs that mention "filepath" https://github.com/golang/go/issues?q=is%3Aopen+is%3Aissue+label%3AOS-Windows+filepath
Canonicalization of paths is required to properly implement any sort of Git support in a project. More generally, it's required to determine definitively if a path is under a directory, which has a wide variety of general-purpose applications outside of Git. Whether other people think it is useful in their projects, path canonicalization is commonly used and is almost always provided by the standard library. Canonicalizing paths is also recommended by CMU's secure coding guidelines; while those are for C, there's no reason to think Go is any different.
I agree that users typically don't want to see canonicalized paths and that path canonicalization cannot be used where there's a security-sensitive race condition, but that doesn't mean it lacks applications elsewhere, just that it's unsuitable for some use cases.
@networkimprov, the path length bugs are left alone only for lack of time. I don't think there's any objection to fixing them as long as it is done correctly and well.
@bk2204, I'm certainly not arguing against this functionality. I'm trying to understand why EvalSymlinks shouldn't be what provides this functionality on Windows.
(I do somewhat object to the name "canonical": if "/home/rsc" symlinks to some device path like "/u123/g3tah0uojq1/rsc", I have a hard time calling the latter the "canonical" one. And for what it's worth there are plenty of people who disagree with you about what "determine definitively if a path is under a directory" should mean. I have the bug reports to prove it. :-) But again, I'm not saying we shouldn't do this. I just think EvalSymlinks is probably the answer.)
If you change EvalSymlinks to call GetFinalPathNameByHandle on Windows, it may break some apps, so I'm pretty sure Alex wouldn't agree.
If you change EvalSymlinks to call GetFinalPathNameByHandle on Windows, it may break some apps, so I'm pretty sure Alex wouldn't agree.
It would need to keep doing what it's documented to do, namely preserve relative-ness to current directory when possible. That means calling GetFinalPathNameByHandle and fixing up the result a little. But we could still build a function around GetFinalPathNameByHandle that should handle everything Windows can throw at it.
Did you have a specific breakage in mind?
Are you proposing that the behavior differ from GetFinalPathNameByHandle (on Windows) only when the path name is relative, or when the path name is absolute as well? The former is fine, I think, and I have no position on it; the latter would be a problem for interoperability with tools written in other languages.
@alexbrainman what do you think of changing the implementation of filepath.EvalSymlinks to just call GetFinalPathNameByHandle?
cc @ericwj
Relative paths are not thread-safe. Full stop.
EvalSymlinks should not be fixed but replaced. GetFinalPathNameByHandle always returns an absolute path. Paths with \\?\ syntax are always absolute.
GetFinalPathNameByHandleA function (fileapi.h) - Win32 apps | Microsoft Docs
The string that is returned by this function uses the "\\?\" syntax. For more information, see
CreateFile.
The CMU guidance obviously is written without considering the arguments against EvalSymlinks that I wrote up - which is based on guidance from Microsoft, so I don't quite agree with your conclusion that resolving file system driver paths is a proper thing to do while canonicalizing. I have just hastily scanned that document from CMU and I don't immediately see they say you should resolve links. They just mention links could be present for awareness.
I have written quite a bit of software and I have never considered writing checks like that. Usually path checks involve checking whether they are in some base path - for which Rel is better than EvalSymlinks. Sure you can ask for a properly cased path and/or fix slashes, but otherwise, why write string handling or comparisons at all? Paths usually come from somewhere. I mean from the working directory, from configuration or perhaps from user input in most cases. Usually those are already trusted without doing any checks. Or they are built from system settings, which are also trusted. That might be circumstantial evidence against your conclusion based on that article but still.
Also about the Git example still I think the application should be in control of which links get resolved. Some of the links that make up a path used by Git could still be system administrator controlled which makes it wrong to resolve them and the result dependant on system configuration. Like for my system - most Git paths ran through EvalSymlinks on my system will (have to) be volume GUID paths (and hence again all always absolute). But Git should never care about that or go that deep. It's wrong.
EDIT: The article ignores case-sensitivity issues by using strncmp. That is wrong as well. Most software on Windows assumes case-insensitivity, although NTFS is case-sensitive. On Linux, FAT32 is case-insensitive.
As it's not possible to relativize some results of GetFinalPathNameByHandle, that could be a cause of breakage if EvalSymlinks is reimplemented with it.
Relative paths are not thread-safe. Full stop.
That's not true on modern POSIX systems. You can open a file descriptor to a directory and operate on a path relative to that file descriptor with the *at series of functions. It may be true on Windows.
The CMU guidance obviously is written without considering the arguments against
EvalSymlinksthat I wrote up - which is based on guidance from Microsoft, so I don't quite agree with your conclusion that resolving file system driver paths is a proper thing to do while canonicalizing. I have just hastily scanned that document from CMU and I don't immediately see they say you should resolve links. They just mention links could be present for awareness.
I agree with some of your points and I've stated so above. Users usually are not interested in canonical paths. It's also wrong to use path canonicalization in a case where the possibility of changing path resolution leads to a security problem or buggy behavior. That's a well known problem on Unix systems and Windows is no different.
It is irrelevant to me whether the path is created with a drive letter or not, so I have no position on that argument.
I agree that EvalSymlinks, as it exists today, is broken on Windows and does not do anything interesting or useful in most cases. That's why I opened this issue.
I have written quite a bit of software and I have never considered writing checks like that. Usually path checks involve checking whether they are in some base path - for which
Relis better thanEvalSymlinks. Sure you can ask for a properly cased path and/or fix slashes, but otherwise, why write string handling or comparisons at all? Paths usually come from somewhere. I mean from the working directory, from configuration or perhaps from user input in most cases. Usually those are already trusted without doing any checks. Or they are built from system settings, which are also trusted. That might be circumstantial evidence against your conclusion based on that article but still.
The question is not whether you think this feature is valuable. You need not use it. The question is whether Go ought to provide access in a portable way to cross-platform functionality that every operating system and every other major language provides and which is used in many projects for good and valuable reasons, and which, even if used imprudently, is required for compatibility with other software already existing for longer than Go has. The fact that major organizations like CMU recommend this practice is evidence that this feature is important and valuable, even if you disagree.
Path canonicalization is even more important on Windows than Unix because Windows has case-folding behavior in its file system that depends on attributes related to when the file system was created. It is therefore impossible without the kernel's help to know the proper name of a file and whether two file names actually refer to the same item on disk, and Windows otherwise lacks the concept of device and inode numbers which are normally used to perform this check on Unix.
Also about the Git example still I think the application should be in control of which links get resolved. Some of the links that make up a path used by Git could still be system administrator controlled which makes it wrong to resolve them and the result dependant on system configuration. Like for my system - most Git paths ran through
EvalSymlinkson my system will (have to) be volume GUID paths (and hence again all always absolute). But Git should never care about that or go that deep. It's wrong.
I have seen first hand how Git broke when it did not canonicalize paths consistently, and I still have a broken repository on my system from that point. I am a core contributor to Git and the primary driver of the SHA-256 transition. I've developed in situations where symlinks and path resolution have important and subtle security implications. I understand the problem space intimately and why Git has the behavior it does. Please don't try to to tell me that Git's behavior is wrong here, because it is not. Whether a path is created by the user, the system administrator, or any other actor does not change whether canonicalization is necessary.
That's not true on modern POSIX systems. You can open a file descriptor to a directory and operate on a path relative to that file descriptor with the *at series of functions. It may be true on Windows.
Obviously that is the same on any operating system if you just use Rel and Join. But EvalSymlinks makes the path relative to the current working directory and it is that fact that makes it not thread-safe. You can use Join but again you need the current working directory as argument, which might have changed in between these two calls.
The question is not whether you think this feature is valuable. You need not use it.
My argument is not against canonicalization, but against using or fixing EvalSymlinks to do it. This whole comment was about using GetFinalPathNameByHandle to implement EvalSymlinks. It can't be done properly.
Windows otherwise lacks the concept of device and inode numbers
There is the concept of object identifiers, but these are an NTFS concept. So yeah sure absolutely, the OS needs to be involved and one of the things that is wrong with path\filepath is that it doesn't involve the OS enough.
Please don't try to to tell me that Git's behavior is wrong here, because it is not.
I don't know what Git does, but I really think it is wrong to resolve all links - even those that are in a parent directory of the Git repo. The issue linked specifically mentions the ability to have files open while these links are being changed and to continue to open files afterwards as long as these links are not in any way, for any length of time, cached. I don't mean to criticize your intimate knowledge and experience with Git development, but the proper way to go is to know which links are part of repository configuration and which ones are system configuration and leave the latter ones forever unresolved.
The two objections to using GetFinalPathNameByHandle seem to be (1) needing to return absolute paths sometimes, and (2) problems with relative paths and threads changing directories.
For (1), EvalSymlinks is _already_ defined to return an absolute path when necessary. If the result of GetFinalPathNameByHandle cannot be made relative to the current directory, then the absolute one can be returned. That's entirely within the documented behavior.
For (2), there's nothing wrong with relative paths per se provided the process is not calling os.Chdir. It is Chdir (the write operation) that is not "thread-safe", not the relative path evaluation (the read operations). If you have a program that uses Chdir, then yes, use absolute paths. Pass an absolute path to EvalSymlinks and you'll get one out. But if EvalSymlinks is passed a relative path, there is no added harm in returning one.
It's really looking to me like we should use GetFinalPathNameByHandle in EvalSymlinks. Are there other reasons we should not?
It may be in-spec to always return an absolute path. But existing apps may rely on the current behavior. It's just conjecture, but the same has torpedoed previous proposals. However I'm not personally opposed to it.
cc @mattn
It may be in-spec to always return an absolute path.
That's not what I'm suggesting. I wrote "If the result of GetFinalPathNameByHandle cannot be made relative to the current directory, then the absolute one can be returned."
I did not write "The absolute one can be returned always."
By "made relative" I meant transformed to be relative to the current directory by filepath.Rel.
Ah, I misinterpreted that. The one issue for a relative transformation is that relative paths have a max length, so some cannot be transformed, tho an app might expect it.
@ericwj did we investigate whether filepath.Rel is healthy on Windows?
Ah, I misinterpreted that. The one issue for a relative transformation is that relative paths have a max length, so some cannot be transformed, tho an app might expect it.
That's news to me. A relative path stored as a string inside a Go program should _not_ have a max length.
If you pass a very long path to os.Open, I thought we did the appropriate rewrite to remove the length restriction. Do we not do that for relative paths?
If the Go APIs aren't handling long relative paths correctly, we should fix _that_, not introduce strange inconsistencies elsewhere.
EDIT: I'm not sure which (if any) of the APIs that take a path can handle a length over the Windows limit for relative paths, but these can't:
Whether it's OK to return a relative path that's not suitable for external use (e.g. a log) should also be considered.
Whether it's OK to return a relative path that's not suitable for external use (e.g. a log) should also be considered.
I am not worried about that problem. "External use" could mean adding a new path element to a relative directory name, and that might turn a short-enough path into a too-long path. So in the limit the argument would be that we should never use a relative path because of the limits. That's clearly wrong.
I filed #41734 for fixing fixLongPath to handle relative paths so that we can stop having discussions about which APIs do and don't accept certain kinds of long paths.
@bk2204 did you change any permissions on this thread? If so, could you flip them back? I asked @ericwj a Q re filepath.Rel, and he's unable to respond here now.
@alexbrainman what do you think of changing the implementation of
filepath.EvalSymlinksto just callGetFinalPathNameByHandle?
We should not change what filepath.EvalSymlinks implementation.
Alex
@alexbrainman I'm sure there are cases where GetFinalPathNameByHandle would return a different value, but are there cases where it would return a worse value?
are there cases where it would return a worse value?
@ianlancetaylor I am not aware of any cases with worse values. But I did not spend any time on it, and did not consider any scenarios.
We did not implement filepath.EvalSymlinks in one day - it evolved over time. It was completely broken at the start, and then people complained about their scenarios, and we fixed their scenarios. And so on. But we are not the experts in this area, we made decisions to the best of our abilities at the time. Even Microsoft changed this area in the last 10 years - they added some new features that are used by Docker. And we started with Windows XP and now we have Windows 10 - very different OS in that regard.
Perhaps just using GetFinalPathNameByHandle would have worked as well. I do not know. But I would not change filepath.EvalSymlinks implementation now. The change will definitely brake some programs.
Alex
@alexbrainman you are arguing "this is the way it's always been" rather than arguing "this is why it is the way it is". That can be a quicker reply, which is understandable. But for what it's worth, explaining why things are the way they are and what exactly would be worse about a change is more useful, because it lets us understand the possible paths forward.
I think Alex did give the "why" -- the Windows implementation initially tried to mirror the unix one, and evolved from there, as no one questioned that decision, or the lack of a GetCanonicalPath() API.
The safest thing to do now is add a new API, and deprecate EvalSymlinks.
But for what it's worth, explaining why things are the way they are and what exactly would be worse about a change is more useful, because it lets us understand the possible paths forward.
I do not know / remember why things the way they are. I do not know what exactly would be worse. I wish I could be more useful. And I am sorry that I am not useful.
Like I said before, it has been a long process and many changes. I don't remember every decision we made along the way.
I also don't use these features myself. So it is hard for me to judge whether GetFinalPathNameByHandle would be worse that what filepath.EvalSymlinks does at this moment. I guess someone actually need to spend some time and investigate every single problem that is reported with filepath.EvalSymlinks and see, if using GetFinalPathNameByHandle is a solution to this problem. Unfortunately don't have time to spend on this.
Alex
OK, it sounds like maybe it would be best to leave EvalSymlinks alone. That leaves the git-lfs/git-lfs#4012 issue, which is that nothing can turn N:\foo into \network\host\foo. If we add something to do that while also resolving symlinks, perhaps filepath.Resolve? Then we get to define a new function, and the result is always as absolute as possible (begins with / on Unix, UNC on Windows avoiding drive letter whenever possible, evaluating out symlinks, and so on).
filepath.Resolve would be the same as Unix realpath(3), and the same as Windows GetFinalPathNameByHandle.
On Unix it would be the same as filepath.EvalSymlinks(filepath.Abs(path)).
On Windows, we have no idea what EvalSymlinks does, so it's hard to state in terms of existing API.
What do people think of adding filepath.Resolve with this definition?
Sounds good. Maybe should call it Real() or RealPath() since that term is already widely known.
That sounds like a great solution. I'm happy with any of the three proposed names.
That leaves the git-lfs/git-lfs#4012 issue, which is that nothing can turn N:\foo into \network\host\foo
If the git-lfs/git-lfs#4012 issue fix is to call GetFinalPathNameByHandle Windows API, why cannot they just call the API?
What do people think of adding filepath.Resolve with this definition?
What would filepath.Resolve documentation say? How will filepath.Resolve differ from filepath.EvalSymlinks? For example, how would you explain to a new Go user on both Linux and Windows when she should use one function instead of the other?
Thank you.
Alex
The docs should say...
On Windows, EvalSymlinks is broken for many cases. Real/RealPath/Resolve is the only API to "evaluate" symlinks. See https://github.com/golang/go/issues/40180#issuecomment-661350111
On unix, the RealPath API is equivalent to filepath.EvalSymlinks(filepath.Abs(p))
EvalSymlinks should probably be deprecated.
If the git-lfs/git-lfs#4012 issue fix is to call GetFinalPathNameByHandle Windows API, why cannot they just call the API?
We don't want to carry a large amount of non-portable system-specific code. We are not the Git project, with many regular full-time contributors across a wide variety of platforms. Our team is mostly two people, neither of whom use Windows on a regular basis, and we rely on Go to make our Windows support more than best effort. We already have platform-specific code to support NTLM specially on Windows and it's going away because it's broken and nobody wants to volunteer to fix it.
I also happen to feel that requiring every user who wants to support cross-platform path canonicalization to know about and use GetFinalPathNameByHandle explicitly and use it is not going to help Windows get better support, especially when filepath.EvalSymlinks is right there and seems to do the right thing in most cases (except it doesn't). I only happen to know about it because I've actually looked up which function to use for this purpose at some point in the past; in general, a Unix developer won't know about it and won't know to use it.
What do people think of adding filepath.Resolve with this definition?
What would filepath.Resolve documentation say? How will filepath.Resolve differ from filepath.EvalSymlinks? For example, how would you explain to a new Go user on both Linux and Windows when she should use one function instead of the other?
filepath.Resolve is used in the case when you want to canonicalize a path, and it will always result in an absolute path. filepath.EvalSymlinks only resolves symlinks, may produce a relative path if its input is relative, and does not handle non-symlinks on Windows. It is not suitable for canonicalizing a path on Windows and cannot be used successfully for that purpose. filepath.Resolve should be used unless you are sure you want the behavior of filepath.EvalSymlinks.
We don't want to carry a large amount of non-portable system-specific code.
It is not a lot of code. You can, probably, just copy all the code you need from $GOROOT directory - just grep for GetFinalPathNameByHandle.
Alex
FWIW, RealPath is not great because it says Path in it.
Real is not great because it is edit-distance-1 from another function that does something similar (Rel).
I also am not sure what makes a path "real" in the first place.
Resolve is clearer about the action being done.
Based on the discussion above, adding Resolve with the semantics described seems like a likely accept.
filepath.Resolve is used in the case when you want to canonicalize a path, and it will always result in an absolute path. filepath.EvalSymlinks only resolves symlinks, may produce a relative path if its input is relative, and does not handle non-symlinks on Windows. It is not suitable for canonicalizing a path on Windows and cannot be used successfully for that purpose. filepath.Resolve should be used unless you are sure you want the behavior of filepath.EvalSymlinks.
It looks like filepath.EvalSymlinks is broken on Windows, and we are now planning to replace it with a similar function with a different name and slightly different semantics, because it's too hard to touch EvalSymlinks on Windows, where it's already broken and doesn't always work. Nobody can enumerate where it does work or it doesn't, but it seems easier not to touch it anymore rather than fixing it.
This sounds extremely rushed to me. We're adding an API that is basically a duplicate of another existing one because we don't have time to properly fix the existing one, analyzing all issues. Also it looks like there are no volunteers to work on it.
If we can get this proposal on hold for a month or so, I can attempt a thorough analysis on the history of changes of EvalSymlinks on Windows, and attempt a CL as described by Russ in https://github.com/golang/go/issues/37113#issuecomment-701509206. This would require some time and some effort to get right, but I think it would be a better long term solution rather than adding a new API. If I fail, we can still fallback to adding a new API, but at least I hope we would get a better understanding around EvalSymlinks and whether it's really inherently broken forever on Windows.
Giovanni, please give the folks above some credit for our consideration of this issue. @ericwj did a careful analysis of EvalSymlinks on Windows in #40180 (read whole thread) -- which I linked above. [BTW, he was improperly locked out of this thread by an anonymous admin for no stated reason.]
It's clear that we wouldn't add EvalSymlinks today if it didn't exist; so the best long term solution is deprecating EvalSymlinks.
We cannot fix EvalSymlinks because certain changes from relative to absolute paths could break existing Windows programs. Alex will not agree to that, and he's the Windows maintainer.
@ericwj did a careful analysis of EvalSymlinks on Windows in #40180 (read whole thread)
this proposal, as it stands today, is about adding a new API, filepath.Resolve, which still violates everything mentioned in #40180. #40180 is fundamentally a software engineering problem: @ericwj is saying that the documentation should mention that the function should be rarely used. The same would apply to filepath.Resolve if it's added.
We cannot fix EvalSymlinks because certain changes from relative to absolute paths could break existing Windows programs.
Can you point me to a real-world example? It would be useful while I write the CL. As I said, I plan to go through the history of changes of EvalSymlinks, reconstruct all rationales, make sure there are regression tests, and then attempt a fix. If something is not covered by the existing tests and history of changes, I'd be happy to evaluate it.
saying that the documentation should ...
I gather you didn't read that whole thread.
Can you point me to a real-world example?
Pathnames over 250ish characters must be absolute. EvalSymlinks doesn't respect this, and correcting it would break callers expecting a relative path.
If you want to improve Go on Windows, there are a wide variety of other bugs to tackle. This issue has been resolved.
@networkimprov can you provide a link to a real world application or at least a working, compilable code snippet that would break if EvalSymlinks is fixed as originally proposed by Russ?
@rasky, I see two problems:
EvalSymlinks has Unix-specific behavior as far as returning absolute vs relative paths. If you have a symlink x->y and you EvalSymlinks("x"), you get "y", but if you have a symlink x->/tmp/y, you get "/tmp/y". This is correct enough for Unix but much more difficult on Windows, and on both systems sometimes you just want a canonical absolute path. Depending on the symlink is surprising, even on Unix. Note that realpath(3) always returns an absolute path. That's what Resolve will do too.
On Windows, we defined "with a drive letter" as "absolute" but even that's not correct. Drive letters are not absolute in the same way as the Unix root. Instead, Windows has introduced \\ paths that are a better definition of absolute there.
Concretely, supposing that M: and N: map to the same network drive (\\host\share\), then EvalSymlinks(`M:\`) = `M:\` and EvalSymlinks(`N:\`), so that EvalSymlinks(`M:\`) != EvalSymlinks(`N:\`) even though they are actually the same.
In contrast, in that case, Resolve(`M:\`) == Resolve(`N:\`) == `\\host\share`.
In both cases, EvalSymlinks has behavior that is difficult to change but is less than ideal. Adding Resolve lets us introduce a different operation that behaves like realpath(3) and GetFinalPathNameByHandle instead of trying to shoehorn those into EvalSymlinks. EvalSymlinks still does exactly what it says - it evaluates symlinks - and people who want exactly that behavior can still use it for that. But people also sometimes want "get me a fully resolved, canonical equivalent of this path". It makes sense to provide that separately.
Based on the discussion above, adding Resolve with the semantics described seems like a likely accept.
Russ,
I disagree. I don't think we should add filepath.Resolve function that behaves like GetFinalPathNameByHandle Windows API.
If someone wants to use this API, we can add golang.org/x/sys/windows.GetFinalPathNameByHandle and they can use it there.
I still don't understand what proposed filepath.Resolve function will do. It will be even harder for others, who are just learning this package, to understand the difference between filepath.Resolve and filepath.EvaluateSymlinks. I would like to see documentation for that function before this proposal is even considered.
filepath.Resolve is not a good name (like utils). It tells you nothing about what the function does. To me it is a good indication that you yourself do not know what this function does.
And what problem will this new function solve? Is filepath.Resolve is supposed to be a replacement (good version) for filepath.EvaluateSymlinks? If yes, let's have someone actually try and replace filepath.EvaluateSymlinks in the current Go tree and see if it still works. If that does not work, then we need to think even harder why we need filepath.Resolve, and not just golang.org/x/sys/windows.GetFinalPathNameByHandle.
Thank you for consideration.
Alex
PS: Sorry I did not comment earlier. But I have been busy with other staff, and missed this thread altogether.
@alexbrainman all of the questions you raised have been addressed in the comments above. Re missing the thread, you last commented here 7 days ago.
@ericwj is unable to comment on this issue. As far as we can tell, this is a GitHub bug. It is not due to any intentional action by the Go team.
This comment is by @ericwj, sent via e-mail:
GetFinalPathNameByHandledoes not canonicalize. I just ran it, trying all flags. The result is different depending on how I configure my system, depending on which path I use to access the file and depending on which flag I provide in each situation. This means canonicalization will break if the machine configuration changes, which can also happen during Git operations, and the result may change depending on which path is used to access the same file or directory.The simplicity of
EvalSymlinksis one of its problems. Any functionstring Foo(string)will have some of the same fundamental, unfixable problems that I documented forEvalSymlinks. Issue #40180 only proposes docs fixes because outright no-op'ingEvalSymlinkson Windows was refused in #40104, but I think the conclusion of the analysis in #40180 is that #40104 was refused because I did not argue the case effectively yet. And this proposal is just too similar, apart from perhaps thread-safety and eventually perhaps looking more often like it works.Even suppose something like
Resolvecould be perfect, Go cannot handle the return value. The unfortunate truth is that Go can only handle paths with drive letters and ‘normal’ UNC paths in the form\\server\share(without prefix). I have evaluated about half of allpath/filepathAPI’s and the prefixesGetFinalPathNameByHandleyields break about every one of them. You cannot depend on stripping the prefixes, either.This proposal goes wrong the same way
path/filepathhas, which imho is thinking *nix can be ported unmodified to a fundamentally different operating system without proper attention, expertise involved, or testing. And for letting fundamentals slide for the benefit of a bit of simplicity, offering no alternative besides P/Invoke in cases where this makes things fall apart.I suggest writing the fundamentals, especially well for Windows, testing and reviewing meticulously, while iterating towards an API suitable for the general public on top of those fundamentals, before proposing inclusion in any standard library. If the problem is just UNC mappings, much more appropriate
WNet*functions exist to translate DOS device names. I think object identifiers are the more general solution for Git LFS, in favour of path strings. Wrapped in a suitable, portable abstraction to obtain and compare them.
@rasky, I see two problems:
- EvalSymlinks has Unix-specific behavior as far as returning absolute vs relative paths. If you have a symlink x->y and you EvalSymlinks("x"), you get "y", but if you have a symlink x->/tmp/y, you get "/tmp/y". This is correct enough for Unix but much more difficult on Windows, and on both systems sometimes you just want a canonical absolute path. Depending on the symlink is surprising, even on Unix. Note that realpath(3) always returns an absolute path. That's what Resolve will do too.
Yes, I agree that we cannot exactly reproduce this if we switch EvalSymlinks to GetFinalPathNameByHandle. Though I'm not fully sure that the clients of EvalSymlinks really do behave on this implementation detail (that is, unfortunately, documented). What if drop that sentence from the documentation, and just say that EvalSymlinks can either return a relative or an absolute path? Do you have an idea on whether clients really do rely on this specific behavior, even on Linux?
- On Windows, we defined "with a drive letter" as "absolute" but even that's not correct. Drive letters are not absolute in the same way as the Unix root. Instead, Windows has introduced
\\paths that are a better definition of absolute there.
Concretely, supposing thatM:andN:map to the same network drive (\\host\share\), thenEvalSymlinks(`M:\`) = `M:\`andEvalSymlinks(`N:\`), so thatEvalSymlinks(`M:\`) != EvalSymlinks(`N:\`)even though they are actually the same.
In contrast, in that case,Resolve(`M:\`) == Resolve(`N:\`) == `\\host\share`.
Notice that realpath on Linux doesn't follow mountpoints as well:
# mkdir a
# mkdir b
# mount --bind a b
# cd a
# touch foo
# cd ..
# realpath a/foo
/tmp/a/foo
# realpath b/foo
/tmp/b/foo
Is realpath(3) on Linux different than filepath.Abs(filepath.EvalSymlinks(path))? I might be missing something here.
In both cases, EvalSymlinks has behavior that is difficult to change but is less than ideal. Adding Resolve lets us introduce a different operation that behaves like realpath(3) and GetFinalPathNameByHandle instead of trying to shoehorn those into EvalSymlinks. EvalSymlinks still does exactly what it says - it evaluates symlinks - and people who want exactly that behavior can still use it for that. But people also sometimes want "get me a fully resolved, canonical equivalent of this path". It makes sense to provide that separately.
So my understanding is that if this proposal is accepted, we end up with:
Resolve and EvalSymlinks are basically the same function, with the only difference is that Resolve always returns an absolute path, while EvalSymlinks returns a path constructed following the symlinks, so it might be absolute or relative depending on how the symlinks were created. Neither of them see through mountpoints in any way.EvalSymlinks (after bugfixing) would possibly go through all kind of NTFS links (with the same relative/absolute handling than Linux), and it would not follow mountpoints. Resolve instead would also resolve mountpoints.It looks like the biggest gain here is that we somehow exposes GetFinalPathNameByHandle which is a "common" function used in other frameworks / languages to canonicalize paths. Since we're missing it, we face interoperability problems. In fact, semantically speaking, we're going to have a Resolve function which behaves differently between Windows and Linux wrt mountpoints. And BTW we're also getting into another corner: if somebody puts in the effort of adding mountpoint resolution to Resolve on Linux, we end up with interoperability problems again, because the rest of the world might still just be calling realpath(3) on Linux.
A slightly different proposal is to add filepath.SameFile(path1, path2 string). This would use GetFinalPathNameByHandle on Windows, and realpath on Linux (but can be further improved on Linux by looking at inode number, etc.). It would solve the real-world problem of many softwares without having to define what a canonical path name is, and would also relax the software engineering concern of preferring to keep around the original input path rather than a resolved one.
So my understanding is that if this proposal is accepted, we end up with:
* Linux: `Resolve` and `EvalSymlinks` are basically the same function, with the only difference is that Resolve always returns an absolute path, while `EvalSymlinks` returns a path constructed following the symlinks, so it might be absolute or relative depending on how the symlinks were created. Neither of them see through mountpoints in any way.
Yes, because bind mounts on Linux are not identical. One directory may be read-only while the other may be read-write. Bind mounts also, by default, don't include submounts and are therefore not necessarily equivalent.
* Windows: `EvalSymlinks` (after bugfixing) would possibly go through all kind of NTFS links (with the same relative/absolute handling than Linux), and it would not follow mountpoints. `Resolve` instead would also resolve mountpoints.
As far as I understand it, these mountpoints are functionally identical. They are more like opaque symlinks rather than the typical Unix mountpoints with different mount options.
It looks like the biggest gain here is that we somehow exposes GetFinalPathNameByHandle which is a "common" function used in other frameworks / languages to canonicalize paths. Since we're missing it, we face interoperability problems. In fact, semantically speaking, we're going to have a Resolve function which behaves differently between Windows and Linux wrt mountpoints. And BTW we're also getting into another corner: if somebody puts in the effort of adding mountpoint resolution to Resolve on Linux, we end up with interoperability problems again, because the rest of the world might still just be calling realpath(3) on Linux.
This is exactly the behavior that's desired: canonicalizing the path in the way the operating system does that. Languages that use the system C library don't have this problem because they use the system C library functions for this purpose and implement those functions' behavior. I would literally define the expected behavior in terms of the system's C library, since that's what other languages (e.g., Rust) do.
A slightly different proposal is to add
filepath.SameFile(path1, path2 string). This would use GetFinalPathNameByHandle on Windows, and realpath on Linux (but can be further improved on Linux by looking at inode number, etc.). It would solve the real-world problem of many softwares without having to define what a canonical path name is, and would also relax the software engineering concern of preferring to keep around the original input path rather than a resolved one.
That doesn't really assist in my use case of finding out whether one path is inside another.
Even suppose something like Resolve could be perfect, Go cannot handle the return value. The unfortunate truth is that Go can only handle paths with drive letters and ‘normal’ UNC paths in the form \server\share (without prefix). I have evaluated about half of all path/filepath API’s and the prefixes GetFinalPathNameByHandle yields break about every one of them. You cannot depend on stripping the prefixes, either.
This should mostly be fixed by my CL (https://go-review.googlesource.com/c/go/+/263538). Please test it and let me know.
It also rectifies some EvalSymlinks issues on Windows reported in your #40180 table. I have a followup CL (still not mailed) that fixes another set of issues in EvalSymlinks. NOTE: I'm not touching the semantic of the function as it is implemented now:
I'm just trying to fix implementation bugs so that we can see a clearer picture of where we are now.
There's clearly new, substantive discussion here. Moving back to Active.
all of the questions you raised have been addressed in the comments above. Re missing the thread, you last commented here 7 days ago.
@networkimprov I cannot find them. Can you, please, provide references to me? Thank you.
Alex
So my understanding is that if this proposal is accepted, we end up with:
* Linux: `Resolve` and `EvalSymlinks` are basically the same function, with the only difference is that Resolve always returns an absolute path, while `EvalSymlinks` returns a path constructed following the symlinks, so it might be absolute or relative depending on how the symlinks were created. Neither of them see through mountpoints in any way.Yes, because bind mounts on Linux are not identical. One directory may be read-only while the other may be read-write. Bind mounts also, by default, don't include submounts and are therefore not necessarily equivalent.
To be honest, I disagree on this. The two underlying files are indeed the same and in fact os.SameFile returns true. It sounds absolutely surprising that os.SameFile(a,b) differs from filepath.Resolve(a) == filepath.Resolve(b).
Moreover, the same situation happens on Windows. You can easily have C:\FOO and \\LOCALHOST\BAR\FOO being the same file with different permissions. I'm not sure why on Windows it should be necessary to have filepath.Resolve returning true in that situation, while on Linux it should return false on bind mounts (or any other kind of mount really).
I think the only correct answer here is that the request is not to establish a cross-platform semantic, but just add a function that calls realpath on Linux and GetFinalPathNameByHandle on Windows because this is what most open source projects have ended up doing. I sympathize with the goal of interoperability, but I'm just hoping that we can find a better solution.
A slightly different proposal is to add
filepath.SameFile(path1, path2 string). This would use GetFinalPathNameByHandle on Windows, and realpath on Linux (but can be further improved on Linux by looking at inode number, etc.). It would solve the real-world problem of many softwares without having to define what a canonical path name is, and would also relax the software engineering concern of preferring to keep around the original input path rather than a resolved one.That doesn't really assist in my use case of finding out whether one path is inside another.
Looking at https://github.com/git-lfs/git-lfs/issues/4012, it looked like filepath.SameFile would be enough, to compare if the git root actually matches. From your answer, I understand that you actually need whether a specific file of the repo is within the git repo (with the git root path being read through git and thus canonicalized through GetFinalPathNameByHandle)? Can you please elaborate on the use case you are mentioning?
I have one question on this. Say that we have \\network\repo mounted as N:, just like in the referenced bug. Within the repo, there's a committed symlink N:\mylink that points to a file on a local disk C:\file. Would you expect filepath.Resolve("N:\\mylink") to return \\network\repo\mylink or C:\file? If I followed you correctly, you would need the former, but GetFinalPathNameByHandle obviously returns the latter.
I think the only correct answer here is that the request is not to establish a cross-platform semantic, but just add a function that calls
realpathon Linux andGetFinalPathNameByHandleon Windows because this is what most open source projects have ended up doing. I sympathize with the goal of interoperability, but I'm just hoping that we can find a better solution.
This is what Git does and what other projects do, and it's the semantics we need. It is the same semantics that other programming languages provide. I am interested in Go working with other software that is also on the system in a compatible way.
You may argue that Windows and Unix should provide the same semantics for path canonicalization, and perhaps they should. However, Microsoft was fully aware of Unix path semantics and behavior, and clearly chose deliberately to do things in an incompatible and much more complicated way. I feel that was a mistake, but we're stuck with it now.
Looking at git-lfs/git-lfs#4012, it looked like
filepath.SameFilewould be enough, to compare if the git root actually matches. From your answer, I understand that you actually need whether a specific file of the repo is within the git repo (with the git root path being read through git and thus canonicalized throughGetFinalPathNameByHandle)? Can you please elaborate on the use case you are mentioning?
I need to change the directory to the root of the repository to make GIT_WORK_TREE work. I also need to be able to resolve any other Git environment variables to their canonical paths so I can set them appropriately before invoking Git, and all of those need to be canonicalized appropriately and compatibly with Git. This is also the behavior that anyone else working with Git will need to implement.
I also need to be able to determine if a path the user has provided is within the repository and fail if it is not. I can do that by removing the trailing file component, canonicalizing the directory, and appending the file path. That works to tell me if the component is in the repository, whether or not the file itself is a symlink. Repositories are not allowed to span file systems.
I have one question on this. Say that we have
\\network\repomounted asN:, just like in the referenced bug. Within the repo, there's a committed symlinkN:\mylinkthat points to a file on a local diskC:\file. Would you expectfilepath.Resolve("N:\\mylink")to return\\network\repo\mylinkorC:\file? If I followed you correctly, you would need the former, butGetFinalPathNameByHandleobviously returns the latter.
I don't need to canonicalize file names, only directory names. However, I do need the standard GetFinalPathNameByHandle semantics for that. Other people may have different use cases and will also need the standard semantics.
I want to be clear that I'm not asking for this just for Git LFS. I'm asking for this because these are the standard platform semantics and they should be available in a cross-platform way. I fully realize that they are inconsistent across platforms. I want the same behavior out of Go that I get out of languages that use the system C library. I'm not looking for special functionality to handle my special case, I'm looking for generic tools that make Go programs compatible with non-Go programs.
It's fine that Go does not wrap the system C library, but if it chooses not to do that, it needs to be compatible with the semantics of the platform as if it did, and not provide behavior that is materially different. Trying to sell me on a behavior that is consistent across platforms when that is not practically achievable in a useful way isn't helpful. That's what we have now with EvalSymlinks, and everyone agrees that it's not the right decision.
I want to be clear that while there may be other proposals, such as filepath.SameFile, those are separate and independent proposals and are outside of the scope of this proposal, which is to implement a function which canonicalizes paths. I don't dispute that they may be valuable and important proposals, but they are not this one, and as such, they should probably be discussed in their own issues, not this one.
I also want to be clear that I'm an experienced developer who understands his problem space well and am confident I know what's best for it, so it's unhelpful (and frankly, insulting) to imply that some other solution would be better for my needs or that I should adopt a different design. If people disagree with the Git project's decisions to adopt this technique (with which I am implementing compatibility) and feel that they were unwise, those concerns should be addressed to the project's mailing list, although I suspect that in this case such a message will be about as welcome as I have found it.
As I've mentioned, my goal here is to expose a cross-platform way to canonicalize paths as the operating system defines that. Whether the operating system designers made a provident design decision or whether the operating system has provided desirable semantics is outside the scope of this proposal. I assume for the purposes of this proposal that the fact that the operating system is successful means that its developers were competent and implemented desirable, useful, and well-thought-out features. The facts remain that other common, successful languages implement functionally equivalent designs, and that while similar concerns have been expressed during the design of those implementations, nobody has yet proposed a better or more universally desirable solution. Moreover, Go attempted to provide a solution for this problem in EvalSymlinks which demonstrates that this is a problem that people need solved. Additionally, canonicalizing paths is considered useful for security purposes by Carnegie Mellon's Software Engineering Institute, which is considered a reputable source of secure coding advice. Finally, this functionality is implemented in the popular realpath utility, which is widely shipped for scripting across platforms (including Windows).
For these reasons, I believe that the proposal as it stands should be implemented. I take no position on proposals for other functions, as they are outside my scope.
Brian, the Windows maintainer has objected to this proposal, raising Q's in https://github.com/golang/go/issues/37113#issuecomment-712704339. It might help to point out where Russ or Ian have previously addressed those Q's.
Looking at how this problem is solved elsewhere, if a new function is to be added, perhaps a better name would be Canonical(). Rust has canonicalize and C++'s filesystem library has canonical, which Microsoft's Standard C++ Library implements using GetFinalPathNameByHandle.
Canonical was not immediately obvious to me before I looked at this issue, and I think that's a fair concern. However, looking at modern solutions to this problem, once you know of it, the naming familiarity between languages might be a good thing.
@ericwj is unable to comment on this issue. As far as we can tell, this is a GitHub bug. It is not due to any intentional action by the Go team.
@ianlancetaylor The original author of this issue has blocked @ericwj because they felt insulted by the help they were providing. When you block a user on GitHub, they're unable to comment on issues/pull requests you're the owner of.
@saracen Thanks for the explanation.
@bk2204 Would you be willing to unblock @ericwj? I think they have useful information to contribute for moving this proposal forward. Thanks.
I do believe people have the right to decide who they'd like to block, assuming they are not in a government position, and that asking people to unblock others is inappropriate. That's often an approach people use to engage in continued harassment of women and other folks who tend to be underrepresented in tech. That's not what's happening here, and I'm sure you had only good intent to move the discussion forward in a productive way, but since we often lack all the context to know people's reasons for blocking others, it's generally best not to ask.
For the limited purpose of advancing this proposal, I have unblocked @ericwj for the moment.
I'm fine with Canonical() or Canonicalize() as a name since it's used elsewhere. That may be more intuitive than Resolve(), but since we'll have sane documentation either way, I'm not very picky about the name.
@bk2204 My sincere apologies. I was in the wrong, and I've done you harm. I'm sorry.
@ericwj Your comment violates the Go Community Code of Conduct (https://golang.org/conduct), and I've deleted it.
the Windows maintainer has objected to this proposa
@networkimprov I assume you refer to me as the Windows maintainer. That is not my title. Please stop inventing things. You give false impression to others.
Thank you.
Alex
Alex, sorry, I had gathered that you're recognized as the principle Windows maintainer. I didn't mean to invent a title.
Well, I'm sorry for having been slightly edgy due to the total irrelevance of that statement with which he accompanies the news that he unblocked me.
I care to repeat that Brian should argue his case from a position of hands on, mature knowledge rather than having us collect that for him and demanding his proposal be accepted unmodified and that none of us is in a better position than to test and evaluate the suitability of including the proposal of porting canonicalize unmodified into the go standard library for Windows than he himself and the company he works for. Further I can only repeat my previous arguments and fix the oversight of not mentioning the obvious - that EvalSymlinks is implemented with GetFinalPathNameByHandle and that some of the problems even mentioned in this issue will be exactly the same with canonicalize for this reason.
I'm going to close this issue and I'd like to ask the core team to lock it at their earliest convenience. I don't feel this discussion is going in a productive way and I don't wish to pursue this issue or proposal further. If other folks think this a similar proposal will be valuable, I'm happy for them to drive it without my involvement.
Alex, sorry,
No worries. You did not hurt my feelings or anything.
I just did not want other people to get impression that they need to convince me more than others. Any proposal here should be judged purely on its merits, not on authority of proposal author or judges.
I had gathered that you're recognized as the principle Windows maintainer.
I am Go contributor, like many others. I gathered bunch of experience over the years. But that does not make me principle Windows maintainer. I am not responsible for Windows port - Go Team is.
I didn't mean to invent a title.
No worries. No harm done.
Alex
I have written a new proposal, #42201, to replace this one.
I will now lock this issue.
Most helpful comment
It looks like
filepath.EvalSymlinksis broken on Windows, and we are now planning to replace it with a similar function with a different name and slightly different semantics, because it's too hard to touchEvalSymlinkson Windows, where it's already broken and doesn't always work. Nobody can enumerate where it does work or it doesn't, but it seems easier not to touch it anymore rather than fixing it.This sounds extremely rushed to me. We're adding an API that is basically a duplicate of another existing one because we don't have time to properly fix the existing one, analyzing all issues. Also it looks like there are no volunteers to work on it.
If we can get this proposal on hold for a month or so, I can attempt a thorough analysis on the history of changes of EvalSymlinks on Windows, and attempt a CL as described by Russ in https://github.com/golang/go/issues/37113#issuecomment-701509206. This would require some time and some effort to get right, but I think it would be a better long term solution rather than adding a new API. If I fail, we can still fallback to adding a new API, but at least I hope we would get a better understanding around EvalSymlinks and whether it's really inherently broken forever on Windows.