System.Uri cannot parse trailing $ sign in paths
We want to store and load data from the WSL2 rootfs projection, which is exposed to Windows as "\\wsl$". We use this UNC path internally to construct and manipulate paths. However, System.Uri cannot parse "\\wsl$" and throws a format exception:
System.UriFormatException: 'Invalid URI: The hostname could not be parsed.'
var uri = new System.Uri(@"\\wsl$");
I don't think I've ever seen that format of URL before. @MihaZupan do we have a validation bug?
This seems like something we'll want to support; I recommend we triage this into the 5.0 milestone.
https://devblogs.microsoft.com/commandline/whats-new-for-wsl-in-windows-10-version-1903/
Why is the WSL resource name in the filepath called wsl$? Since wsl is a short acronym we realize that some resources on networks may already have that name. So we've added a dollar sign, since a machine name can't have a dollar sign in it, which ensures that the name will be accessible with any existing network configuration.
Seems we already have special hostname validation just for UNC's. https://source.dot.net/#System.Private.Uri/System/UncNameHelper.cs,68
URLs of this kind have been around for a long time.
I'm surprised that they were never parsed by System.Uri.
This is an issue with that part of a UNC path being the host: a $ in the host name will not be recognized.
The validation fails in IsValidDomainLabelCharacter for http URIs or UncNameHelper.IsValid for UNC paths.
This stems from the fact that System.Uri validates hostnames as a Server-based Naming Authority for DNS resolution, and not by the less strict, generic Registry-based Naming Authority rules as defined in RFC 2396 and 3986.
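To make the difference between the two rule sets concrete, here is a minimal sketch. The class and method names (HostRules, IsDnsStyleHost, IsRegName) are hypothetical and this is not the System.Uri implementation; the DNS-style check is a simplified letters/digits/hyphen rule, and the reg-name check follows the RFC 3986 grammar reg-name = *( unreserved / pct-encoded / sub-delims ), where sub-delims includes "$".

```csharp
using System;
using System.Linq;

// Hypothetical helpers for illustration only; not the System.Uri code.
public static class HostRules
{
    private static bool IsAsciiLetterOrDigit(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    private static bool IsHex(char c) =>
        (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');

    // Simplified DNS-style rule: dot-separated labels of 1-63 characters,
    // each made of ASCII letters, digits, or '-', not starting or ending with '-'.
    public static bool IsDnsStyleHost(string host)
    {
        if (string.IsNullOrEmpty(host)) return false;
        return host.Split('.').All(label =>
            label.Length >= 1 && label.Length <= 63 &&
            label[0] != '-' && label[label.Length - 1] != '-' &&
            label.All(c => IsAsciiLetterOrDigit(c) || c == '-'));
    }

    // RFC 3986 reg-name: *( unreserved / pct-encoded / sub-delims ).
    // Note that the grammar also accepts the empty string.
    public static bool IsRegName(string host)
    {
        const string unreservedMarks = "-._~";
        const string subDelims = "!$&'()*+,;=";
        for (int i = 0; i < host.Length; i++)
        {
            char c = host[i];
            if (IsAsciiLetterOrDigit(c) || unreservedMarks.IndexOf(c) >= 0 || subDelims.IndexOf(c) >= 0)
                continue;
            // pct-encoded = "%" HEXDIG HEXDIG
            if (c == '%' && i + 2 < host.Length && IsHex(host[i + 1]) && IsHex(host[i + 2]))
            {
                i += 2;
                continue;
            }
            return false;
        }
        return true;
    }
}
```

Under these simplified rules, HostRules.IsDnsStyleHost("wsl$") is false while HostRules.IsRegName("wsl$") is true, which mirrors the disagreement in this thread.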
So, should \\.\C$ be valid?
It won't be, because . alone is not a valid host either.
$ itself in the path is not problematic; for example, \\foo\C$ works fine.
@itodirel How come you are using System.Uri in this case over System.IO.Path, as it sounds like you are making requests to the file system?
I'm not going to speak for @itodirel, but I've built systems in the past that used URIs to specify actions.
Relative URIs were also used to add/override parameters.
Regarding the question about why I'm using System.Uri: it isn't my choice, and I can't change it easily. The dependency exists in MSBuild today. MSBuild fails to load or build projects located on \\wsl$ because it uses System.Uri internally. CPS fails to load projects because it uses MSBuild for evaluation, and so hits the same System.Uri failure. The same goes for Visual Studio: the New Project Dialog takes a dependency on System.Uri and fails to create projects on \\wsl$. Do we want to tell all these folks not to use System.Uri? That would be a massive change, in code that might not have been changed or touched for 20 years, or in legacy code. It can be done, with effort, but I'm trying to understand why System.Uri can't support it. Is it a bug in System.Uri?
It might be reasonable to relax the hostname validation into just reg-name, at least for the file scheme. I tried finding some sort of standard Windows path -> URI transform recommendation but couldn't find anything.
The ramification being that it would cause code to fail later rather than sooner if they try to feed a file host into a resolver, but that seems like a rare scenario?
Is there a specification for UNC file names and allowed characters? We should likely comply with that. If $ is allowed, then it is technically a bug.
A thing to consider for perspective is that it is a bug existing for 20 years -- either no one uses System.Uri with weird hostnames, or usage of $ in hostnames is extremely rare ... doesn't make it less severe for your scenario, but explains that it is a corner case not hit so far :(
In this case, the $ is not part of the hostname. It's part of the path.
\\hostname\share$ works:
AbsolutePath : /share$
AbsoluteUri : file://hostname/share$
LocalPath : \\hostname\share$
Authority : hostname
HostNameType : Dns
IsDefaultPort : True
IsFile : True
IsLoopback : False
PathAndQuery : /share$
Segments : {/, share$}
IsUnc : True
Host : hostname
Port : -1
Query :
Fragment :
Scheme : file
OriginalString : \\hostname\share$
DnsSafeHost : hostname
IdnHost : hostname
IsAbsoluteUri : True
UserEscaped : False
UserInfo :
Also, new Uri(@"\\share$", UriKind.Relative) kind of works:
AbsolutePath :
AbsoluteUri :
LocalPath :
Authority :
HostNameType :
IsDefaultPort :
IsFile :
IsLoopback :
PathAndQuery :
Segments :
IsUnc :
Host :
Port :
Query :
Fragment :
Scheme :
OriginalString : \\share$
DnsSafeHost :
IdnHost :
IsAbsoluteUri : False
UserEscaped : False
UserInfo :
I am confused. I understood the original report as \\wsl$ being a hostname (likely a fake one like localhost, but a hostname). Did I misunderstand?
If I did, what is it then?
I am confused. I understood the original report as \\wsl$ being a hostname (likely a fake one like localhost, but a hostname). Did I misunderstand?
new Uri("\\wsl$") parses as an absolute URI, so it interprets the wsl$ as a hostname.
new Uri("\\wsl$", UriKind.Relative) parses as a relative URI, so it interprets the wsl$ as part of the path/query.
Please note that in some examples above, the "\\..." is not C# string-encoded, so people probably mean @"\\...".
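As a quick illustration of that point, the snippet below compares a regular C# string literal with a verbatim one. The variable names are only for this example.

```csharp
using System;

// In a regular literal, "\\" is an escape sequence for ONE backslash.
string regular = "\\wsl$";     // contents: \wsl$
// In a verbatim literal, backslashes are taken as written.
string verbatim = @"\\wsl$";   // contents: \\wsl$

Console.WriteLine(regular.Length);           // prints 5
Console.WriteLine(verbatim.Length);          // prints 6
Console.WriteLine(verbatim == "\\\\wsl$");   // prints True
```

So a UNC path written as "\\wsl$" in C# source really starts with a single backslash, which changes how it is parsed.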
Also worth noting that \\wsl$ is not a valid URI at all, so I don't think that on the face of it making System.Uri work with it is critical. I am worried about potential regressions in other System.Uri cases.
Definition of URI from the RFC: https://tools.ietf.org/html/rfc3986#section-3
Syntax Components
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.
    URI       = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    hier-part = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty
So while it may make sense to fix this one case of a $ in a UNC path, I'm worried that this is a slippery slope.
I would suggest instead that a new factory method Uri.FromUnc(string uncPath) be added, which supports UNC paths and generates a file:// URI from it. Then MSBuild and other consumers can call this API. Or even a fancier (and far more dangerous) API: Uri.MakeThisArbitraryStringIntoAUriPlease(string randomString).
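To give a rough idea of what such a conversion could look like, here is a minimal sketch. Uri.FromUnc does not exist; the UncPaths class and UncToFileUriString method below are hypothetical names, and the helper returns a plain string rather than a Uri because System.Uri currently rejects hosts like wsl$. It assumes the path needs no percent-encoding (no spaces or other reserved characters).

```csharp
using System;

// Hypothetical helper for illustration; not an existing .NET API.
public static class UncPaths
{
    // Turns \\host\share\dir into file://host/share/dir.
    // Assumption: no characters in the path need percent-encoding.
    public static string UncToFileUriString(string uncPath)
    {
        if (uncPath == null || uncPath.Length < 3 ||
            !uncPath.StartsWith(@"\\", StringComparison.Ordinal))
        {
            throw new ArgumentException(@"Expected a UNC path such as \\host\share.", nameof(uncPath));
        }

        // Drop the two leading backslashes, then switch separators.
        return "file://" + uncPath.Substring(2).Replace('\\', '/');
    }
}
```

For example, UncPaths.UncToFileUriString(@"\\hostname\share$") returns "file://hostname/share$", the same AbsoluteUri that System.Uri reports for that path in the dump earlier in this thread.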
As noted offline, before going too far we probably want to understand whether this would help @itodirel anyway, given any change would almost surely only be in .NET Core. (He mentioned Visual Studio, which still runs on .NET Framework.)
Also, new Uri(@"\\share$", UriKind.Relative) kind of works
Yes, but relative is more of a Uri string wrapper that doesn't provide much utility.
Also worth noting that \\wsl$ is not a valid URI at all
It's not valid because of the $ being in the host. System.Uri supports implicit file paths as input, so \\wsl\foo without the explicit file scheme is fine.
If you don't think about it being in an implicit file form though, the format itself is not against the Uri spec - it's just a question of what kind of hostname rules are used. See https://github.com/dotnet/runtime/issues/36595#issuecomment-630066238
Or even a fancier (and far more dangerous API): Uri.MakeThisArbitraryStringIntoAUriPlease(string randomString)
I am against this. UriKind.Relative is kind of this already: a string wrapper with no utility. If a string can't be parsed and understood by Uri, it can't provide info with properties either.
If we decide this is something we would like to support, the question of allowing $ in the host opens up discussion for whether we should support Registry-based Naming Authority in its entirety (that would include more characters and percent-encoding) or just make an exception for $.
I am against this.
I am too 😄
Just a note that most of MSBuild's dependencies on this are inherited from System.Xml, like XmlReader.BaseURI returning a string like file:////wsl$/Ubuntu/home/raines/msbuild/src/Build/Microsoft.Build.csproj that we want to translate back into a local path (which historically used new Uri(reader.BaseURI).LocalPath).
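As an illustration of that translation step, here is a minimal sketch of a fallback that avoids the exception. The BaseUriPaths class and its methods are hypothetical and are not MSBuild's actual code. The fallback assumes that any file URI string System.Uri rejects is a UNC-style one such as file:////host/share/..., so it would give wrong results for other malformed input.

```csharp
using System;

// Hypothetical helper for illustration; not MSBuild's implementation.
public static class BaseUriPaths
{
    public static string ToLocalPath(string baseUri)
    {
        // Normal case: System.Uri accepts the string, so use LocalPath as before.
        if (Uri.TryCreate(baseUri, UriKind.Absolute, out var uri) && uri.IsFile)
        {
            return uri.LocalPath;
        }

        // Fallback for strings System.Uri rejects, such as a wsl$ host.
        return FileUriStringToUncPath(baseUri);
    }

    // Assumption: the input looks like file://host/share/... or file:////host/share/...
    public static string FileUriStringToUncPath(string fileUri)
    {
        const string scheme = "file:";
        if (!fileUri.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Not a file URI string.", nameof(fileUri));
        }

        // Strip the scheme and all leading slashes, undo percent-encoding,
        // then rebuild the path with backslashes and a UNC prefix.
        string rest = fileUri.Substring(scheme.Length).TrimStart('/');
        string unescaped = Uri.UnescapeDataString(rest);
        return @"\\" + unescaped.Replace('/', '\\');
    }
}
```

With this sketch, BaseUriPaths.FileUriStringToUncPath("file:////wsl$/Ubuntu/home/raines") returns \\wsl$\Ubuntu\home\raines.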
A fix would only systematically help VS if it was in Framework, though.
I know nothing of MSBuild's use case of this scenario, but XmlReader.BaseURI is abstract, so maybe MSBuild can do some hackery to override that property and do something "smarter" (i.e. more correct) with UNC paths?
(I get that these types are usually created from factories, but those instances can maybe be wrapped in new custom XmlReader types that delegate all behavior to the broken inner instance except this one property, where it will do the "smarter" thing.)
Probably it's what MSBuild is feeding to the Uris that needs to be changed.
We've discussed this in triage:
- $ is not a valid character for DNS hostnames, so a path like \\wsl$ technically violates the spec. We should seek guidance from the spec's maintainers regarding this.
- The URI spec does allow $ in the host under its reg-name syntax definition; it is System.Uri that chooses to use stricter DNS validation for hostnames.
- If we allow $ in the host, it should be limited to file paths (the implicit \\wsl$ and explicit file:////wsl$ forms). We don't see an upside to allowing it for other schemes (such as http).
Would it be worth asking the Microsoft owners of WSL to consider changing their choice of character? I get that wsl$ is baked in now, and might have to keep being supported, but perhaps if they added another alias such as wsl_dollar it would help alleviate headaches for people using libraries/APIs/frameworks that use stricter (and arguably correct) interpretations of various specs.
File shares ending in $ appear to be called "Administrative shares", and have been supported by Windows for decades: https://en.wikipedia.org/wiki/Administrative_share
@zivkan share names with $ are working correctly to the best of our knowledge. The name in question here is the hostname, where $ is technically not allowed per spec, although wsl$ uses it that way.
@Eilon I'm meeting with the WSL team on Friday, and that was actually one of my questions and asks for them. If anyone here wants to join us, please let me know.
@zivkan share names with $ are working correctly to the best of our knowledge. The name in question here is the hostname, where $ is technically not allowed per spec, although wsl$ uses it that way.
Oops, sorry. Yes, I was totally mixed up.
@itodirel I will follow up with you offline on the meeting.
Based on offline discussion, the WSL team is considering changing the name (as it violates the spec). Closing for now; we can reopen and reconsider if things change in the future.
Thanks @karelz. I have met with the WSL team, they are open to changing the share name to work with System.Uri, they have a candidate for a new name, and are working on telemetry to validate that the new name does not conflict with other names. I will provide more updates here once I have them.