go version)?$ go version go version go1.11.2 windows/amd64
Yes
go env)?go env Output
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\wir3less\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\wir3less\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\wir3less\AppData\Local\Temp\go-build829182294=/tmp/go-build -gno-record-gcc-switches
While playing around with URL.Parse I found a few problems I'd like to share.
I'll gladly share more details if anything is unclear or if someone is interested.
Normally, javascript:alert(1) when parsed by url.parse has no Hostname()
But javascript://alert(1) has a hostname of alert(1)
This can be taken further...
javascript://%250aalert(1)+'[email protected]/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)
IPV6 support also has it's issues...
this URL http://[google.com]:80 has the hostname of google.com
But also do all of these:
http://google.com]:80
http://google.com]:80__Anything_you'd_like_sir
http://[google.com]FreeTextZoneHere]:80
Even without thinking about how this would interact with other systems and parsers,
Just considering code used URL hostname validations and Go's https functions (http.Get() for instance) leveraging url.parse should explain how this could be used maliciously.
Again, will be glad to provide more details if needed.
All POCs can be found here
https://play.golang.org/p/UoqEcxCFY8z
Errors for most of it...
Hostnames
/cc @bradfitz
What did you see instead?
Hostnames
I think an invalid error is more appropriate.
I agree. Thats what I wrote on the "Expected" field.
Recieving a Hostname is what currently happens.
I've take a look into url_test.go
You can see this is design on purpose.
I see...
Well this is still the wrong behaviour, both in my opinion as well as by the spec.
The fact that we have tests for it doesn't make it right...
Try testing other parsers, non will allow that kind of parsing
I don't know if this is the right place to mention this but url.Parse("localhost") sets the Path to "localhost" and the host is empty. I believe this is also an error in the parsing
@mees- Why would that be an error? You have provided a valid relative path.
I'm sorry, I misunderstood the documentation
Can I provide any more info to get this issue resolved? No point in leaving this issue open for nothing...
Try testing other parsers, non will allow that kind of parsing
If you can take each case, and show a comparison of the behavior in Python, Node and Ruby, it will help us understand the situation better.
So I've taken the time and compiled the following table
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharing
You can notice that for the Javascript URLs, Chrome is the only one actually following the spec, and python is doing a half decent job (has hostname, but ignored http specific chars like '@' in non http schemes.)
For the IPv6 URLs, you can see that Go is by far the most permissive parser.
Please keep in mind that all these vectors can be used to trick and bypass code trying to validate the host of a user-supplied URL.
I'm also going to open bugs for Node and Ruby as they are also in risk of that.
Thank you for spending time on this !
I dug a little deeper on the Node cases.
I used node 10.13.0 for my testing, apparently, there's already 10.15.0
One of the patches is of interest to this issue - https://nodejs.org/en/blog/vulnerability/november-2018-security-releases/#hostname-spoofing-in-url-parser-for-javascript-protocol-cve-2018-12123
So using url.parse() is fixed on Node now, these problems still exist it if you use the new URL() syntax
> new URL("javascript://google.com").host
'google.com'
> url.parse("javascript://google.com").host
null
Guys,
Any update here?
javascript://alert(1) has a hostname of alert(1)
javascript://%250aalert(1)+'[email protected]/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)
I don't see an issue here. If you're using unescaped and unsanitized data in HTML code, then nothing would help you. And whitelisting is not an option here, while blacklisting isn't a solution as it always can be bypassed by a malicious.actor.
Thanks @bradfitz .
@opennota - The URL parser is meant to deal with such data.
You're supposed to be able to pass user-data into it, and it should return an error in case the URL is malformed.
Otherwise, it should return a safe object that correctly represents the URL. Meaning, for example, no Hostname for hostless schemes such as JS.
As for the blacklist approach, I think a piece of code such as the one implemented by Node, should do the trick:
https://github.com/nodejs/node/blob/master/lib/url.js#L277
@bradfitz I just realized that go releases a major version every 6 months, and that 1.13 will be released on August 19.
Does that mean this issue will wait till then to get resolved?
At least, if ever. We don't have a great history of being able to make changes to the URL type without breaking code. So it might need to remain unchanged (or at least only documented). But I can't say because I haven't looked into this bug at all yet or your spreadsheet. But thanks for gathering data.
Change https://golang.org/cl/189258 mentions this issue: net/url: make Hostname and Port predictable for invalid Host values
Glad to see this is getting fixed, I'm be happy to assist or just verify the fix if you'd like.
@wir3less We'd definitely appreciate it if you could verify the fix. For a security patch we needed to reduce disruption as much as possible, so we focused on protecting against validation bypasses, rather than rejecting all invalid hostnames (which will come in Go 1.14 as much as possible).
If you can still see validation bypass avenues, please let us know. Also, please let us know how you'd like to be credited for reporting this, if at all.
For next time, this kind of issue is worth reporting to [email protected] rather than the issue tracker.
Great, and I completely understand. I'll go over the patch and let you know if I find anything interesting (via the security email this time :) ).
As for credit, the following would be great
Adi Cohen - adico.me
Thanks
/cc @mikesamuel who might also be interested
@gopherbot Please backport this to 1.12 and 1.11. This was a security problem.
Backport issue(s) opened: #33632 (for 1.11), #33633 (for 1.12).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Is net/url based on STD66 or https://url.spec.whatwg.org/ ?
@mikesamuel STD66. It cites RFC 3986 and RFC 2396. Is it relevant here? Do either allow non-numeric ports?
@FiloSottile
I don't know that they differ around ports.
IIRC, url.spec does differ around javascript://example.com/%0aalert(1).
The Appendix B regex is perfectly happy with non-numeric ports, so it really depends on whether you're strictly matching the grammar or doing approximate decomposition.
Change https://golang.org/cl/191966 mentions this issue: net/url: fail TestParseErrors test when getting an unwanted error
So I've taken the time and compiled the following table
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharingYou can notice that for the Javascript URLs, Chrome is the only one actually following the spec, and python is doing a half decent job (has hostname, but ignored http specific chars like '@' in non http schemes.)
For the IPv6 URLs, you can see that Go is by far the most permissive parser.
Please keep in mind that all these vectors can be used to trick and bypass code trying to validate the host of a user-supplied URL.
I'm also going to open bugs for Node and Ruby as they are also in risk of that.
Python also has the behavior (Python 3.6.4 (default, Apr 4 2019, 19:58:46))
>>> from urllib.parse import urlparse
>>> urlparse("javascript://%250aalert(1)+'[email protected]/a'a")
ParseResult(scheme='javascript', netloc="%250aalert(1)+'[email protected]", path="/a'a", params='', query='', fragment='')
>>> a = _
>>> a.hostname
'google.com'
>>> a.username
"%250aalert(1)+'aa"
But in my opinion, this behavior (javascript://%250aalert(1)+'[email protected]/a') isn't a security bug. Developer should filter XSS while writing to HTML.
And somthing following the indicator of hierarchical URL is authentication and hostname, even if the URL is non-hierarchical, parser dont need to confirm which protocols are hierarchical.
The following custom URI was accepted before, but isn't now:
s://1:2.3
A port is extracted as 2.3, which is not a valid port, but our scheme does not define the concept of a port.
The following custom URI was accepted before, but isn't now:
s://1:2.3
A port is extracted as 2.3, which is not a valid port, but our scheme does not define the concept of a port.
As far as I can tell, that's not a valid URL according to any spec. I realize it can be useful to use the URL parser for similar formats, but that convenience comes with the security cost of accepting invalid URLs that are surprising to applications that expect well formed URLs, and the name of the package is net/url.
Fair enough, I know that is not a URL.
Would you say "file://1:2.3" should fail parsing too?
Would you say "file://1:2.3" should fail parsing too?
I think it should, as the correct format for it is file:///1:2.3, where the host is empty. Does it not?
I see. I forgot about the third / meaning the part after that is treated as a path, so that wasn't a valid example.
My issue is with the assumption that any hostname that is not an IP will end with a port number.
"http://host:sub" doesn't parse either. I couldn't find if the RFC3986 mandates that the second part of any host string must be a port, but if it is I'll have to live with it. If it isn't I would prefer that the behavior of parse gets relaxed and just return the entire host name string as is.
If there is a colon in the authority, the part after it is a port, and ports can only be digits.
authority = [ userinfo "@" ] host [ ":" port ]
port = *DIGIT
https://tools.ietf.org/html/rfc3986#section-3.2
(Of course, the port is optional, but if there is a colon there is a port, by the syntax.)
Most helpful comment
Thank you for spending time on this !