Go: net/url: URL.Parse Multiple Parsing Issues

Created on 4 Dec 2018 · 36Comments · Source: golang/go

What version of Go are you using (`go version`)?

$ go version
go version go1.11.2 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output



$ go env

set GOARCH=amd64

set GOBIN=

set GOCACHE=C:\Users\wir3less\AppData\Local\go-build

set GOEXE=.exe

set GOFLAGS=

set GOHOSTARCH=amd64

set GOHOSTOS=windows

set GOOS=windows

set GOPATH=C:\Users\wir3less\go

set GOPROXY=

set GORACE=

set GOROOT=C:\Go

set GOTMPDIR=

set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64

set GCCGO=gccgo

set CC=gcc

set CXX=g++

set CGO_ENABLED=1

set GOMOD=

set CGO_CFLAGS=-g -O2

set CGO_CPPFLAGS=

set CGO_CXXFLAGS=-g -O2

set CGO_FFLAGS=-g -O2

set CGO_LDFLAGS=-g -O2

set PKG_CONFIG=pkg-config

set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\wir3less\AppData\Local\Temp\go-build829182294=/tmp/go-build -gno-record-gcc-switches

What did you do?

While playing around with URL.Parse I found a few problems I'd like to share.
I'll gladly share more details if anything is unclear or if someone is interested.

Normally, javascript:alert(1) when parsed by url.parse has no Hostname()
But javascript://alert(1) has a hostname of alert(1)
This can be taken further...
javascript://%250aalert(1)+'[email protected]/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)

IPV6 support also has it's issues...
this URL http://[google.com]:80 has the hostname of google.com
But also do all of these:
http://google.com]:80
http://google.com]:80__Anything_you'd_like_sir
http://[google.com]FreeTextZoneHere]:80

Even without thinking about how this would interact with other systems and parsers,
Just considering code used URL hostname validations and Go's https functions (http.Get() for instance) leveraging url.parse should explain how this could be used maliciously.

Again, will be glad to provide more details if needed.
All POCs can be found here
https://play.golang.org/p/UoqEcxCFY8z

What did you expect to see?

Errors for most of it...

What did you see instead?

Hostnames

NeedsFix Security

Source

wir3less

👍1

Most helpful comment

Thank you for spending time on this !

agnivade on 1 Jan 2019

👍5

All 36 comments

/cc @bradfitz

agnivade on 5 Dec 2018

What did you see instead?
Hostnames

I think an invalid error is more appropriate.

mengzhuo on 5 Dec 2018

I agree. Thats what I wrote on the "Expected" field.
Recieving a Hostname is what currently happens.

wir3less on 5 Dec 2018

I've take a look into url_test.go

https://github.com/golang/go/blob/5e1727892bf36076ee45c417b56bba32801ffbd7/src/net/url/url_test.go#L423-L432

You can see this is design on purpose.

mengzhuo on 5 Dec 2018

I see...
Well this is still the wrong behaviour, both in my opinion as well as by the spec.
The fact that we have tests for it doesn't make it right...
Try testing other parsers, non will allow that kind of parsing

wir3less on 6 Dec 2018

I don't know if this is the right place to mention this but url.Parse("localhost") sets the Path to "localhost" and the host is empty. I believe this is also an error in the parsing

mees- on 27 Dec 2018

👎4

@mees- Why would that be an error? You have provided a valid relative path.

fraenkel on 27 Dec 2018

I'm sorry, I misunderstood the documentation

mees- on 27 Dec 2018

Can I provide any more info to get this issue resolved? No point in leaving this issue open for nothing...

wir3less on 28 Dec 2018

Try testing other parsers, non will allow that kind of parsing

If you can take each case, and show a comparison of the behavior in Python, Node and Ruby, it will help us understand the situation better.

agnivade on 29 Dec 2018

So I've taken the time and compiled the following table
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharing

You can notice that for the Javascript URLs, Chrome is the only one actually following the spec, and python is doing a half decent job (has hostname, but ignored http specific chars like '@' in non http schemes.)

For the IPv6 URLs, you can see that Go is by far the most permissive parser.

Please keep in mind that all these vectors can be used to trick and bypass code trying to validate the host of a user-supplied URL.

I'm also going to open bugs for Node and Ruby as they are also in risk of that.

wir3less on 1 Jan 2019

👍1

Thank you for spending time on this !

agnivade on 1 Jan 2019

👍5

I dug a little deeper on the Node cases.
I used node 10.13.0 for my testing, apparently, there's already 10.15.0
One of the patches is of interest to this issue - https://nodejs.org/en/blog/vulnerability/november-2018-security-releases/#hostname-spoofing-in-url-parser-for-javascript-protocol-cve-2018-12123

So using url.parse() is fixed on Node now, these problems still exist it if you use the new URL() syntax

> new URL("javascript://google.com").host
'google.com'
> url.parse("javascript://google.com").host
null

wir3less on 1 Jan 2019

Guys,
Any update here?

wir3less on 9 Jan 2019

   javascript://alert(1) has a hostname of alert(1)
   javascript://%250aalert(1)+'[email protected]/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)

I don't see an issue here. If you're using unescaped and unsanitized data in HTML code, then nothing would help you. And whitelisting is not an option here, while blacklisting isn't a solution as it always can be bypassed by a malicious.actor.

opennota on 10 Jan 2019

Thanks @bradfitz .

@opennota - The URL parser is meant to deal with such data.
You're supposed to be able to pass user-data into it, and it should return an error in case the URL is malformed.
Otherwise, it should return a safe object that correctly represents the URL. Meaning, for example, no Hostname for hostless schemes such as JS.

As for the blacklist approach, I think a piece of code such as the one implemented by Node, should do the trick:
https://github.com/nodejs/node/blob/master/lib/url.js#L277

wir3less on 10 Jan 2019

@bradfitz I just realized that go releases a major version every 6 months, and that 1.13 will be released on August 19.
Does that mean this issue will wait till then to get resolved?

wir3less on 13 Jan 2019

At least, if ever. We don't have a great history of being able to make changes to the URL type without breaking code. So it might need to remain unchanged (or at least only documented). But I can't say because I haven't looked into this bug at all yet or your spreadsheet. But thanks for gathering data.

bradfitz on 14 Jan 2019

Change https://golang.org/cl/189258 mentions this issue: net/url: make Hostname and Port predictable for invalid Host values

gopherbot on 7 Aug 2019

Glad to see this is getting fixed, I'm be happy to assist or just verify the fix if you'd like.

wir3less on 9 Aug 2019

@wir3less We'd definitely appreciate it if you could verify the fix. For a security patch we needed to reduce disruption as much as possible, so we focused on protecting against validation bypasses, rather than rejecting all invalid hostnames (which will come in Go 1.14 as much as possible).

If you can still see validation bypass avenues, please let us know. Also, please let us know how you'd like to be credited for reporting this, if at all.

For next time, this kind of issue is worth reporting to [email protected] rather than the issue tracker.

FiloSottile on 9 Aug 2019

Great, and I completely understand. I'll go over the patch and let you know if I find anything interesting (via the security email this time :) ).

As for credit, the following would be great
Adi Cohen - adico.me

Thanks

wir3less on 9 Aug 2019

/cc @mikesamuel who might also be interested

odeke-em on 11 Aug 2019

@gopherbot Please backport this to 1.12 and 1.11. This was a security problem.

dmitshur on 13 Aug 2019

Backport issue(s) opened: #33632 (for 1.11), #33633 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot on 13 Aug 2019

Is net/url based on STD66 or https://url.spec.whatwg.org/ ?

mikesamuel on 14 Aug 2019

@mikesamuel STD66. It cites RFC 3986 and RFC 2396. Is it relevant here? Do either allow non-numeric ports?

FiloSottile on 14 Aug 2019

@FiloSottile

I don't know that they differ around ports.
IIRC, url.spec does differ around javascript://example.com/%0aalert(1).

The Appendix B regex is perfectly happy with non-numeric ports, so it really depends on whether you're strictly matching the grammar or doing approximate decomposition.

mikesamuel on 21 Aug 2019

Change https://golang.org/cl/191966 mentions this issue: net/url: fail TestParseErrors test when getting an unwanted error

gopherbot on 27 Aug 2019

So I've taken the time and compiled the following table
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharing

You can notice that for the Javascript URLs, Chrome is the only one actually following the spec, and python is doing a half decent job (has hostname, but ignored http specific chars like '@' in non http schemes.)

For the IPv6 URLs, you can see that Go is by far the most permissive parser.

Please keep in mind that all these vectors can be used to trick and bypass code trying to validate the host of a user-supplied URL.

I'm also going to open bugs for Node and Ruby as they are also in risk of that.

Python also has the behavior (Python 3.6.4 (default, Apr 4 2019, 19:58:46))

>>> from urllib.parse import urlparse
>>> urlparse("javascript://%250aalert(1)+'[email protected]/a'a")
ParseResult(scheme='javascript', netloc="%250aalert(1)+'[email protected]", path="/a'a", params='', query='', fragment='')
>>> a = _ 
>>> a.hostname
'google.com'
>>> a.username
"%250aalert(1)+'aa"

But in my opinion, this behavior (javascript://%250aalert(1)+'[email protected]/a') isn't a security bug. Developer should filter XSS while writing to HTML.
And somthing following the indicator of hierarchical URL is authentication and hostname, even if the URL is non-hierarchical, parser dont need to confirm which protocols are hierarchical.

EddieIvan01 on 3 Sep 2019

The following custom URI was accepted before, but isn't now:
s://1:2.3
A port is extracted as 2.3, which is not a valid port, but our scheme does not define the concept of a port.

dwillemv on 2 Oct 2019

The following custom URI was accepted before, but isn't now:
s://1:2.3
A port is extracted as 2.3, which is not a valid port, but our scheme does not define the concept of a port.

As far as I can tell, that's not a valid URL according to any spec. I realize it can be useful to use the URL parser for similar formats, but that convenience comes with the security cost of accepting invalid URLs that are surprising to applications that expect well formed URLs, and the name of the package is net/url.

FiloSottile on 2 Oct 2019

👍1

Fair enough, I know that is not a URL.
Would you say "file://1:2.3" should fail parsing too?

dwillemv on 2 Oct 2019

Would you say "file://1:2.3" should fail parsing too?

I think it should, as the correct format for it is file:///1:2.3, where the host is empty. Does it not?

FiloSottile on 2 Oct 2019

I see. I forgot about the third / meaning the part after that is treated as a path, so that wasn't a valid example.
My issue is with the assumption that any hostname that is not an IP will end with a port number.

"http://host:sub" doesn't parse either. I couldn't find if the RFC3986 mandates that the second part of any host string must be a port, but if it is I'll have to live with it. If it isn't I would prefer that the behavior of parse gets relaxed and just return the entire host name string as is.

dwillemv on 2 Oct 2019

If there is a colon in the authority, the part after it is a port, and ports can only be digits.

  authority   = [ userinfo "@" ] host [ ":" port ]
  port        = *DIGIT

https://tools.ietf.org/html/rfc3986#section-3.2

(Of course, the port is optional, but if there is a colon there is a port, by the syntax.)

FiloSottile on 2 Oct 2019

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

proposal: sync: Map.Delete method should return bool, indicating if key was deleted or not

lkarlslund · 3Comments

Proposal: supporting “symlinks” in GOPATH

myitcv · 3Comments

x/net/webdav: fails on broken links

enoodle · 3Comments

spec: Feature request: add a ternary expression

michaelsafyan · 3Comments

x/build/cmd/coordinator: ssh proxy should support scp

bradfitz · 3Comments

Go: net/url: URL.Parse Multiple Parsing Issues

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 36 comments

Related issues

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?