Go: net/url: misleading error message when url has a leading space

Created on 14 Dec 2018  Â·  15Comments  Â·  Source: golang/go

What version of Go are you using (go version)?

$ go version
go version go1.11.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stefanb/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stefanb/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
    "net/http"
)

func main() {
    _, err := http.Get(" http://example.org")
    if err != nil {
        panic(err)
    }
}

https://play.golang.org/p/jkcYSD6ZRcO

What did you expect to see?

Either

  • a generic error message of invalid URL or
  • a detailed error of invalid URL scheme or
  • a detailed error of leading space in URL

What did you see instead?

A misleading detailed error message:

parse  http://example.org: first path segment in URL cannot contain colon

In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.

I propose to either:

  • adjust the error message to eg:
parse  http://example.org: URL scheme cannot contain spaces

or

  • quote the problematic URL in the error message, so that it is more obvious even if the message itself remains misleading:
parse " http://example.org": first path segment in URL cannot contain colon

or

  • ideally both adjust the error message + quote the URL:
parse " http://example.org": URL scheme cannot contain spaces

The cannot contain spaces in error message can be changed for cannot contain whitespace or even contains invalid characters, depending how the check is implemented.

Current implementation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L540-L562

NeedsFix

Most helpful comment

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

All 15 comments

@bradfitz @dsnet Wouldnt a TrimSpace before getscheme on this line solve the issue? I can raise a PR for this if thats agreed upon as the fix. Would also write tests of course.

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

That is an incorrect error. It should be alpha character, not alpha-numeric.

On Dec 20, 2018, at 11:27 PM, Agniva De Sarker notifications@github.com wrote:

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

require("url").parse(' https://example.org')
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: 'example.org',
port: null,
hostname: 'example.org',
hash: null,
search: null,
query: null,
pathname: '/',
path: '/',
href: 'https://example.org/' }
Curl does not -

curl ' https://example.org'
curl: (1) Protocol " https" not supported or disabled in libcurl
Same with wget -

wget ' https://example.org'
https://example.org: Scheme missing.
Python also fails to parse-

from urllib.parse import urlparse

o = urlparse(' https://example.org')
o
ParseResult(scheme='', netloc='', path=' https://example.org', params='', query='', fragment='')
Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

That's right, sorry. Then we can simplify it to ".. does not begin with a letter".

simplify it to ".. does not begin with a letter".

Perhaps ".. scheme does not begin with a letter" would be more accurate as "//example.org" is valid to my understanding.

Interestingly, i found

    u, err := url.Parse("    //example.org")
    if err != nil {
        panic(err)
    }
    fmt.Println(u)

is NOT failing, but outputs:

%20%20%20%20//example.org

https://play.golang.org/p/jRk63PWsKG5

Yes, we should return an error on leading bogus chars.

ok I will make a PR for this code change with associated tests.

I have implemented the generic improvement of error messages by consistently quoting the URLs in #29384, but more detailed messages and stricter validation are welcome of course.

@bradfitz what does "bogus chars" translate to besides a space?

Any non-letter, according to the spec.

On further analysis, it seems its not straightforward to implement this because of 2 reasons:

  • the characters that are invalid in a scheme could be valid in other parts of the url (like the period and plus characters) so we will need to remember what we processed before. Currently getscheme does not do that, and am wondering if adding that functionality would increase complexity.
  • the url package says it should be backward compatible.

To account for the space-specific error, we could add a separate case branch and raise the error message like so:

case c == ' ':   // Add other invalid prefix chars here
    if i == 0 {
        return "", "", errors.New("URL scheme has invalid characters")
    } 

However this seems like its a bandaid fix to me, but I am not sure, so would appreciate some input. Any thoughts on whether this is good, or if there's any other approach?

@stefanb @bradfitz @agnivade ?

I assume the proposed change would go just before default line 450 in the switch block here:
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L435-L457

This would make it inconsistent with other validation error cases in the same switch block:

1) the first character validation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L441-L444
2) the invalid character validation
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L450-L453

They both just returns scheme as "" and no error whatsoever. It should be consistent in a way so that all reasons for invalid scheme result either in error or "", but not mixed.

It's hard to discuss code changes here. Maybe just send a CL and we can continue it over there.

We probably need to simplify the switch-case a bit. Have an if condition first, which checks for i==0 and non-letter, and return immediately. And integrate the rest of the switch-case accordingly.

We can keep the special c == ':' and i == 0 check, which will preserve the "missing protocol scheme" error, for backward compatibility reasons. But I will let @bradfitz decide that.

Change https://golang.org/cl/155922 mentions this issue: net/url: give a proper error message on invalid character in scheme

Change https://golang.org/cl/185117 mentions this issue: net/url: improve url parsing error messages by quoting

Was this page helpful?
0 / 5 - 0 ratings