Go: net/url: misleading error message when url has a leading space

Created on 14 Dec 2018 · 15Comments · Source: golang/go

What version of Go are you using (`go version`)?

$ go version
go version go1.11.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output



$ go env

GOARCH="amd64"

GOBIN=""

GOCACHE="/Users/stefanb/Library/Caches/go-build"

GOEXE=""

GOFLAGS=""

GOHOSTARCH="amd64"

GOHOSTOS="darwin"

GOOS="darwin"

GOPATH="/Users/stefanb/go"

GOPROXY=""

GORACE=""

GOROOT="/usr/local/Cellar/go/1.11.3/libexec"

GOTMPDIR=""

GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"

GCCGO="gccgo"

CC="clang"

CXX="clang++"

CGO_ENABLED="1"

GOMOD=""

CGO_CFLAGS="-g -O2"

CGO_CPPFLAGS=""

CGO_CXXFLAGS="-g -O2"

CGO_FFLAGS="-g -O2"

CGO_LDFLAGS="-g -O2"

PKG_CONFIG="pkg-config"

GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
    "net/http"
)

func main() {
    _, err := http.Get(" http://example.org")
    if err != nil {
        panic(err)
    }
}

https://play.golang.org/p/jkcYSD6ZRcO

What did you expect to see?

Either

a generic error message of invalid URL or
a detailed error of invalid URL scheme or
a detailed error of leading space in URL

What did you see instead?

A misleading detailed error message:

parse  http://example.org: first path segment in URL cannot contain colon

In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.

I propose to either:

adjust the error message to eg:

parse  http://example.org: URL scheme cannot contain spaces

quote the problematic URL in the error message, so that it is more obvious even if the message itself remains misleading:

parse " http://example.org": first path segment in URL cannot contain colon

ideally both adjust the error message + quote the URL:

parse " http://example.org": URL scheme cannot contain spaces

The cannot contain spaces in error message can be changed for cannot contain whitespace or even contains invalid characters, depending how the check is implemented.

Current implementation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L540-L562

NeedsFix

Source

stefanb

Most helpful comment

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

agnivade on 21 Dec 2018

❤3

All 15 comments

@bradfitz @dsnet Wouldnt a TrimSpace before getscheme on this line solve the issue? I can raise a PR for this if thats agreed upon as the fix. Would also write tests of course.

cgopalan on 15 Dec 2018

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

agnivade on 21 Dec 2018

❤3

That is an incorrect error. It should be alpha character, not alpha-numeric.

On Dec 20, 2018, at 11:27 PM, Agniva De Sarker notifications@github.com wrote:

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

require("url").parse(' https://example.org')
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: 'example.org',
port: null,
hostname: 'example.org',
hash: null,
search: null,
query: null,
pathname: '/',
path: '/',
href: 'https://example.org/' }
Curl does not -

curl ' https://example.org'
curl: (1) Protocol " https" not supported or disabled in libcurl
Same with wget -

wget ' https://example.org'
https://example.org: Scheme missing.
Python also fails to parse-

from urllib.parse import urlparse

o = urlparse(' https://example.org')
o
ParseResult(scheme='', netloc='', path=' https://example.org', params='', query='', fragment='')
Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

robaho on 21 Dec 2018

👍1

That's right, sorry. Then we can simplify it to ".. does not begin with a letter".

agnivade on 21 Dec 2018

simplify it to ".. does not begin with a letter".

Perhaps ".. scheme does not begin with a letter" would be more accurate as "//example.org" is valid to my understanding.

Interestingly, i found

    u, err := url.Parse("    //example.org")
    if err != nil {
        panic(err)
    }
    fmt.Println(u)

is NOT failing, but outputs:

%20%20%20%20//example.org

https://play.golang.org/p/jRk63PWsKG5

stefanb on 21 Dec 2018

Yes, we should return an error on leading bogus chars.

bradfitz on 21 Dec 2018

ok I will make a PR for this code change with associated tests.

cgopalan on 21 Dec 2018

I have implemented the generic improvement of error messages by consistently quoting the URLs in #29384, but more detailed messages and stricter validation are welcome of course.

stefanb on 21 Dec 2018

@bradfitz what does "bogus chars" translate to besides a space?

cgopalan on 22 Dec 2018

Any non-letter, according to the spec.

agnivade on 22 Dec 2018

On further analysis, it seems its not straightforward to implement this because of 2 reasons:

the characters that are invalid in a scheme could be valid in other parts of the url (like the period and plus characters) so we will need to remember what we processed before. Currently getscheme does not do that, and am wondering if adding that functionality would increase complexity.
the url package says it should be backward compatible.

To account for the space-specific error, we could add a separate case branch and raise the error message like so:

case c == ' ':   // Add other invalid prefix chars here
    if i == 0 {
        return "", "", errors.New("URL scheme has invalid characters")
    }

However this seems like its a bandaid fix to me, but I am not sure, so would appreciate some input. Any thoughts on whether this is good, or if there's any other approach?

@stefanb @bradfitz @agnivade ?

cgopalan on 23 Dec 2018

I assume the proposed change would go just before default line 450 in the switch block here:
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L435-L457

This would make it inconsistent with other validation error cases in the same switch block:

1) the first character validation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L441-L444
2) the invalid character validation
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L450-L453

They both just returns scheme as "" and no error whatsoever. It should be consistent in a way so that all reasons for invalid scheme result either in error or "", but not mixed.

stefanb on 28 Dec 2018

It's hard to discuss code changes here. Maybe just send a CL and we can continue it over there.

We probably need to simplify the switch-case a bit. Have an if condition first, which checks for i==0 and non-letter, and return immediately. And integrate the rest of the switch-case accordingly.

We can keep the special c == ':' and i == 0 check, which will preserve the "missing protocol scheme" error, for backward compatibility reasons. But I will let @bradfitz decide that.

agnivade on 28 Dec 2018

Change https://golang.org/cl/155922 mentions this issue: net/url: give a proper error message on invalid character in scheme

gopherbot on 26 Feb 2019

Change https://golang.org/cl/185117 mentions this issue: net/url: improve url parsing error messages by quoting

gopherbot on 4 Jul 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

encoding/csv: Incorrectly parse records with \r as the record separator

ajstarks · 3Comments

x/tools/cmd/godoc: not work with symbolic links

gopherbot · 3Comments

x/tools/cmd/gorename: gorename fails with on cgo-using files

ashb · 3Comments

proposal: cmd/vet: vet should warn when time.Time type (or types embed it) is used as map keys.

go101 · 3Comments

all: T.FailNow used in goroutines in standard library tests

dominikh · 3Comments

Go: net/url: misleading error message when url has a leading space

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 15 comments

Related issues

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?