go version)?$ go version go version go1.11.3 darwin/amd64
Yes
go env)?go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stefanb/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stefanb/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"
package main
import (
"net/http"
)
func main() {
_, err := http.Get(" http://example.org")
if err != nil {
panic(err)
}
}
https://play.golang.org/p/jkcYSD6ZRcO
Either
A misleading detailed error message:
parse http://example.org: first path segment in URL cannot contain colon
In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.
I propose to either:
parse http://example.org: URL scheme cannot contain spaces
or
parse " http://example.org": first path segment in URL cannot contain colon
or
parse " http://example.org": URL scheme cannot contain spaces
The cannot contain spaces in error message can be changed for cannot contain whitespace or even contains invalid characters, depending how the check is implemented.
Current implementation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L540-L562
@bradfitz @dsnet Wouldnt a TrimSpace before getscheme on this line solve the issue? I can raise a PR for this if thats agreed upon as the fix. Would also write tests of course.
Technically, according to RFC3986-Sec3.1 -
Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").
So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.
NodeJS accepts it-
> require("url").parse(' https://example.org')
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: 'example.org',
port: null,
hostname: 'example.org',
hash: null,
search: null,
query: null,
pathname: '/',
path: '/',
href: 'https://example.org/' }
Curl does not -
curl ' https://example.org'
curl: (1) Protocol " https" not supported or disabled in libcurl
Same with wget -
wget ' https://example.org'
https://example.org: Scheme missing.
Python also fails to parse-
from urllib.parse import urlparse
>>> o = urlparse(' https://example.org')
>>> o
ParseResult(scheme='', netloc='', path=' https://example.org', params='', query='', fragment='')
Note that scheme and netloc are empty.
In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -
parse " http://example.org": URL scheme does not begin with an alpha-numeric character.
That is an incorrect error. It should be alpha character, not alpha-numeric.
On Dec 20, 2018, at 11:27 PM, Agniva De Sarker notifications@github.com wrote:
Technically, according to RFC3986-Sec3.1 -
Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.
NodeJS accepts it-
require("url").parse(' https://example.org')
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: 'example.org',
port: null,
hostname: 'example.org',
hash: null,
search: null,
query: null,
pathname: '/',
path: '/',
href: 'https://example.org/' }
Curl does not -curl ' https://example.org'
curl: (1) Protocol " https" not supported or disabled in libcurl
Same with wget -wget ' https://example.org'
https://example.org: Scheme missing.
Python also fails to parse-from urllib.parse import urlparse
o = urlparse(' https://example.org')
o
ParseResult(scheme='', netloc='', path=' https://example.org', params='', query='', fragment='')
Note that scheme and netloc are empty.In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -
parse " http://example.org": URL scheme does not begin with an alpha-numeric character.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
That's right, sorry. Then we can simplify it to ".. does not begin with a letter".
simplify it to ".. does not begin with a letter".
Perhaps ".. scheme does not begin with a letter" would be more accurate as "//example.org" is valid to my understanding.
Interestingly, i found
u, err := url.Parse(" //example.org")
if err != nil {
panic(err)
}
fmt.Println(u)
is NOT failing, but outputs:
%20%20%20%20//example.org
Yes, we should return an error on leading bogus chars.
ok I will make a PR for this code change with associated tests.
I have implemented the generic improvement of error messages by consistently quoting the URLs in #29384, but more detailed messages and stricter validation are welcome of course.
@bradfitz what does "bogus chars" translate to besides a space?
Any non-letter, according to the spec.
On further analysis, it seems its not straightforward to implement this because of 2 reasons:
getscheme does not do that, and am wondering if adding that functionality would increase complexity.To account for the space-specific error, we could add a separate case branch and raise the error message like so:
case c == ' ': // Add other invalid prefix chars here
if i == 0 {
return "", "", errors.New("URL scheme has invalid characters")
}
However this seems like its a bandaid fix to me, but I am not sure, so would appreciate some input. Any thoughts on whether this is good, or if there's any other approach?
@stefanb @bradfitz @agnivade ?
I assume the proposed change would go just before default line 450 in the switch block here:
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L435-L457
This would make it inconsistent with other validation error cases in the same switch block:
1) the first character validation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L441-L444
2) the invalid character validation
https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L450-L453
They both just returns scheme as "" and no error whatsoever. It should be consistent in a way so that all reasons for invalid scheme result either in error or "", but not mixed.
It's hard to discuss code changes here. Maybe just send a CL and we can continue it over there.
We probably need to simplify the switch-case a bit. Have an if condition first, which checks for i==0 and non-letter, and return immediately. And integrate the rest of the switch-case accordingly.
We can keep the special c == ':' and i == 0 check, which will preserve the "missing protocol scheme" error, for backward compatibility reasons. But I will let @bradfitz decide that.
Change https://golang.org/cl/155922 mentions this issue: net/url: give a proper error message on invalid character in scheme
Change https://golang.org/cl/185117 mentions this issue: net/url: improve url parsing error messages by quoting
Most helpful comment
Technically, according to RFC3986-Sec3.1 -
So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.
NodeJS accepts it-
Curl does not -
Same with wget -
Python also fails to parse-
Note that
schemeandnetlocare empty.In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -
parse " http://example.org": URL scheme does not begin with an alpha-numeric character.