Extra slashes in the URL change the routing. For example, /x and //x do not route the same way.
Here's a unit test which demonstrates this:
https://gist.github.com/robnagler/1a5d0361cc71a6806fc6
If you put a proxy in front of flask, it will normalize the URLs so this isn't an issue. It would seem, however, that Flask should normalize the URLs as well.
Possible duplicate of mitsuhiko/flask#900.
I don't think that this is likely to be a limitation of WSGI.
/x and /x/ are also treated differently, because they are different URLs. Before "fixing" this we should check whether this is actually an issue.
I don't think this is a duplicate of #900, which is about embedded slashes, e.g. http://www.bivio.biz/bp/Intro%2f returns not found, because "Intro/" is an unknown page in the system.
This issue is about extra, contiguous slashes. Application frameworks can (and should, imo) ignore extra, unencoded slashes. For example, http://slashdot.org///////////////stories is the same as http://slashdot.org/stories.
Does any RFC suggest/require this kind of normalization? If not, my opinion on this is _garbage in, garbage out_.
Ignore my previous comment, we actually do mean the same thing.
These kinds of URLs mostly occur when building links by simply concatenating segments, so I would agree that it's a good idea to handle such broken links. However, I don't understand why websites (or those that I've tried) simply serve the same content instead of redirecting to the correct page. This seems like much saner behavior.
Serving the same content might be a significant problem, if Google or other search engines consider such urls to be different.
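A minimal sketch of the redirect behavior suggested above, written as plain WSGI middleware so it does not depend on any Flask internals. The class name RedirectExtraSlashes is hypothetical, and the choice of a 301 status is an assumption rather than anything Flask or Werkzeug prescribes. It only illustrates the idea of sending the client to the canonical URL instead of serving the same content under several URLs:

```python
import re
from urllib.parse import quote

_MULTI_SLASH = re.compile(r"/{2,}")


class RedirectExtraSlashes:
    """Hypothetical middleware: redirect ///stories to /stories."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        canonical = _MULTI_SLASH.sub("/", path)
        if canonical == path:
            # Nothing to normalize, hand the request to the wrapped app.
            return self.app(environ, start_response)

        # PATH_INFO is already percent-decoded, so this sketch cannot tell
        # an encoded %2F apart from a literal slash.
        location = quote(environ.get("SCRIPT_NAME", "") + canonical)
        query = environ.get("QUERY_STRING")
        if query:
            location += "?" + query
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]
```

With Flask this could be attached as app.wsgi_app = RedirectExtraSlashes(app.wsgi_app). A redirect keeps a single canonical URL visible to search engines, at the cost of one extra round trip for clients that send the malformed link.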
On most operating systems, extra slashes resolve to one. URLs came from file system notation so it would seem that the two should respond in the same way.
Apache removes duplicate slashes for a good reason.
Nginx has the merge_slashes directive, which is on by default.
RFC 3986 seems to imply you can't have multiple, contiguous slashes:
    path          = path-abempty    ; begins with "/" or is empty
                  / path-absolute   ; begins with "/" but not "//"
                  / path-noscheme   ; begins with a non-colon segment
                  / path-rootless   ; begins with a segment
                  / path-empty      ; zero characters

    path-abempty  = *( "/" segment )
    path-absolute = "/" [ segment-nz *( "/" segment ) ]
    path-noscheme = segment-nz-nc *( "/" segment )
    path-rootless = segment-nz *( "/" segment )
    path-empty    = 0<pchar>

    segment       = *pchar
    segment-nz    = 1*pchar
    segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                  ; non-zero-length segment without any colon ":"

    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
Note that path-* require segment-nz. segment-nz requires at least one pchar, which cannot be a slash.
Apache removes duplicate slashes for a good reason.
Both sides make good arguments in that thread. Note that I'm not opposed to handling multiple slashes as one, but I'm not sure
If people are using Apache or Nginx (and I suspect others) as a proxy, slashes are getting normalized today. This only showed up when I was testing with the built-in HTTP server, which is arguably not the way people will deploy Flask in production.
You could have a backward compatibility flag, but leave it off by default. If people are expecting multiple slashes to be significant, they likely (hopefully?) have a test for this uncommon behavior. You could raise an exception if routes are defined with multiple slashes. Consider this route:
@app.route('/<first>/<rest>')
What's supposed to happen here if <first> is empty? Flask does not route /<first>/<rest> with //x.
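To make the empty-segment question concrete, here is a rough stand-in for that route using only the re module. It assumes each placeholder behaves like a "one or more non-slash characters" pattern. The RULE regex and match_route helper are hypothetical names for illustration, not Flask's actual matcher:

```python
import re

# Approximation of '/<first>/<rest>': every placeholder needs at least
# one character that is not a slash (an assumption for this sketch).
RULE = re.compile(r"^/(?P<first>[^/]+)/(?P<rest>[^/]+)$")


def match_route(path):
    """Return the captured values, or None if the path does not match."""
    m = RULE.match(path)
    return m.groupdict() if m else None


print(match_route("/a/x"))  # {'first': 'a', 'rest': 'x'}
print(match_route("//x"))   # None: <first> would have to be empty
print(match_route("/x"))    # None: merging the slashes leaves one segment
```

Under this assumption, //x fails to match whether or not the slashes are merged first, so merging would not silently turn an empty <first> into a valid call of this route.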
I do not think implicit redirects are a good idea. I do think there could be an option here, which could be on by default.
URLs like '/x/../index.html' do not redirect in Apache or Nginx. (BTW, Flask doesn't handle this correctly, and this is clearly defined behavior in RFC 3986. In this case, most browsers normalize these URLs before sending, but curl, for example, doesn't, and therefore Flask or the underlying Python HTTP server must.)
Why does Flask handle this less correctly than other implementations?
I'm assuming you mean handling of '..'. Here's a gist that demonstrates '/..' doesn't route to '/', which I believe it should[*]:
https://gist.github.com/robnagler/5bc9399f0e761e75fdd9
The purpose of the frameworks, I believe, is to eliminate these types of issues. From a security point of view, it is better if '..' handling is done by the framework. From a practical point of view, it's extra unnecessary work for the application programmer.
[*] From RFC 3986:
The path segments "." and "..", also known as dot-segments, are defined for relative reference within the path name hierarchy. They are intended for use at the beginning of a relative-path reference (Section 4.2) to indicate relative position within the hierarchical tree of names. This is similar to their role within some operating systems' file directory structures to indicate the current directory and parent directory, respectively. However, unlike in a file system, these dot-segments are only interpreted within the URI path hierarchy and are removed as part of the resolution process (Section 5.2).
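For reference, the removal step that the quoted passage points to is spelled out in RFC 3986 Section 5.2.4. The following is a sketch of that algorithm in plain Python. The function name remove_dot_segments is taken from the RFC's heading, and nothing here is meant to describe what Flask or Werkzeug currently do:

```python
def remove_dot_segments(path):
    """Sketch of RFC 3986 Section 5.2.4 (steps A to E in the RFC text)."""
    output = []
    while path:
        if path.startswith("../"):        # A: drop a leading "../"
            path = path[3:]
        elif path.startswith("./"):       # A: drop a leading "./"
            path = path[2:]
        elif path.startswith("/./"):      # B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":                # B: trailing "/." becomes "/"
            path = "/"
        elif path.startswith("/../"):     # C: go up one segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":               # C: trailing "/.." becomes "/"
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # D: a lone dot-segment vanishes
            path = ""
        else:                             # E: move one segment to output
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


print(remove_dot_segments("/x/../index.html"))  # /index.html
print(remove_dot_segments("/.."))               # /
```

Note that this routine leaves empty segments alone, so //x stays //x. Dot-segment removal and slash merging are separate normalizations.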
This is a Werkzeug issue, reported at pallets/werkzeug#1132.