When the browser opens URL like http://somedomain.com/somepath/somealias?queryparam={curlybraces} on my Rocket based webserver, I get 400 Bad Request and panicked at 'Error: expected EOF but found '{' at index 12' in it's log.
While expecting behavior is to trait {curlybraces} as a value for queryparam.
It's obviously an issue related to URI parser. But wait! An implementation of URI parsing is quiet correct and done right accordingly to RFC!
Let's look at RFC 7230, Section 2.7 and RFC 3986, Section 3.4 which describes the Query part of URI.
According to RFCs above, query string should contain either pchar or "/" or "?".
pchar defined as follows:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
RFCs have no place for '{' and '}' symbols!
The common behavior is to trait them as correct ones, as implemented in modern browsers and webservers. (Try yourself pointing your browser to https://github.com/SergioBenitez/Rocket/?name={name} - you will get 404 error, not Bad Request! This means that the parser of URL on the Github's side eats {} well.)
I have a dirty patch to this issue here: https://github.com/vladignatyev/Rocket/commit/b4660b71f90ec7580d2046fcdf18c11be7169430
(simply added curly braces '{' and '}' into char table, so parser does not fail).
The test in the provided commit is not correct, but it showcases an issue. Before patching it will show the annoying expected EOF but found '{' at index... message. After patching it will fail with assert (left != right).
I need your thoughts and ideas! Thanks.
A few pieces of information to help narrow things down:
Is the browser actually sending a { or a %7B? As far as I can tell Firefox actually sends %7B when a { is typed in the address bar.
Is anything 'between' the browser and the Rocket server, such as Apache or nginx? I've hastily tried to reproduce this and found that direct requests are fine, but requests to a server behind nginx fail.
EDIT: I came across https://serverfault.com/questions/459369/disabling-url-decoding-in-nginx-proxy and removing the trailing / in my proxy_pass directive "fixed" the problem in my reproduction case.
@jebrosen at first, in my set up 'nothing' in between, just browser and rocket server.
I've created an isolated example that shows an issue: https://github.com/vladignatyev/rocket-issue-924
Just clone and cargo run. Then point your browser or other client to http://localhost:8000/somealias?{}
It will fail with 400 error, while expected to respond with Status::Ok 200.
As I mentioned above, I have a fix, but my understanding of URI parser may be incomplete, so the fix probably is incorrect (but work for this explicit case).
I tried Firefox 61.0.1, Chrome 71.0.3578.98 on Mac OS X 10.11.6. They do not switch brace into percent encoding.
wget changes a brace into percent encoding %7B.
While curl fails on parsing an URI (curl: (3) [globbing] empty string within braces in column 34).
Ah. My reproduction case was examining curly braces in the path, not in the query. It looks like Firefox and Chromium only encode braces in the path portion, and wget encodes them anywhere.
Curl's documentation at https://ec.haxx.se/cmdline-globbing.html says this, emphasis mine:
The globbing uses the reserved symbols [] and {} for this, _symbols that normally cannot be part of a legal URL_
I feel like Rocket (sadly) may have to accept {} in query strings to be compatible with browsers, even if it's not correct.
Aside from this specific case, I'm starting to think that the "raw" URI should at least be accessible in some form even when it's not normalized or encoded properly given how widely implementations differ.
Why you consider that "{}" in query strings is not correct?.. curl's documentation is not an ultimate truth.
@jebrosen according to spec, we cannot tell explicitly if such symbols are part of legal URI or not.
Furthemore, the RFC doesn't limit us hardly to a set of characters. According to spec, only characters that may represent delimiters of URI parts should be percent-encoded. So, if the octet is not a reserved character (as of https://tools.ietf.org/html/rfc3986#section-2.2) it should be used as is, or it _may_ (but not _must_) appear percent encoded.
Anyway as a user of multiple web frameworks in the past I expect that Rocket will trait {} or [] or even 馃構 as a normal characters of the _Query part_ of an URL.
Access to "Raw" URI is a backdoor antipattern and may motivate users of Rocket library to write error prone code, or even open their own software, built on top of Rocket, to a number of stupid *-stuffing vulnerabilities.
@jebrosen May I prepare and share some samples of URI here? I'm going to create a comprehensive test suite before proposing fixes to URL parser.
I believe it could help to resolve this and similar issues (like #842) without breaking the core concepts of Rocket.
according to spec, we cannot tell explicitly if such symbols are part of legal URI or not.
The relevant definitions from RFC 3986:
query = *( pchar / "/" / "?" ) pchar = unreserved / pct-encoded / sub-delims / ":" / "@" unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
I interpret these as specifying that {} are not allowed characters in a query.
According to spec, only characters that may represent delimiters of URI parts should be percent-encoded. So, if the octet is not a reserved character (as of https://tools.ietf.org/html/rfc3986#section-2.2) it should be used as is, or it _may_ (but not _must_) appear percent encoded.
Section 2.1 additionally states:
A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component.
Combined with the above, my interpretation is that { and } are "outside the allowed set" in query context and should be percent encoded.
An issue in Firefox from 4 years ago explored this and found that most browsers don't percent-encode { in the query. The decision was to leave them un-encoded because "applications might have depended on that fact, therefore we better not change it."
I think it will be necessary for Rocket to accept characters based on what user agents do and don't encode regardless of any spec, but this should be well documented in the code. If you have examples of potentially problematic URIs in mind, that would be a great starting point.
I believe it could help to resolve this and similar issues (like #842) without breaking the core concepts of Rocket.
Uri type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.I'm in the process of making a list of characters remaining unencoded by Firefox/Chrome browsers.
I plan to fix the issue in the following way.
1) Prepare a character set - the extended version of pchar in the original parser / spec
2) Introduce an additional function param to path_and_query in parser, like is_good_char but is_good_query_char and use it here: https://github.com/SergioBenitez/Rocket/blob/56c6a96f6a94ab844a0f571c303d741c7ec668d3/core/http/src/parse/uri/parser.rs#L49
Will do test first.
I spent a day trying to google out similar tests in Chromium or Mozilla repos, but have no success. Do they exist? I sit behind sniffer and logging proxy to find out what the hell browsers send :)
I hope will show my solution at the end of week. Current progress: https://github.com/vladignatyev/Rocket/commit/d0fa06dc0fddc868262c1cdcb055822362cba8ec
I spent a day trying to google out similar tests in Chromium or Mozilla repos, but have no success. Do they exist? I sit behind sniffer and logging proxy to find out what the hell browsers send :)
My go-tos for this would be Rocket or another server -- it should be easy to patch rocket to print incoming URIs before attempting to parse them -- or something lower level like Wireshark monitoring the loopback interface.
There is a semi-related issue in #880/#882. I don't think work conflicts at all, but Rocket should come up with some consistent rules on parsing, normalization, and matching of URIs and this falls under that as well.
It seems that I have a good solution, but have some problems with setting up CI. A couple of tests are failing, no matter if I integrated my solution or not.
Will update as soon as I set CI up.
Can anyone explain what is purpose of the following tests?
failures:
[ui] ui-fail/catch.rs
[ui] ui-fail/from_form.rs
[ui] ui-fail/from_form_value.rs
[ui] ui-fail/uri_display.rs
CI shouldn't be necessary right away -- scripts/test.sh will run the tests for everything.
The ui tests are similar to those found in rustc itself: each ui test describes the expected console output from rustc when a compile error is emitted. Changing parsing rules probably made tests pass or fail differently, so those tests will need to be updated as well. IIRC there is a message that explains how to fix them; if there isn't I can track the commands down and get back to you.
@jebrosen just created pull request https://github.com/SergioBenitez/Rocket/pull/941
Can you assist somehow with tests and check my commits or suggest how to improve them?
It seems that we should have two parsers: one for URIs definitions for library users, and another for URLs handling code.
I probably won't have time for a real review or testing until the weekend, but I'll leave a few quick comments on it. There are a few things I want to take a close look at and document clearly and succinctly somewhere, even just as a comment on a GitHub issue or PR:
{ character typed into it when they use their favorite browser.{ in firefox, where deviation from a standard has been intentional or documentedLet's work on this in #998.
Most helpful comment
The relevant definitions from RFC 3986:
I interpret these as specifying that
{}are not allowed characters in a query.Section 2.1 additionally states:
Combined with the above, my interpretation is that
{and}are "outside the allowed set" in query context and should be percent encoded.An issue in Firefox from 4 years ago explored this and found that most browsers don't percent-encode
{in the query. The decision was to leave them un-encoded because "applications might have depended on that fact, therefore we better not change it."I think it will be necessary for Rocket to accept characters based on what user agents do and don't encode regardless of any spec, but this should be well documented in the code. If you have examples of potentially problematic URIs in mind, that would be a great starting point.
842 needs more than just character set changes - the
Uritype currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.