Rocket: Curly braces in URL causes URI parser to fail

Created on 15 Feb 2019 · 15Comments · Source: SergioBenitez/Rocket

Rocket 0.4.0.
Mac OS X 10.11 / Ubuntu Server 18

When the browser opens URL like http://somedomain.com/somepath/somealias?queryparam={curlybraces} on my Rocket based webserver, I get 400 Bad Request and panicked at 'Error: expected EOF but found '{' at index 12' in it's log.
While expecting behavior is to trait {curlybraces} as a value for queryparam.

It's obviously an issue related to URI parser. But wait! An implementation of URI parsing is quiet correct and done right accordingly to RFC!

Let's look at RFC 7230, Section 2.7 and RFC 3986, Section 3.4 which describes the Query part of URI.

According to RFCs above, query string should contain either pchar or "/" or "?".
pchar defined as follows:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

RFCs have no place for '{' and '}' symbols!

The common behavior is to trait them as correct ones, as implemented in modern browsers and webservers. (Try yourself pointing your browser to https://github.com/SergioBenitez/Rocket/?name={name} - you will get 404 error, not Bad Request! This means that the parser of URL on the Github's side eats {} well.)

I have a dirty patch to this issue here: https://github.com/vladignatyev/Rocket/commit/b4660b71f90ec7580d2046fcdf18c11be7169430
(simply added curly braces '{' and '}' into char table, so parser does not fail).

The test in the provided commit is not correct, but it showcases an issue. Before patching it will show the annoying expected EOF but found '{' at index... message. After patching it will fail with assert (left != right).

I need your thoughts and ideas! Thanks.

deficiency

Source

vladignatyev

👍2

Most helpful comment

according to spec, we cannot tell explicitly if such symbols are part of legal URI or not.

The relevant definitions from RFC 3986:

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

I interpret these as specifying that {} are not allowed characters in a query.

According to spec, only characters that may represent delimiters of URI parts should be percent-encoded. So, if the octet is not a reserved character (as of https://tools.ietf.org/html/rfc3986#section-2.2) it should be used as is, or it _may_ (but not _must_) appear percent encoded.

Section 2.1 additionally states:

A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component.

Combined with the above, my interpretation is that { and } are "outside the allowed set" in query context and should be percent encoded.

An issue in Firefox from 4 years ago explored this and found that most browsers don't percent-encode { in the query. The decision was to leave them un-encoded because "applications might have depended on that fact, therefore we better not change it."

I think it will be necessary for Rocket to accept characters based on what user agents do and don't encode regardless of any spec, but this should be well documented in the code. If you have examples of potentially problematic URIs in mind, that would be a great starting point.

I believe it could help to resolve this and similar issues (like #842) without breaking the core concepts of Rocket.

842 needs more than just character set changes - the `Uri` type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.

jebrosen on 17 Feb 2019

🎉1 👍1

All 15 comments

A few pieces of information to help narrow things down:

Is the browser actually sending a { or a %7B? As far as I can tell Firefox actually sends %7B when a { is typed in the address bar.

Is anything 'between' the browser and the Rocket server, such as Apache or nginx? I've hastily tried to reproduce this and found that direct requests are fine, but requests to a server behind nginx fail.

EDIT: I came across https://serverfault.com/questions/459369/disabling-url-decoding-in-nginx-proxy and removing the trailing / in my proxy_pass directive "fixed" the problem in my reproduction case.

jebrosen on 16 Feb 2019

@jebrosen at first, in my set up 'nothing' in between, just browser and rocket server.

I've created an isolated example that shows an issue: https://github.com/vladignatyev/rocket-issue-924

Just clone and cargo run. Then point your browser or other client to http://localhost:8000/somealias?{}

It will fail with 400 error, while expected to respond with Status::Ok 200.

vladignatyev on 16 Feb 2019

As I mentioned above, I have a fix, but my understanding of URI parser may be incomplete, so the fix probably is incorrect (but work for this explicit case).

I tried Firefox 61.0.1, Chrome 71.0.3578.98 on Mac OS X 10.11.6. They do not switch brace into percent encoding.

wget changes a brace into percent encoding %7B.
While curl fails on parsing an URI (curl: (3) [globbing] empty string within braces in column 34).

vladignatyev on 16 Feb 2019

Ah. My reproduction case was examining curly braces in the path, not in the query. It looks like Firefox and Chromium only encode braces in the path portion, and wget encodes them anywhere.

Curl's documentation at https://ec.haxx.se/cmdline-globbing.html says this, emphasis mine:

The globbing uses the reserved symbols [] and {} for this, _symbols that normally cannot be part of a legal URL_

I feel like Rocket (sadly) may have to accept {} in query strings to be compatible with browsers, even if it's not correct.

Aside from this specific case, I'm starting to think that the "raw" URI should at least be accessible in some form even when it's not normalized or encoded properly given how widely implementations differ.

jebrosen on 16 Feb 2019

Why you consider that "{}" in query strings is not correct?.. curl's documentation is not an ultimate truth.

@jebrosen according to spec, we cannot tell explicitly if such symbols are part of legal URI or not.
Furthemore, the RFC doesn't limit us hardly to a set of characters. According to spec, only characters that may represent delimiters of URI parts should be percent-encoded. So, if the octet is not a reserved character (as of https://tools.ietf.org/html/rfc3986#section-2.2) it should be used as is, or it _may_ (but not _must_) appear percent encoded.

Anyway as a user of multiple web frameworks in the past I expect that Rocket will trait {} or [] or even 😋 as a normal characters of the _Query part_ of an URL.

Access to "Raw" URI is a backdoor antipattern and may motivate users of Rocket library to write error prone code, or even open their own software, built on top of Rocket, to a number of stupid *-stuffing vulnerabilities.

@jebrosen May I prepare and share some samples of URI here? I'm going to create a comprehensive test suite before proposing fixes to URL parser.

I believe it could help to resolve this and similar issues (like #842) without breaking the core concepts of Rocket.

vladignatyev on 17 Feb 2019

according to spec, we cannot tell explicitly if such symbols are part of legal URI or not.

The relevant definitions from RFC 3986:

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

I interpret these as specifying that {} are not allowed characters in a query.

According to spec, only characters that may represent delimiters of URI parts should be percent-encoded. So, if the octet is not a reserved character (as of https://tools.ietf.org/html/rfc3986#section-2.2) it should be used as is, or it _may_ (but not _must_) appear percent encoded.

Section 2.1 additionally states:

A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component.

Combined with the above, my interpretation is that { and } are "outside the allowed set" in query context and should be percent encoded.

I believe it could help to resolve this and similar issues (like #842) without breaking the core concepts of Rocket.

842 needs more than just character set changes - the `Uri` type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.

jebrosen on 17 Feb 2019

🎉1 👍1

I'm in the process of making a list of characters remaining unencoded by Firefox/Chrome browsers.
I plan to fix the issue in the following way.
1) Prepare a character set - the extended version of pchar in the original parser / spec
2) Introduce an additional function param to path_and_query in parser, like is_good_char but is_good_query_char and use it here: https://github.com/SergioBenitez/Rocket/blob/56c6a96f6a94ab844a0f571c303d741c7ec668d3/core/http/src/parse/uri/parser.rs#L49
Will do test first.

I spent a day trying to google out similar tests in Chromium or Mozilla repos, but have no success. Do they exist? I sit behind sniffer and logging proxy to find out what the hell browsers send :)

I hope will show my solution at the end of week. Current progress: https://github.com/vladignatyev/Rocket/commit/d0fa06dc0fddc868262c1cdcb055822362cba8ec

vladignatyev on 19 Feb 2019

I spent a day trying to google out similar tests in Chromium or Mozilla repos, but have no success. Do they exist? I sit behind sniffer and logging proxy to find out what the hell browsers send :)

My go-tos for this would be Rocket or another server -- it should be easy to patch rocket to print incoming URIs before attempting to parse them -- or something lower level like Wireshark monitoring the loopback interface.

There is a semi-related issue in #880/#882. I don't think work conflicts at all, but Rocket should come up with some consistent rules on parsing, normalization, and matching of URIs and this falls under that as well.

jebrosen on 23 Feb 2019

It seems that I have a good solution, but have some problems with setting up CI. A couple of tests are failing, no matter if I integrated my solution or not.
Will update as soon as I set CI up.

vladignatyev on 28 Feb 2019

Can anyone explain what is purpose of the following tests?

failures:
    [ui] ui-fail/catch.rs
    [ui] ui-fail/from_form.rs
    [ui] ui-fail/from_form_value.rs
    [ui] ui-fail/uri_display.rs

vladignatyev on 28 Feb 2019

CI shouldn't be necessary right away -- scripts/test.sh will run the tests for everything.

The ui tests are similar to those found in rustc itself: each ui test describes the expected console output from rustc when a compile error is emitted. Changing parsing rules probably made tests pass or fail differently, so those tests will need to be updated as well. IIRC there is a message that explains how to fix them; if there isn't I can track the commands down and get back to you.

jebrosen on 28 Feb 2019

@jebrosen just created pull request https://github.com/SergioBenitez/Rocket/pull/941
Can you assist somehow with tests and check my commits or suggest how to improve them?

vladignatyev on 6 Mar 2019

It seems that we should have two parsers: one for URIs definitions for library users, and another for URLs handling code.

vladignatyev on 6 Mar 2019

I probably won't have time for a real review or testing until the weekend, but I'll leave a few quick comments on it. There are a few things I want to take a close look at and document clearly and succinctly somewhere, even just as a comment on a GitHub issue or PR:

Terminology of URI vs URL, and related terms like origin, authority, etc. I believe that at least some working groups or standards bodies have suggested to only use either URI or URL now.
Which standards are currently intended to be followed by browsers and servers (according to standards bodies, what browsers have pledged to follow, etc.)
Exactly which characters are and aren't in which character categories, which standards do or don't forbid/allow characters, and what browsers actually encode.
- What browsers actually encode is arguably the most important, because no user will understand why a search form can't have a { character typed into it when they use their favorite browser.
- Track down other cases like { in firefox, where deviation from a standard has been intentional or documented
When and why it might matter if URLs are percent-decoded or normalized by Rocket "permanently" versus being only compared in a percent-encoding-agnostic way.

jebrosen on 7 Mar 2019

👍1

Let's work on this in #998.

SergioBenitez on 17 May 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Expose internal template system for improved usability

marcusball · 3Comments

Is there any way to interact with javascript code?

lucklove · 4Comments

Cannot use Tera's template inheritance

paulvt · 4Comments

Update published ring dependency to 0.12

Perseus101 · 4Comments

Configuration: How about having a fallback default configuration instead of global overrides?

Qqwy · 3Comments

Rocket: Curly braces in URL causes URI parser to fail

Most helpful comment

842 needs more than just character set changes - the Uri type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.

All 15 comments

842 needs more than just character set changes - the Uri type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.

Related issues

842 needs more than just character set changes - the `Uri` type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.

842 needs more than just character set changes - the `Uri` type currently doesn't have any capacity to hold a fragment, which should only appear in URIs being prepared to send to a client.