Node: Unreserved characters are escaped when making a HTTP request

Created on 24 Apr 2020  ·  21Comments  ·  Source: nodejs/node

  • Version: >=10.20.1
  • Platform: all
  • Subsystem: http

What steps will reproduce the bug?

const http = require('http');

const url = new URL('http://httpbin.org/anything?a=~');
url.search = url.searchParams.toString(); // trigger normalization

const request = http.get(url, response => {
    const chunks = [];

    response.on('data', chunk => {
        chunks.push(chunk);
    });

    response.once('end', () => {
        console.log(request._header);
        // Actual:
        // GET /anything?a=%7E
        //
        // Expected:
        // GET /anything?a=~
    });
})

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

GET /anything?a=~

What do you see instead?

GET /anything?a=%7E

Additional information

According to tools.ietf.org/html/rfc3986#section-2.3 tilde is unreserved and should not encoded by URI producers.

https://github.com/sindresorhus/got/issues/1180#issuecomment-618831326

known limitation url whatwg-url

Most helpful comment

Quick update on https://github.com/whatwg/url/issues/478 ... I have a general proposed fix on the table (to require that URLQueryParam stringification follow URL rules when .searchParams is used to mutate the URL). The WHATWG folks are working that through their process to determine how other implementers may feel about the change. If it looks like the change will have a good chance of making it through, I will open a PR there with a draft spec change. Once that is accepted, we'll need a PR here to fix and close this issue.

Specifically, the proposed change is internal only, and would only impact using the .searchParams object associated with the URL to mutate the URL.. that is...

This would follow URL encoding rules:

u.searchParams.sort()

But this would not, however...

u.search = u.searchParams.toString()

Assuming the change proposal does not go through at the WHATWG, then this will need to be closed without changing behavior as our priority is on following the spec and we do not want to unilaterally diverge from that. It would, however, be good to add documentation to url.md.

@bnoordhuis

...Right now this issue is completely inactionable

Except that it is not. See above.

All 21 comments

The url.searchParams.toString() call is what changes ~ to %7E. I'm not 100% sure but it looks according to spec to me. Firefox behaves the same way, FWIW.

The spec says the application/x-www-form-urlencoded serializer should be applied:

The stringification behavior must return the serialization of the URLSearchParams object’s list.

@Himself65 I see you tagged this with confirmed-bug. Can you elaborate?

oh, sry for I didn't look up the spec🤦‍.

and just a little confused me because
image

Isn't that bug in URLSearchParams? Because in spec https://tools.ietf.org/html/rfc3986#section-2.3 we see that ~ is unreserved character and

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

Also in https://url.spec.whatwg.org/ we see that percent encoding must be used only for code points greater than U+007E (~) but not equal this symbol

@Hamper https://url.spec.whatwg.org/#concept-urlencoded-byte-serializer is the mapping for individual bytes. ~ must be percent-encoded according to that.

The URL parser uses the WHATWG URL Standard as the normative reference and not rfc3986 and the two definitely differ with regards to some of the escaping rules.

That said, I do think this is a bug... not necessarily in our impl but in the URL Standard spec itself. Definitely warrants more investigation.

Specifically, look at section https://url.spec.whatwg.org/#url-parsing and search for "query state" ...

image

@bnoordhuis ... the section you reference deals specifically with application/x-www-form-urlencoded serialization, which is different from both URL and URLSearchParams. While the URL Standard does provide a specification for application/x-www-form-urlencoded parsing and serialization, it does not require or even recommend it's use for URL parsing and handling of URLSearchParams. They are separate things. The issue opened here is in regards to URL parsing and not application/x-www-form-urlencoded

@jasnell I think you overlooked the line that says url.search = url.searchParams.toString() - it's the .toString() that encodes to application/x-www-form-urlencoded and that's per spec.

Ah, yep, there it is. As I said, I don't think this is a bug in our implementation but I do think it's a bug in the spec itself because given the rules defined elsewhere in the doc ~ is handled differently. Let's keep this issue open and get clarification from the URL Standard folks on what the correct behavior should be.

/cc @annevk ... when you get a moment, could you help us resolve this issue? :-)

@bnoordhuis but if you do

const url = new URL('http://httpbin.org/anything?a=~');
url.searchParams.sort()
url.toString()

we can see same bug

Yep... there's definitely an inconsistency there...

C:\Users\jasne\Projects\tmp>node
Welcome to Node.js v14.0.0.
Type ".help" for more information.
> const u = new URL('http://example.com/?a=~')
undefined
> u.toString()
'http://example.com/?a=~'
> u.searchParams.sort()
undefined
> u.toString()
'http://example.com/?a=%7E'
>

See https://github.com/whatwg/url/issues/18. URLSearchParams is aligned with forms, .search with URLs. I personally think the discrepancy is okay, but I can see how it's confusing. More importantly, I'm not really sure how to resolve it without a lot of likely breaking changes.

Hmm... even with .search, however, mutations that occur via u.searchParams (such as sort) will cause .search to use the escaped version. I can definitely appreciate maintaining the difference between URLSearchParams and .search but I'm wondering if it would be at least possible to make it such that when URLSearchParams is used to mutate a URL, the URL rules are used rather than the form-encoding rules?

I guess that's something we hadn't considered yet. I'm not sure what the complexity of that would be offhand, but if someone was willing to make that a bit more detailed (probably in a fresh issue on whatwg/url) I'd be happy to help find out if there's interest in aligning on something like that.

Issue opened. Will keep this issue open until that is resolved. Thank you for the clarifications @annevk :-)

Does the WHATWG URL spec override the URL normalization in the HTTP spec?

@szmarczak ... that's a total grey area lol. All of the various HTTP specifications normatively reference the IETF URL specification in some way, and most HTTP server and middle box implementations follow suit. The WHATWG URL spec, however, is what browsers / useragents generally follow and the WHATWG spec has largely emerged as the de facto standard. There's always been a bit of contention on the areas where they differ.

Keep will this issue open until that is resolved.

I think this issue should be closed until the WHATWG issue reaches a conclusion, if indeed it ever does. Right now this issue is completely inactionable.

I think this issue should be closed

I strongly disagree. It's easy to forget the issue that way and we don't want that. A more appropriate thing to do would be rather to mark this issue as blocked.

There are almost a thousand open issues. No one is going to look at this one once it drops off the first page or two, label or no label.

Quick update on https://github.com/whatwg/url/issues/478 ... I have a general proposed fix on the table (to require that URLQueryParam stringification follow URL rules when .searchParams is used to mutate the URL). The WHATWG folks are working that through their process to determine how other implementers may feel about the change. If it looks like the change will have a good chance of making it through, I will open a PR there with a draft spec change. Once that is accepted, we'll need a PR here to fix and close this issue.

Specifically, the proposed change is internal only, and would only impact using the .searchParams object associated with the URL to mutate the URL.. that is...

This would follow URL encoding rules:

u.searchParams.sort()

But this would not, however...

u.search = u.searchParams.toString()

Assuming the change proposal does not go through at the WHATWG, then this will need to be closed without changing behavior as our priority is on following the spec and we do not want to unilaterally diverge from that. It would, however, be good to add documentation to url.md.

@bnoordhuis

...Right now this issue is completely inactionable

Except that it is not. See above.

Alrighty, well, it's looking like changes to the WHATWG spec aren't going to happen, which is fine. Added a PR that documents the discrepancy.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jmichae3 picture jmichae3  ·  3Comments

danielstaleiny picture danielstaleiny  ·  3Comments

filipesilvaa picture filipesilvaa  ·  3Comments

loretoparisi picture loretoparisi  ·  3Comments

cong88 picture cong88  ·  3Comments