I need to perform requests against an external API that has a custom encoding scheme in the query. More specifically, it requires that the caller put an unencoded URL at the very end of the query string.
While this is perfectly valid as a URL, got seems to assume that query strings are formatted as key1=value1&key2=value2 etc., and it first parses and then re-serializes the query with that assumption in mind. See:
https://github.com/sindresorhus/got/blob/1c54a03bc6a809b73970f1694d3fccd22b664997/source/normalize-arguments.js#L113
However, doesn't this mean that you no longer support custom query schemes, even though they produce perfectly valid URLs? Note that the URL spec does not require a particular query encoding scheme as long as you comply with the allowed character sets.
You can use the beforeRequest hook to modify the path option that will be sent in the request.
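A minimal sketch of that workaround, assuming the hook receives the normalized request options with a writable `path` property, as the suggestion implies. `keepRawPath` is a hypothetical helper name, not part of got, and the got call is shown only as a comment:

```javascript
// Hypothetical helper (not part of got): builds a hook that restores the
// path and query exactly as they appear in the caller's URL string.
function keepRawPath(rawUrl) {
  const {pathname, search} = new URL(rawUrl);
  return options => {
    // Assumption: `options.path` is what ends up on the request line.
    options.path = pathname + search;
  };
}

// Assumed usage, following the suggestion above:
// got(rawUrl, {hooks: {beforeRequest: [keepRawPath(rawUrl)]}});

// Stand-in for the options object after the query has been re-serialized.
const rawUrl = 'https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra';
const options = {path: '/my/path?query=abc&test=def&final=https%3A%2F%2Fwww.somethingelse.com%2F%3Fextra%3Dextra'};
keepRawPath(rawUrl)(options);
console.log(options.path);
```

This relies on `URL#search` returning the query as written for characters such as `:`, `/` and `?`, unlike `searchParams.toString()`.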
@pietermees Can you include an example of the URL you're trying to use? Doesn't have to be the exact same URL as long as it uses the same formatting.
@jstewmon thanks for the suggestion, we can definitely do that as a workaround, but it's still a bit strange that the library insists on rewriting a URL we're passing in as a string. Why can't our requested URL be fired as provided? I can't imagine why the library would need to assume the path has a standard format, or why it would need to inspect anything in it.
@sindresorhus sure thing! Here's an example:
https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra
@pietermees I agree. This is a bug. We should not touch the query in the url argument unless you specify the query option.
@sindresorhus, maybe there should be a normalizeUrl option for this. Changing the behavior could break lots of apps that expect got to ensure the request is sent with a valid URI.
But it’s not an invalid URL. new URL parses it fine.
Ok, I was thinking that reserved characters were required to be percent-encoded, but they are not when their meaning is not significant in the component in which they appear. So, in the case of the query string, :/? are not significant. However, if an & appeared in that uri-with-query parameter value, it would be interpreted as a query delimiter.
Got's handling of this case changed in b8a086f8da53d57fdf11ba49eefa01d778a4f69f - urlSearchParams.toString() appears to be encoding more characters than it should ([source](https://github.com/nodejs/node/blob/8b4af64f50c5e41ce0155716f294c24ccdecad03/lib/internal/url.js#L826)). :/ are not in the ranges specified for encoding by WHATWG's query spec.
So, the unexpected behavior seems to be the result of a bug in node.
Still, we can easily fix the issue in got by changing the normalization routine so that it does not round-trip the query through URLSearchParams when it is a string.
Edit: The portion of the WHATWG spec I referenced above was for parsing, not serializing.
The serialization spec requires a much larger character set to be encoded.
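To illustrate the proposed fix (not round-tripping a string query through URLSearchParams), here is a rough sketch. `buildUrl` is a hypothetical stand-in written for this comment, not got's actual normalization code, and the `query` parameter only mimics the shape of got's `query` option:

```javascript
// Hypothetical stand-in for a normalization step; not got's actual code.
function buildUrl(input, query) {
  const url = new URL(input);
  if (query === undefined) {
    // No query option: leave the caller's query string untouched.
    return url.toString();
  }
  // A string query is used as given; only objects go through URLSearchParams.
  url.search = typeof query === 'string' ? query : new URLSearchParams(query).toString();
  return url.toString();
}

const untouched = buildUrl('https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra');
const fromString = buildUrl('https://example.com/', 'a=123?456');
const fromObject = buildUrl('https://example.com/', {a: '123?456'});
console.log(untouched);
console.log(fromString);
console.log(fromObject);
```

With this shape, only a query passed as an object is form-urlencoded; a query embedded in the URL string or passed as a string is left as the caller wrote it.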
Indeed, components further down the URL are less restrictive in terms of allowed characters, so once you've passed the first ? for example, you may use it again without issue.
You're right that an & would cause the final URL to be split in the search params, which is not exactly the semantic meaning of the API:
new URL('https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz')
returns:
URL {
  href:
   'https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz',
  origin: 'https://www.someserver.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'www.someserver.com',
  hostname: 'www.someserver.com',
  port: '',
  pathname: '/my/path',
  search:
   '?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz',
  searchParams:
   URLSearchParams {
     'query' => 'abc',
     'test' => 'def',
     'final' => 'https://www.somethingelse.com/?extra=extra',
     'bla' => 'xyz' },
  hash: '' }
While
new URL('https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz').toString()
correctly serializes the URL back to
https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz
However
new URL('https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz').searchParams.toString()
results in
query=abc&test=def&final=https%3A%2F%2Fwww.somethingelse.com%2F%3Fextra%3Dextra&bla=xyz
So got's logic results in:
const url = new URL('https://www.someserver.com/my/path?query=abc&test=def&final=https://www.somethingelse.com/?extra=extra&bla=xyz')
url.search = url.searchParams.toString()
which will return the following when you do url.toString():
https://www.someserver.com/my/path?query=abc&test=def&final=https%3A%2F%2Fwww.somethingelse.com%2F%3Fextra%3Dextra&bla=xyz
Conclusion:
I'm not sure what the reason was for parsing the search string and then re-serializing it, but I don't think this is necessary.
I don't see a need for got to actually read the parsed search params. The faulty assumption here is that a URL has to be formatted this way; it does not.
I don't see why got should re-serialize the path string like it does now. This process causes URLs to change in unintended ways, and it's not clear what the benefit is of doing so.
I'll leave this open because the bug isn't fixed yet:
query: {
a: '123?456'
}
@jstewmon Could you open a Node issue about "urlSearchParams.toString() appears to be encoding more characters than it should"?
I erroneously referenced the parsing spec, not the serialization spec, in that comment. I've amended the comment to clarify. I don't see any problem with the serialization implementation.
The case you gave illustrates the difference between what must be encoded vs what may be encoded for the URI to be valid. I think the WHATWG spec encodes all chars that may be encoded.
Right. So these sentences:
urlSearchParams.toString() appears to be encoding more characters than it should
So, the unexpected behavior seems to be the result of a bug in node.
aren't true, because there's no bug in Node and there was a bug in Got. Please correct me if I'm wrong :)
Usually users want application/x-www-form-urlencoded. Anyway, as you've pointed out, a=123?456 is still a valid query (but it's not application/x-www-form-urlencoded serialized).
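A small sketch of that distinction, using only the built-in `URL` and `URLSearchParams` globals (the `example.com` URL is just a placeholder): the raw query and the form-urlencoded one look different on the wire but parse to the same value.

```javascript
// The raw query keeps the second "?" as written.
const raw = new URL('https://example.com/?a=123?456');

// Serializing through URLSearchParams applies form-urlencoding instead.
const formEncoded = new URLSearchParams({a: '123?456'}).toString();

// Both forms decode to the same parameter value.
const rawValue = raw.searchParams.get('a');
const formValue = new URLSearchParams(formEncoded).get('a');

console.log(raw.search);
console.log(formEncoded);
console.log(rawValue === formValue);
```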