Hello,
I caught a "URI malformed" error in the new "got" when trying to send a request to a URL containing URL-encoded characters.
URL examples:
https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/
https://www.kinopoisk.ru/news/keyword/%C7%E2%E5%E7%E4%ED%FB%E5+%E2%EE%E9%ED%FB/
nodejs: 9.2.0
got: 8.0.0
Failed code:
const got = require('got');

(async () => {
    try {
        const response = await got('https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/');
        console.log(response);
    } catch (error) {
        console.log(error);
    }
})();
URIError: URI malformed
    at decodeURI (<anonymous>)
    at module.exports (/Users/kirill-m/git/test/node_modules/normalize-url/index.js:87:21)
    at /Users/kirill-m/git/test/node_modules/cacheable-request/src/index.js:43:16
    at get (/Users/kirill-m/git/test/node_modules/got/index.js:98:20)
    at Promise.resolve.then.size (/Users/kirill-m/git/test/node_modules/got/index.js:274:5)
    at <anonymous>
Broken at this commit: https://github.com/sindresorhus/got/commit/3c7920507fae88a5f53d0640b5116fa34a5ed829
Any ideas?
Explanation of failure:
This website is windows-1252 encoded, which is unsupported by JavaScript's native decoding utilities; these operate on, and assume, UTF-8 input. The encoded portions of the provided URLs contain byte sequences that are invalid in UTF-8 and, as a result, cannot be decoded. The error can be reproduced in any browser console or REPL that properly implements the spec (e.g. repl.it), since decodeURI expects UTF-8.
Take %D2%E0%EB%EB%E8%ED as an example, which represents Òàëëèí in windows-1252 encoding. The equivalent in UTF-8 would be %C3%92%C3%A0%C3%AB%C3%AB%C3%A8%C3%AD, giving a URL of https://www.kinopoisk.ru/community/city/%C3%92%C3%A0%C3%AB%C3%AB%C3%A8%C3%AD/. Unfortunately, this URL won't work due to the encoding of the website.
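To make the difference concrete, here is a minimal sketch that runs decodeURI on both forms of the path segment quoted above. The helper name tryDecode is made up for illustration; only decodeURI itself is a built-in.

```javascript
// Percent-encoded bytes taken from the failing URL (single-byte legacy encoding).
const legacy = '%D2%E0%EB%EB%E8%ED';
// The same characters re-encoded as UTF-8, as quoted above.
const utf8 = '%C3%92%C3%A0%C3%AB%C3%AB%C3%A8%C3%AD';

// Hypothetical helper: reports whether decodeURI accepts the input.
function tryDecode(input) {
    try {
        return {ok: true, value: decodeURI(input)};
    } catch (error) {
        // decodeURI throws a URIError when the bytes are not valid UTF-8.
        return {ok: false, value: error.name};
    }
}

console.log(tryDecode(legacy)); // ok is false, value is 'URIError'
console.log(tryDecode(utf8));   // ok is true, value is the decoded string
```

The first call fails because 0xD2 starts a two-byte UTF-8 sequence, but the next byte, 0xE0, is not a valid continuation byte.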
Why this fails now, but not before:
Caching via lukechilds/cacheable-request brought in the package sindresorhus/normalize-url, which uses decodeURI internally. This module _could_ perform a best-effort decoding, falling back to the encoded value when the string is not UTF-8 encoded. This would allow URLs that happen to be encoded unexpectedly to be processed successfully.
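A best-effort decode along those lines might look roughly like the sketch below. The function name safeDecodeURI is hypothetical and this is not normalize-url's actual code; it only illustrates the fallback idea.

```javascript
// Hypothetical helper, not part of normalize-url: decode when possible,
// otherwise keep the original (still percent-encoded) string.
function safeDecodeURI(input) {
    try {
        return decodeURI(input);
    } catch (error) {
        if (error instanceof URIError) {
            // Not valid UTF-8: fall back to the encoded value unchanged.
            return input;
        }
        throw error; // Anything else is unexpected, so rethrow it.
    }
}

console.log(safeDecodeURI('https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/'));
console.log(safeDecodeURI('https://example.com/%C3%A9'));
```

With this approach the legacy-encoded URL passes through untouched, while well-formed UTF-8 sequences are still decoded.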
I don't think a fix, if any, would be applied here directly in Got.
Thanks for elaborating @brandon93s. I think the correct fix here is to detect the case early in Got and throw a user-friendly error about the URL having an invalid encoding.
@sindresorhus Are we okay with a brute force try...catch around a decodeURI early-ish on in normalizeArguments to catch any potential errors with a user-friendly message? Any decodeURI failure will present an issue, so we might as well check upfront and inform the consumer!
Glad to implement...
Yes
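For reference, the upfront check discussed above could be sketched as follows. The function name assertDecodableUrl and the error wording are assumptions for illustration, not what Got actually ships; where exactly it would be called inside normalizeArguments is also left open.

```javascript
// Hypothetical early check: fail fast with a readable message instead of
// letting a bare "URI malformed" surface from a dependency later on.
function assertDecodableUrl(url) {
    try {
        decodeURI(url);
    } catch (error) {
        if (error instanceof URIError) {
            throw new URIError(
                `Invalid URL encoding in "${url}": percent-encoded sequences must be valid UTF-8`
            );
        }
        throw error;
    }
    return url;
}

try {
    assertDecodableUrl('https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/');
} catch (error) {
    console.log(error.message);
}
```

Keeping the error type as URIError means existing handlers that check for it would presumably keep working, while the message now names the offending URL.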