What pain point are you perceiving?
$ npx marked
[aaa](https://сл96.рф)
<p><a href="https://%D1%81%D0%BB96.%D1%80%D1%84">aaa</a></p>
require('url').parse('https://%D1%81%D0%BB96.%D1%80%D1%84') populates pathname instead of hostname:
Url {
protocol: 'https:',
slashes: true,
auth: null,
host: '',
port: null,
hostname: '',
hash: null,
search: null,
query: null,
pathname: '%D1%81%D0%BB96.%D1%80%D1%84',
path: '%D1%81%D0%BB96.%D1%80%D1%84',
href: 'https:///%D1%81%D0%BB96.%D1%80%D1%84' }
Describe the solution you'd like
Please don't percent-encode internationalized domain names (IDNs), because the encoded form doesn't work well with url.parse. I see no reason for an IDN to be encoded.
Looks like we would need to extract the domain name from href in Renderer.prototype.link
https://github.com/markedjs/marked/blob/ac4f2e46ee75b8bc23a336d0e79fe8fa65f55476/lib/marked.js#L1015
A PR would be welcome
FYI, it looks like CommonMark has the same behavior as marked.
@UziTech How would you safely extract the domain name? Maybe something like this?
function getHostName(href) {
  var url = new URL(href);
  return url.hostname;
}
But this relies on URL which is not available in IE.
Perhaps https://nodejs.org/api/url.html#url_url_format_url_options could be of help, although with all options set to true, since we would want those parts available. I'm not entirely sure why this project encodes the URL; I'd have to dive into the historical commits.
Also https://nodejs.org/api/url.html#url_url_resolve_from_to presuming these are available... always test assertions. :)
@Martii marked is used on the client side as well as in Node, so Node-only APIs won't work.
@styfle We could just use regex
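function getHostName(href) {
  // Capture only the host: the text between "://" and the next "/", "?" or "#".
  var match = href.match(/^https?:\/\/([^\/?#]+)/i);
  return match ? match[1] : null;
}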
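To make the idea of treating the domain name separately more concrete, here is a minimal sketch. It assumes the host can be split off with a regex and that only the rest of the href goes through encodeURI. The name encodeHrefKeepingHost is hypothetical and not part of marked, and the result would still need HTML-escaping before it is written into an attribute.

```javascript
// Hypothetical helper, not part of marked: percent-encode everything
// except the scheme and host, so an IDN stays readable in the output.
function encodeHrefKeepingHost(href) {
  // Group 1: scheme plus "://", group 2: host (authority), group 3: the rest.
  var match = /^([a-z][a-z0-9+.-]*:\/\/)([^\/?#]*)(.*)$/i.exec(href);
  if (!match) {
    // Not an absolute URL with an authority part: encode the whole string.
    return encodeURI(href).replace(/%25/g, '%');
  }
  // Undo the double encoding of "%" so existing escapes survive unchanged.
  var rest = encodeURI(match[3]).replace(/%25/g, '%');
  return match[1] + match[2] + rest;
}

console.log(encodeHrefKeepingHost('https://сл96.рф/a b'));
// https://сл96.рф/a%20b
```

Relative links such as `docs/a b.md` fall through to the plain encodeURI branch, so they keep being encoded in full.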
Why do you need to encode and then to escape:
href = encodeURI(href).replace(/%25/g, '%');
var out = '<a href="' + escape(href) + '"';
What's the purpose? Can't you just do this instead:
href = href.replace(/"/g, '%22')
.replace(/\r/g, '%0D')
.replace(/\n/g, '%0A');
var out = '<a href="' + href + '"';
Or maybe just encodeURI is enough without escape?
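One plausible reading of the two steps, sketched under assumptions: encodeURI normalizes characters that are not valid in a URL, the %25 replacement keeps already-encoded sequences from being encoded twice, and escape (assumed here to be marked's own HTML-escaping helper rather than the global escape function) makes the value safe inside the attribute. The escapeHtml function below is a hypothetical stand-in for that helper.

```javascript
// Hypothetical stand-in for marked's internal escape(); assumed to
// HTML-escape the characters that matter inside an attribute value.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Mirrors the two lines quoted above.
function renderHref(href) {
  // encodeURI turns every "%" into "%25"; undoing that keeps sequences
  // that were already percent-encoded from being encoded twice.
  var encoded = encodeURI(href).replace(/%25/g, '%');
  return '<a href="' + escapeHtml(encoded) + '">';
}

console.log(renderHref('https://example.com/a%20b c?x=1&y=2'));
// <a href="https://example.com/a%20b%20c?x=1&amp;y=2">
```

Note that the `&` in the query string comes out as `&amp;`, which the replace-only alternative would leave unchanged.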
Also, this may not be marked's fault but a fault of the Node.js url module, because both Chrome and new URL(href) handle percent-encoded IDNs just fine.
After a little contemplation I decided I'm fine with the current behavior as long as it works in major browsers and is handled by search engines. If you agree and don't see any problem with the Node.js url module, then you may close this.
If I'm not mistaken, there is also the url package at https://www.npmjs.com/package/url (I haven't tried it in the DOM yet; not much time to do this at the moment). It's a bit tenured, just like the Node URL API.
Would be nice if ECMAScript standards would incorporate some sort of improved IRI handling like this.
@Martii There is the WHATWG URL standard which is implemented in modern browsers and newer versions of Node.js which I believe has improved IRI handling. See my comment above.
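A quick way to check this claim, as a sketch that assumes an environment where the WHATWG URL class is available as a global (modern browsers, recent Node.js): parse both the Unicode and the percent-encoded spelling of the link from this issue and compare the resulting hostnames.

```javascript
// Both spellings of the link target from this issue.
var unicodeHref = 'https://сл96.рф';
var encodedHref = encodeURI(unicodeHref); // the form marked emits

// The WHATWG parser percent-decodes the host and converts it to Punycode,
// so both spellings should end up with the same ASCII hostname.
var fromUnicode = new URL(unicodeHref);
var fromEncoded = new URL(encodedHref);

console.log(encodedHref);
// https://%D1%81%D0%BB96.%D1%80%D1%84
console.log(fromUnicode.hostname === fromEncoded.hostname);
// true
```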
@ilyaigpetrov Thanks, I'll close this issue.