Refined-github: url regex is replacing & with &; breaking urls

Created on 13 Nov 2017 · 6Comments · Source: sindresorhus/refined-github

Similar to #803
I haven't looked at the code, but see this url in the review section:
https://github.com/saglacio/saglac.io/pull/143/files/e92bf806d0a0757da2b16623172fa3041e65db36#diff-aeb42283af8ef8e9da40ededd3ae2ab2R23

And the same url not being transformed to a js-linkable-line-number in the conversation:
https://github.com/saglacio/saglac.io/pull/143#discussion-diff-150555564L22

Original (screenshot):

Bug (screenshot, notice the missing amp at the end of the link):

Live test:

url: https://saglac.us17.list-manage.com/subscribe/post?u=c9f3c71f1370a7e550caa409b&amp;id=f45e751233

bug help wanted

Source

GabLeRoux

👍1

Most helpful comment

For a moment I thought this was an enhancement, but it really is a bug. Not only the URL is linkified incorrectly (oh well...), but the code is also changed (which should never happen, amp should not be missing)

This bug can be fixed here: https://github.com/sindresorhus/linkify-urls/issues/14

fregante on 16 Jan 2018

👍2

All 6 comments

This is gonna be interesting.

I don't think that generally ; is an accepted character (non-url-encoded) in URLs.
In this case the URL isn't even technically valid since it's escaped HTML so even if detected correctly it won't work unless we also unescape it.
We don't know if a URL is escaped and what kind of escaping is applied to it. Perhaps & is common enough to grant its own simple detection+unescaping though.

fregante on 14 Nov 2017

@bfred-it I can confirm this point:

I don't think that generally ; is an accepted character (non-url-encoded) in URLs.

; is not an accepted character in URLs without being escaped. We've checked RFCs in order to use it in some CSV imports containing multiple URLs with a separator. We are using ; as a separator instead of just comma , for example.

hkdobrev on 14 Nov 2017

Not as easy as it seems, this is the closest I got:

- const linkified = fn(textNode.textContent, options);
+ const linkified = fn(unescape(textNode.textContent), Object.assign({
+   value: escape(textNode.textContent) 
+   // wrong, it should just be the URL, not the whole string. Related: https://github.com/sindresorhus/linkify-urls/issues/10
+ }, options));

It should unescape the URL so it can be matched by linkify-urls
It should preserve the original unescaped URL as was because this is code.

I can't think of a way to both unescape the URL so it can be parsed by linkify-urls and also preserve the original URL (how do you match it out of textNode.textContent?)

In short, to solve this problem:

linkify-urls should either accept a custom URL regex, or 😟
it should detect escaped URLs internally, or 😳
we can't use linkify-url. 😟

@sindresorhus?

fregante on 8 Dec 2017

Why are we still discussing this if we all agree it's an invalid URL?

sindresorhus on 8 Dec 2017

Perhaps & is common enough to grant its own simple detection+unescaping though.

It's not a valid URL until you unescape it. It's common to find them in HTML documents.

fregante on 8 Dec 2017

This bug can be fixed here: https://github.com/sindresorhus/linkify-urls/issues/14

fregante on 16 Jan 2018

👍2

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Fix button for diffs without whitespace

hkdobrev · 3Comments

Cross-repo issue link in issue title

durka · 3Comments

Pagination options / endless scroll

alexanderadam · 3Comments

Limit reaction avatars to 2 when there are too many

fregante · 3Comments

Shift-Enter when posting a PR comment submits instead of adding newline

supremebeing7 · 3Comments

Refined-github: url regex is replacing &amp; with &; breaking urls

Most helpful comment

All 6 comments

Related issues

Refined-github: url regex is replacing & with &; breaking urls