Mastodon: On bio, & is replaced by &amp;

Created on 4 Apr 2017 · 9 Comments · Source: tootsuite/mastodon

I added this URL to my bio: https://pgp.mit.edu/pks/lookup?op=get&search=0x18231B0B449CC9D2 but the & has been replaced by &amp;, so the link points to the wrong URL: https://pgp.mit.edu/pks/lookup?op=get&amp;search=0x18231B0B449CC9D2

bug

All 9 comments

You should know that &amp; is the correct way to write & in a URL inside HTML, and browsers normally understand it. Here on GitHub, the second URL in your post is written as &amp;amp;.

So the bug affects every posted URL: the & is encoded twice, once for the text and once for the URL.
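To make the double encoding concrete, here is a small sketch using Ruby's standard `ERB::Util.html_escape` as a stand-in for Mastodon's own `encode` (an assumption; Mastodon's formatter may use a different escaping library). Encoding once is correct; encoding twice leaves a literal `&amp;` in the link after the browser decodes it.

```ruby
require "erb"

# A URL with a literal "&" separating two query parameters.
url = "https://pgp.mit.edu/pks/lookup?op=get&search=0x18231B0B449CC9D2"

# Encoding once is correct in HTML: the browser decodes &amp; back to &.
once = ERB::Util.html_escape(url)

# Encoding a second time (text encoding, then link encoding)
# turns &amp; into &amp;amp;.
twice = ERB::Util.html_escape(once)

# Simulate the single decoding pass a browser performs on the href.
# Only &amp; matters for this particular URL.
browser_sees = twice.gsub("&amp;", "&")

puts once          # ...op=get&amp;search=...
puts twice         # ...op=get&amp;amp;search=...
puts browser_sees  # ...op=get&amp;search=...  (a literal "&amp;" ends up in the link)
```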

@Exagone313 1/ Sorry, but I think that when users copy and paste a URL, they expect not to have to rewrite it.
2/ https://pgp.mit.edu/pks/lookup?op=get&amp;search=0x18231B0B449CC9D2 does not take me to the right URL... and that's expected.

Check the source code of this page for your links and you'll understand what I mean.

The double HTML entity encoding is the issue. In particular, here is where it happens: https://github.com/tootsuite/mastodon/blob/master/app/lib/formatter.rb#L38

However, I am hesitant to change that code, because encoding HTML entities is required to prevent XSS attacks by users who post HTML.

You just have to use a regex to fix the URL afterwards. Both the URL and the displayed text should be encoded exactly once, so, if I'm not wrong, you only need to convert &amp;amp; back to &amp;. After that, check for &lt; inside the URL; the remaining problem would be malformed URLs such as &<.
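The regex-based workaround suggested above could look roughly like this. Both helpers are hypothetical, not part of Mastodon; they only illustrate collapsing one level of encoding and then flagging an encoded `<` inside a URL.

```ruby
# Hypothetical post-processing step: collapse the extra encoding level
# so the href is encoded exactly once (&amp;amp; -> &amp;).
def collapse_double_ampersands(html)
  html.gsub("&amp;amp;", "&amp;")
end

# Hypothetical sanity check: after collapsing, an encoded "<" (&lt;)
# inside a URL suggests malformed input such as "&<".
def suspicious_url?(encoded_url)
  encoded_url.include?("&lt;")
end

html = '<a href="https://pgp.mit.edu/pks/lookup?op=get&amp;amp;search">link</a>'
fixed = collapse_double_ampersands(html)
puts fixed
puts suspicious_url?("https://example.com/?a=1&amp;&lt;b")
```

One caveat with this approach: it should only be applied to `href` values. In ordinary post text, a user who literally types `&amp;` is correctly encoded as `&amp;amp;`, and collapsing it there would change what they wrote.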

Where are messages from other instances handled? Mastodon doesn't trust other instances on this, right?

Also, I think we should open a new issue about how to handle malformed URLs with special characters. A URL has several possible parts: protocol, user, password, host (including the xn-- form for internationalized names, IPv4, and [IPv6]), port, path (where a space becomes %20), and query string (where a space becomes +). On top of that comes the way a URL is written in HTML, where & becomes &amp;. With special characters, the URL displayed inside the <a> element may not be the link-encoded form!
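The separate encoding layers listed above can be seen with Ruby's standard library. This sketch uses `ERB::Util.url_encode` for a path segment, `URI.encode_www_form` for the query string, and `ERB::Util.html_escape` for the HTML layer; the example URLs are made up for illustration.

```ruby
require "uri"
require "erb"

# A space is encoded differently in the path and in the query string.
path_segment = ERB::Util.url_encode("my file")               # "my%20file"
query = URI.encode_www_form([["q", "a b"], ["lang", "en"]])  # "q=a+b&lang=en"
url = "https://example.com/#{path_segment}?#{query}"

# Writing that URL into an HTML attribute is a second, separate layer:
# the "&" between parameters becomes &amp;.
href = ERB::Util.html_escape(url)  # "...?q=a+b&amp;lang=en"

# Other parts from the list above: user, password, bracketed IPv6 host, port.
uri = URI.parse("https://user:pass@[::1]:8443/p")
p [uri.user, uri.password, uri.hostname, uri.port]
```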

Currently only http:// and https:// URLs are allowed. This could be blacklist-based instead (blocking javascript:), bearing in mind that some protocols, like magnet, use protocol:addr rather than protocol://addr.
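A sketch of the two policies, assuming the scheme is read with the RFC 3986 scheme grammar (a letter followed by letters, digits, `+`, `-` or `.`, then `:`), so `magnet:` is recognized without requiring `//`. The allowlist and blocklist contents are hypothetical choices, not Mastodon's; the comment above suggests a blocklist, and an allowlist is the more conservative alternative.

```ruby
# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
SCHEME = /\A([a-z][a-z0-9+.\-]*):/i

# Hypothetical helper: return the lowercased scheme, or nil if there is none.
def scheme_of(url)
  m = SCHEME.match(url)
  m && m[1].downcase
end

ALLOWED = %w[http https magnet].freeze        # hypothetical allowlist
BLOCKED = %w[javascript data vbscript].freeze # hypothetical blocklist

# Allowlist: only known-safe schemes are linked.
def allowed_by_allowlist?(url)
  ALLOWED.include?(scheme_of(url))
end

# Blocklist: anything with a scheme is linked unless it is known to be dangerous.
def allowed_by_blocklist?(url)
  s = scheme_of(url)
  !s.nil? && !BLOCKED.include?(s)
end

p scheme_of("magnet:?xt=urn:btih:abc")
p allowed_by_blocklist?("JavaScript:alert(1)")
```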

The scope could be larger than only bio.

An example of problematic ampersand issue is at https://mastodon.social/@envlh/1644777

This is a legitimate URL to Wikidata, a sister project of Wikipedia. Users expect to be able to use certain characters in URLs without encoding them, e.g. the ?foo=bar&quux=true query string syntax.
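Why the extra `amp;` breaks such links can be shown with Ruby's `URI.decode_www_form`, which splits a query string on `&`: if the separator reaches the server as a literal `&amp;`, the second parameter's name is corrupted.

```ruby
require "uri"

# "&" is the query-string separator, so both parameters parse cleanly.
good = URI.decode_www_form("foo=bar&quux=true")

# If the link contains a literal "&amp;" (double encoding), the split
# still happens at "&", leaving "amp;" glued to the next parameter name.
bad = URI.decode_www_form("foo=bar&amp;quux=true")

p good
p bad
```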

The problem is the combination of encode and link_urls.

>> encoded = Formatter.instance.send :encode, 'https://pgp.mit.edu/pks/lookup?op=get&search'
=> "https://pgp.mit.edu/pks/lookup?op=get&amp;search"

>> Formatter.instance.send :link_urls, encoded
=> "<a href=\"https://pgp.mit.edu/pks/lookup?op=get&amp;amp;search\" rel=\"nofollow noopener\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"ellipsis\">pgp.mit.edu/pks/lookup?op=get&</span><span class=\"invisible\">amp;amp;search</span></a>"

We can't simply remove encode, because link_urls doesn't encode any other HTML special characters:

>> Formatter.instance.send :link_urls, ' a & b & <script>'
=> " a & b & <script>"

And actually link_urls isn't correctly encoding the display text of the <a> either:

>> Formatter.instance.send :link_urls, 'https://pgp.mit.edu/pks/lookup?op=get&search'
=> "<a href=\"https://pgp.mit.edu/pks/lookup?op=get&amp;search\" rel=\"nofollow noopener\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"ellipsis\">pgp.mit.edu/pks/lookup?op=get&</span><span class=\"invisible\">amp;search</span></a>"

I wonder what features we're relying on from Twitter::Autolink. Rails' auto_link seems to handle this correctly:

>> helper.auto_link "https://pgp.mit.edu/pks/lookup?op=get&search"
=> "<a href=\"https://pgp.mit.edu/pks/lookup?op=get&amp;search\">https://pgp.mit.edu/pks/lookup?op=get&amp;search</a>"

>> helper.auto_link '<a href="https://pgp.mit.edu/pks/lookup?op=get&search">foo</a>'
=> "<a href=\"https://pgp.mit.edu/pks/lookup?op=get&amp;search\">foo</a>"
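The behaviour auto_link shows above, where every piece of text is encoded exactly once, can be sketched without Rails or Twitter::Autolink: split the raw text into URL and non-URL segments first, then HTML-escape each segment once. `linkify` and its deliberately simple URL regex are hypothetical and for illustration only; real autolinkers handle trailing punctuation, IDNs and many other edge cases.

```ruby
require "erb"

# Deliberately simplistic URL matcher (illustration only).
URL_PATTERN = %r{https?://[^\s<>"']+}

# Hypothetical linkify: find URLs in the *raw* text, then HTML-escape
# every segment (plain text, href and link text) exactly once.
def linkify(text)
  out = +""
  last = 0
  text.scan(URL_PATTERN) do
    m = Regexp.last_match
    out << ERB::Util.html_escape(text[last...m.begin(0)])  # text before the URL
    url = ERB::Util.html_escape(m[0])                        # encoded once
    out << %(<a href="#{url}" rel="nofollow noopener" target="_blank">#{url}</a>)
    last = m.end(0)
  end
  out << ERB::Util.html_escape(text[last..])  # text after the last URL
  out
end

puts linkify("https://pgp.mit.edu/pks/lookup?op=get&search")
puts linkify(" a & b & <script>")
```

Because escaping happens after the URLs are located, there is no earlier encoding pass for the link step to encode again, and plain text such as `<script>` is still neutralized.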

Now that https://github.com/tootsuite/mastodon/pull/2138 has been merged, this issue should be resolved, so I'm going to close it. @mart1oeil if you feel this hasn't been resolved to your satisfaction, please just let us know 👍
