Mastodon: Angle brackets should delimit URLs in the body of toots

Created on 10 Nov 2017  ·  17 comments  ·  Source: tootsuite/mastodon

In https://mastodon.social/@zwol/98982414809236980 I wrote

... the UMass-Amherst Science Fiction Society <http://www.umass.edu/rso/scifi/>'s filk circle ...

The code that scans for URLs in the body of a toot interpreted this as linking to http://www.umass.edu/rso/scifi/%3E's. This Is Wrong. < and > are the official way to delimit URLs in running text, and have been ever since URLs were invented. The URL should be understood to end just before the >.


  • [x] I searched or browsed the repo’s other issues to ensure this is not a duplicate.
  • [ ] This bug happens on a tagged release and not on master (If you're a user, don't worry about this).
Labels: suggestion, ui


All 17 comments

That's part of Markdown syntax but we don't support Markdown. Is there any reason to even write text that way under these circumstances?

@gargron Markdown doesn't use angle brackets.

But I'm interested in the answer nonetheless; it feels very weird to use angle brackets to delineate URLs in a toot.


@nightpool No, it does. There is the [Text](url) syntax, but also the <url> syntax. The latter creates a link, because autolinking bare URLs isn't actually part of Markdown, even though most implementations do it anyway.

@nightpool @Gargron Using angle brackets to delimit URLs did not originate with Markdown. It's the recommended practice according to the official syntax specification for URLs, RFC 3986, Appendix C. It is especially useful in plain-text contexts that auto-linkify URLs but don't implement any other markup that would let you control where the URL ends, such as ... toots.
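Appendix C of RFC 3986 recommends angle brackets as the delimiter and says software should strip both the delimiters and any embedded whitespace (e.g. from line wrapping) when extracting the URI. A minimal sketch of that reading, assuming only http(s) URIs should be honored; the helper name `extract_delimited_uris` is hypothetical and not part of Mastodon:

```python
import re

# Hypothetical helper, not Mastodon code: find <...> spans whose contents
# start with an http or https scheme. The contents may not contain < or >,
# so a match always stops at the first closing bracket.
DELIMITED_URI = re.compile(r"<\s*(https?:[^<>]*)>", re.IGNORECASE)


def extract_delimited_uris(text: str) -> list[str]:
    """Return URIs found between angle brackets, with the brackets and
    any embedded whitespace removed (as RFC 3986, Appendix C suggests)."""
    uris = []
    for match in DELIMITED_URI.finditer(text):
        # Drop whitespace that may have been introduced by line wrapping.
        uris.append(re.sub(r"\s+", "", match.group(1)))
    return uris


toot = "the UMass-Amherst Science Fiction Society <http://www.umass.edu/rso/scifi/>'s filk circle"
print(extract_delimited_uris(toot))  # ['http://www.umass.edu/rso/scifi/']
```

Restricting the match to http(s) schemes is a conservative choice made for this sketch only.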

I gave an example in the original bug description of a sentence that is more natural if you can do this.

Mastodon linkifies URLs, so why would one use angle brackets?

@cassolotl Angle brackets are (would be, anyway) useful __because__ it linkifies URLs. The entire point here is to have an explicit way to control where the linkifier thinks the URL ends. Again, see the example in the original bug description.

I see! Thank you. :)

So this essentially bypasses the URL-handling logic for any URL delimited with <>? For what it's worth, though, Mastodon's current behavior is to treat the > as part of the URL, which ends up breaking it!

https://perishablepress.com/stop-using-unsafe-characters-in-urls/

More about “unsafe” characters from RFC 1738:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters “<” and “>” are unsafe because they are used as the delimiters around URLs in free text; the quote mark (“"”) is used to delimit URLs in some systems.

I should probably also mention that a quote mark (") breaks the URL in Mastodon too. Possibly related issues: #6701, #6031

For example, posting "http://trwnh.com/" on Mastodon produces a link to http://trwnh.com/", and posting <http://trwnh.com/> produces a link to http://trwnh.com/>. Currently the only real workaround is to remove the trailing slash from any pasted URL, but that can also break the URL if the server doesn't handle the slash-less form (e.g. if the nginx configuration doesn't include a try_files directive or a regex location block).
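The symptom is easy to reproduce with any linkifier whose path pattern is a greedy run of non-space characters. This sketch uses such a stand-in; `NAIVE_URL` and `naive_links` are hypothetical, and this is not Mastodon's twitter-text-based regex, only a pattern that fails in the same visible way:

```python
import re

# Hypothetical greedy linkifier: "http(s)://" followed by any run of
# non-space characters. It reproduces the symptom, not Mastodon's regex.
NAIVE_URL = re.compile(r"https?://\S+")


def naive_links(text: str) -> list[str]:
    """Return every substring the greedy pattern would turn into a link."""
    return NAIVE_URL.findall(text)


print(naive_links('"http://trwnh.com/"'))  # ['http://trwnh.com/"']
print(naive_links("<http://trwnh.com/>"))  # ['http://trwnh.com/>']
print(naive_links("Society <http://www.umass.edu/rso/scifi/>'s filk"))
# ["http://www.umass.edu/rso/scifi/>'s"]
```

Nothing in a pattern like this knows that " or > cannot appear unencoded in a URL, so the closing delimiter is swallowed along with whatever follows it up to the next space.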


I'm not sure what you mean by "bypasses URL handling logic", but the change I'm specifically asking for is to __not__ treat the > as part of the URL.

@zackw I mean that the proposal implies the text parser should essentially wrap whatever it finds inside <...> in an <a href=""></a> tag. By "URL handling logic" I mean the function Mastodon uses to determine what is and isn't a URL. It would be problematic to say that > should be explicitly disallowed in URLs, since it's only "unsafe" or "discouraged"; it's still valid in some URLs.
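Read that way, the proposal amounts to a renderer that turns only <...> spans into links and HTML-escapes everything else. A minimal sketch; `render_toot` and its http(s)-only rule are hypothetical assumptions, not Mastodon's formatter. It keeps the brackets visible and links just the text between them:

```python
import html
import re

# Hypothetical rule: only <...> spans that start with http(s): become links.
# The URL may not contain <, > or whitespace.
DELIMITED = re.compile(r"<(https?:[^<>\s]+)>")


def render_toot(text: str) -> str:
    """Return HTML in which only bracket-delimited URLs are linked."""
    out = []
    pos = 0
    for m in DELIMITED.finditer(text):
        # Everything before the match is plain text and must be escaped.
        out.append(html.escape(text[pos:m.start()]))
        url = m.group(1)
        # Keep the angle brackets visible; link only the URL between them.
        out.append(f'&lt;<a href="{html.escape(url)}">{html.escape(url)}</a>&gt;')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(render_toot("a <b> c"))  # a &lt;b&gt; c
```

Text in brackets that doesn't look like an http(s) URL, such as `<b>`, is left as escaped text rather than being linked or interpreted as markup.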

The > character is disallowed in URLs and should be percent-encoded (as %3E).
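This can be checked against the character sets in RFC 3986, Section 2. A minimal sketch: `illegal_chars` is a hypothetical helper that lists characters a URI may not contain literally, followed by the standard-library `urllib.parse.quote`, which produces the `%3E` form. For simplicity it treats any `%` as allowed, without checking that two hex digits follow:

```python
import string
from urllib.parse import quote, unquote

# RFC 3986, section 2: the only characters allowed literally in a URI.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}  # '%' starts a %XX escape


def illegal_chars(uri: str) -> set[str]:
    """Characters in `uri` that RFC 3986 does not allow unencoded."""
    return {c for c in uri if c not in ALLOWED}


print(illegal_chars("http://www.umass.edu/rso/scifi/>'s"))  # {'>'}

# A path that genuinely contains '>' has to carry it as %3E.
encoded = quote("/rso/scifi/>")
print(encoded)           # /rso/scifi/%3E
print(unquote(encoded))  # /rso/scifi/>
```

Note that the apostrophe is a legal sub-delimiter, so only the > makes the greedily linkified text an invalid URL.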

Given codl's clarifying remark on the legality of > in URLs, would a clean patch to that effect now be accepted? That is, one that removes > from the set of characters the autolinker accepts, so that <>-delimited URIs are still left to the autolinker but no longer swallow the >.

@chrysn We would probably accept such a patch, but I don't believe it would address the concerns of this issue.

@nightpool, I think it'd be a good-enough solution: users like me who don't trust a new system's autolinker (or know it would need to guess, as in "(e.g. http://example.com/)") and wrap their URIs in angle brackets would get them properly terminated. It sure would be nice to have the angle brackets vanish as well, but that's cosmetic compared to the current behavior of such links breaking (and might raise all sorts of non-trivial questions, like "When is text in angle brackets not intended as a URI?" and "What are the security implications of linking to non-http[s] URIs?").

I just started tracing where that could be changed (not speaking any Ruby doesn't help), but failed to pinpoint it, as

We override the regex in this file: https://github.com/tootsuite/mastodon/blob/master/config/initializers/twitter_regex.rb. I believe the change required is to add > to the character class on line 5, inside :valid_url_path_ending_chars.


here's some context on why we override these regexes: https://github.com/tootsuite/mastodon/pull/4941

(Although I don't understand specifically why we do it here instead of in lib/extractor. My guess is that that class didn't exist at the time, and when it was added this code wasn't moved into it, but that's just a guess.)

Without fully understanding how the regex is constructed, I agree that > should go into the negated part of :valid_url_path_ending_chars, and probably also into the negated part of :valid_general_url_path_chars (because, as @codl pointed out, nothing with a < or > inside it is a legal URI or even IRI).
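A simplified sketch of that change, written in Python rather than as the actual Ruby override in config/initializers/twitter_regex.rb, whose contents aren't reproduced here. `GENERAL_PATH_CHARS` and `PATH_ENDING_CHARS` are hypothetical stand-ins for :valid_general_url_path_chars and :valid_url_path_ending_chars; both are negated classes that exclude < and > (and the quote mark mentioned earlier). The host pattern is deliberately simplified, and the balanced-parenthesis handling twitter-text performs is omitted:

```python
import re

# Hypothetical, simplified stand-ins for twitter-text's path regexes.
# Both are negated classes; < and > (and ") are excluded from each.
GENERAL_PATH_CHARS = r"[^\s<>\"]"
# The last path character additionally may not be common trailing punctuation.
PATH_ENDING_CHARS = r"[^\s<>\".,;:!?')]"

URL = re.compile(
    r"https?://[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"  # scheme + host (simplified)
    rf"(?:/(?:{GENERAL_PATH_CHARS}*{PATH_ENDING_CHARS})?)?"  # optional path
)


def link_targets(text: str) -> list[str]:
    """Return the URLs this sketch would linkify in `text`."""
    return URL.findall(text)


print(link_targets("<http://trwnh.com/>"))  # ['http://trwnh.com/']
print(link_targets("Society <http://www.umass.edu/rso/scifi/>'s filk"))
# ['http://www.umass.edu/rso/scifi/']
```

With both classes excluding < and >, the match stops before the closing bracket while the brackets themselves stay in the text, which is the "good-enough" behavior chrysn describes above.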


Related issues

cwebber  ·  3 comments

almafeta  ·  3 comments

KellerFuchs  ·  3 comments

hugogameiro  ·  3 comments

golbette  ·  3 comments