Core: Fullwidth characters immediately after a URL break the edit function

Created on 23 Sep 2016 · 17 Comments · Source: flarum/core

As demonstrated in this thread, a fullwidth character (such as those used in Japanese and Chinese) immediately after a URL breaks the post edit function. A post containing such a string cannot be previewed or edited.

Also note that even in the discussion view, the wrong part of the text is getting linkified as a result of the problematic string. Both of these phenomena seem to be related to the fact that the parser can't determine where the URL ends.

This bug should be given a fairly high priority, as such strings (e.g. a URL enclosed in full-width parentheses) would not be unusual on sites in languages that use fullwidth glyphs.


All 17 comments

There are two different issues here. The first one, the preview not working, is probably the same bug as https://github.com/s9e/TextFormatter/issues/40. It's already fixed in master and should be available soon. One way to fix it now would be to include punycode.js in the page. Actually I'd recommend including punycode.js either way to handle IDN URLs. Please check out this online demo to verify that the preview works there.
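
As a rough illustration of what "handling IDN URLs" means here, the sketch below converts an internationalized hostname to its ASCII (punycode) form. It deliberately uses the runtime's built-in `URL` class rather than punycode.js, so that it runs without dependencies; `asciiHostname` is a hypothetical helper name and is not part of Flarum or s9e\TextFormatter.

```javascript
// Hypothetical helper (not from Flarum or s9e\TextFormatter).
// The built-in WHATWG URL parser converts an internationalized host
// to its ASCII "xn--" form while parsing http/https URLs.
function asciiHostname(urlText) {
  return new URL(urlText).hostname;
}

// A plain ASCII host is returned unchanged.
console.log(asciiHostname('http://example.com/path'));

// An IDN host comes back in punycode form, which an ASCII-only
// validator can then accept.
console.log(asciiHostname('http://日本語.jp/path'));
```

A validator that only accepts ASCII domain names could be fed the converted hostname instead of the raw one. Whether the forum bundle does this through punycode.js or another mechanism is a separate matter that this sketch does not settle.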

The other one is that the URL doesn't end where expected. That's a difficult issue to solve because this sentence does not contain any whitespace and while it's easy for a human to determine where the URL should end, it's more difficult to determine programmatically. I'll try to think about it but I'm not very optimistic about it. In the meantime you'll have to use explicit links either through Markdown or BBCodes.
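
To make the difficulty concrete, here is a minimal sketch comparing two ways of deciding where a URL ends. Neither pattern is taken from s9e\TextFormatter; `NAIVE_URL`, `ASCII_URL`, and `firstMatch` are names invented for this illustration, and the sample sentence is made up.

```javascript
// Naive rule: a URL runs until the next whitespace character.
// In text without spaces this swallows the rest of the sentence.
const NAIVE_URL = /https?:\/\/\S+/;

// Illustrative heuristic: a URL consists of printable ASCII only,
// so it stops at the first fullwidth or other non-ASCII character.
const ASCII_URL = /https?:\/\/[\x21-\x7E]+/;

// Return the first substring matching the pattern, or null.
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

// A URL enclosed in fullwidth parentheses, with no whitespace anywhere.
const sample = '詳細は（http://example.com/）を参照してください。';

console.log(firstMatch(NAIVE_URL, sample));
// → http://example.com/）を参照してください。
console.log(firstMatch(ASCII_URL, sample));
// → http://example.com/
```

The ASCII-only heuristic handles this sample but is not a general answer: it would cut off any URL that legitimately contains non-ASCII characters in its host or path, which is why the boundary is hard to pick programmatically.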

@JoshyPHP Thanks for looking into this. The second issue is indeed problematic, since one can't rely on whitespace for delimiters in some languages. And of course we do need to provide for the use of fullwidth characters not only after URLs, but within them ... and even in IDNs, as you say.

Would it be possible to add a toggle that would turn the autolink functionality off for certain locales? This would require users in those locales to specify links manually with Markdown or BBCode ... which would be a bother, but might be better than the current situation (the autolink simply turns everything from the "http" to the EOL into a link).

It wouldn't cover every situation ... you'd still get improper autolinking if someone wants to link a Japanese IDN URL in an all-English site, for example ... but it would help cover the majority of them.

An easier alternative might be to make it a manual sitewide toggle. Then admins who think this sort of thing is likely to crop up often on their sites can just turn off the autolink functionality altogether.

@dcsjapan I've opened an issue in the library's repository. If you have any real-world examples of non-ASCII URLs or URLs that are enclosed in non-ASCII punctuation, feel free to post them in a comment over there.

@JoshyPHP I'll have a look around for some examples.

@dcsjapan Could you please give an example, or some pointers on how to reproduce this?

@Luceos Please see this thread regarding reproduction. Kulga's screencast says it all.

As for examples, there are quite a few in that thread. (I just added a couple at the end.)

@JoshyPHP has made some changes to his library that may fix the problem; I'm not sure if Toby has added them to Flarum yet. See the issue Joshy linked above for details.

I get a different error if I try editing that post now:

forum-70f79e94.js:1192 Uncaught ReferenceError: punycode is not defined
    at Object.parseUrl (forum-70f79e94.js:1192)
    at Object.filterUrl (forum-70f79e94.js:1190)
    at Array.zb (forum-70f79e94.js:1181)
    at Array.V (forum-70f79e94.js:1180)
    at Tb (forum-70f79e94.js:1169)
    at Lb (forum-70f79e94.js:1168)
    at Jb (forum-70f79e94.js:1161)
    at Object.preview (forum-70f79e94.js:1360)
    at b (forum-70f79e94.js:432)
    at g.value (forum-70f79e94.js:432)

It seems a dependency is no longer loading correctly; I can't even get the editor into view.
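
The `ReferenceError` above is what JavaScript throws when code reads a global that was never defined. A defensive pattern, sketched below, checks for the global with `typeof` before using it. This is only an illustration and not the code in the forum bundle: `toAsciiHost` is a hypothetical helper, and the only library call assumed is `punycode.toASCII`, which punycode.js provides.

```javascript
// Hypothetical helper, not the actual TextFormatter code.
// `typeof` on an undeclared identifier yields 'undefined' instead of
// throwing, so this check is safe even when punycode.js is not loaded.
function toAsciiHost(host) {
  if (typeof punycode !== 'undefined' && typeof punycode.toASCII === 'function') {
    // punycode.js is available: convert an IDN host to its ASCII form.
    return punycode.toASCII(host);
  }
  // Fallback without punycode.js: accept the host only if it is
  // already pure ASCII, otherwise signal rejection with null.
  return /^[\x00-\x7F]*$/.test(host) ? host : null;
}

console.log(toAsciiHost('example.com'));
console.log(toAsciiHost('日本語.jp'));
```

With a guard like this, a missing dependency degrades to "IDN links are rejected" rather than an uncaught exception that takes the preview and editor down with it.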

@dcsjapan Thanks for taking the time to reply; it gives us a bit more background information to help solve this.

The big problem seems to be fullwidth characters where the automatic linkifier is expecting to find a domain name. Fullwidth characters appearing later in the URL don't cause the same issue.

@Luceos It's #1040, fixed in https://github.com/s9e/TextFormatter/issues/40.

@dcsjapan The URL validator only accepts ASCII domain names. If you don't have something to punycode IDNs, it will reject those URLs.

@JoshyPHP Would you be able to tell us which version contains the fix (and do the same for future fixes)? It would greatly speed up our work on issues like this. I also think it would be wise to support IDN domains, as they are becoming more common.

@Luceos The issue I mentioned was closed a while ago so unless you're talking about a different one then it's already fixed. If you want to support IDNs you need intl on the PHP side and punycode.js on the JavaScript side.

@JoshyPHP Hmm... so is this fixed since merging #1049?

@franzliedke I think so, yes.

@franzliedke I guess that hasn't been applied to discuss.flarum.org yet? I'm still seeing the issue there.

Can you try it out on devflarum?

@franzliedke Yep, it's fixed on devflarum. 🎉

Great, thanks to both of you.
