Core: [Emoji] Consider switching back to a client-side emoji implementation

Created on 15 Jun 2018 · 18 Comments · Source: flarum/core

See https://github.com/s9e/TextFormatter/issues/99

Basically, we no longer want to use TextFormatter's EmojiOne plugin because EmojiOne's licensing is too restrictive.

Twemoji has recently been updated and remains CC-BY 4.0, which is great. However, due to some technical details, it's not a simple drop-in replacement for EmojiOne in TextFormatter.

We have three options:

  1. Do a bit of extra work to get Twemoji to be a drop-in replacement. (Or more accurately, rely on @JoshyPHP's bit of extra work):

    You can find it at https://github.com/s9e/emoji-assets but I have no long-term plan and I can't say how long I'll keep it up to date.

  2. Store emoji in the database as Unicode (i.e. no more TextFormatter parsing of emoji). Convert Unicode emoji to Twemoji during TextFormatter rendering.

  3. Store emoji in the database as Unicode (i.e. no more TextFormatter parsing of emoji). Delegate any processing to the client side.

    • The benefit to this is that any novel Flarum client (e.g. a phone app) receives plain Unicode emoji and gets to decide what it wants to do with them (i.e. leave them as native, or render with a different emoji set).
    • Twemoji.js is 10 KB minified, which would be negligible gzipped. This is what we used to use before Twemoji went stale.
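
To make the third option a bit more concrete, here is a minimal sketch of what client-side processing could look like: the stored text contains plain Unicode emoji, and the client swaps each one for an image. This is not twemoji.js itself. The base URL, the `renderEmoji` and `toCodePoints` names, and the "hex code points joined with a dash" file naming are all assumptions for illustration, and the regex is a deliberately simplified stand-in for a full emoji matcher.

```javascript
// Hypothetical base URL: a real setup would point at wherever the images are hosted.
const EMOJI_BASE_URL = "https://example.com/emoji/";

// Simplified matcher: one pictographic character, optionally followed by the
// emoji variation selector (U+FE0F). It does not handle ZWJ sequences, flags,
// or skin tone modifiers.
const SIMPLE_EMOJI = /\p{Extended_Pictographic}\uFE0F?/gu;

// Convert a string to its lowercase hex code points joined with "-".
function toCodePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0).toString(16)).join("-");
}

// Replace each matched emoji with an <img> tag. The input is assumed to be
// text that has already been HTML-escaped.
function renderEmoji(text) {
  return text.replace(SIMPLE_EMOJI, (match) => {
    // Assumption: image files are named without the variation selector.
    const name = toCodePoints(match.replace(/\uFE0F/g, ""));
    return `<img class="emoji" alt="${match}" src="${EMOJI_BASE_URL}${name}.svg">`;
  });
}
```

Because the original character is kept in the `alt` attribute, a client that prefers native emoji can simply skip the `renderEmoji` call and display the stored text unchanged.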

All 18 comments

I'd prefer having the browser take care of this as much as possible. If I'm not mistaken, the third option allows for that and/or lets extensions mutate the output visually.

Additionally we would get emoji rendering in discussion titles for free :D

With the third option we still need a map of emoji aliases; twemoji.js does not have them.
Anyway, I'm not a big fan of client-side rendering because of battery life.

I guess we can make a simple implementation of twemoji.js; it hasn't been updated in three years. But we'll still need a map of aliases.

The current version of Twemoji is in a different directory: https://github.com/twitter/twemoji/tree/gh-pages/2

I guess we can go with client-side rendering, though I looked at the twemoji.js source and it doesn't seem very complicated; the most important thing is this regex. We might be able to reimplement it in PHP.

IMO, saving emoji as shortcodes (e.g. :+1:) in the database does not seem like a good idea because there's no common standard, it's not flexible, and it might introduce compatibility issues (e.g. if someone else makes an alternative Emoji extension, it'll have to be compatible with this extension) and make migrations harder. (I'm thinking of an Emoji extension that lets admins define their own shortcodes in the ACP.)

I'm not sure about including all shortcodes in the client either, but if we have to do that I guess we need a better search approach, because remembering some of those shortcodes is hard. For example, on GitHub, typing :happ or :smil gives the same results.

Thanks for the thoughts @sijad.

I think I'm in favour of doing client-side rendering with twemoji.js. At this stage it's the simplest and most flexible solution. I would guess the impact on battery life will be negligible.

I agree, we should store Unicode rather than shortcodes because Unicode is standardised. The shortcodes should be a purely client-side thing, acting as an aid to select Unicode characters. We can potentially source these from EmojiOne alpha codes (MIT). Eventually we will also want to add an emoji picker to our extension; I'm not sure what the best way to go about that will be.

gemoji (https://github.com/github/gemoji/blob/master/db/emoji.json) is another alternative (it has aliases and tags) if we want something similar to GitHub's implementation, e.g.:

:happy: => 😄
:smile: => 😄

the most important thing is this regex. We might be able to reimplement it in PHP.

Unicode defines three presentation styles for emoji, targeted at "word processing", "plain web pages", and "texting, chats" environments. As I recall, Twemoji uses the texting/chat presentation while my plugin uses the web page presentation.

If you want to use your own regexp, you'd better build it yourself based on the Unicode data. That's what I do, and it's okay if you know about Unicode things, but it's not something you will enjoy doing if you don't have an affinity with text processing and/or Unicode. Creating and maintaining big regexps by hand is a Sisyphean task; that's why I created a library for it: https://github.com/s9e/RegexpBuilder

https://github.com/s9e/TextFormatter/blob/9b7587ab8405f174466475b8a2d4eb86afc7c53a/src/Plugins/Emoji/Parser.js#L8
https://github.com/s9e/TextFormatter/blob/9b7587ab8405f174466475b8a2d4eb86afc7c53a/src/Plugins/Emoji/Parser.php#L27
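
As a rough illustration of why presentation matters when matching emoji, the sketch below leans on the Unicode property escapes built into JavaScript regular expressions instead of a hand-maintained pattern. It only distinguishes text from emoji presentation using the two variation selectors (U+FE0E requests text, U+FE0F requests emoji) and each character's default. The `presentation` function is a hypothetical helper, not something taken from Twemoji or TextFormatter, and it is far simpler than either project's real regexp.

```javascript
// Characters that default to emoji presentation (e.g. U+1F604).
const DEFAULT_EMOJI = /^\p{Emoji_Presentation}/u;
// Broader set of pictographic characters, including ones that default to text (e.g. U+2764).
const PICTOGRAPHIC = /^\p{Extended_Pictographic}/u;

const TEXT_SELECTOR = "\uFE0E";  // variation selector 15: request text presentation
const EMOJI_SELECTOR = "\uFE0F"; // variation selector 16: request emoji presentation

// Classify a short sequence as "emoji", "text", or "none" (not pictographic).
function presentation(seq) {
  if (!PICTOGRAPHIC.test(seq)) return "none";
  if (seq.includes(EMOJI_SELECTOR)) return "emoji";
  if (seq.includes(TEXT_SELECTOR)) return "text";
  // No explicit selector: fall back to the character's default presentation.
  return DEFAULT_EMOJI.test(seq) ? "emoji" : "text";
}
```

A matcher that only accepts default-emoji characters will skip a bare U+2764, while one that accepts every pictographic character will also convert symbols the author may have meant as plain text. That difference is one reason two emoji implementations can disagree on the same post.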

can I work on this?

I think these are the things that should be done (please correct me if I'm wrong):

  1. Move emoji rendering to the client side (revert https://github.com/flarum/flarum-ext-emoji/commit/f4589d0be67be41493953dfee21eeefb2d352e99)
  2. Update (recreate?) emojiMap using simplified EmojiOne alpha codes (eac) or gemoji (?) so that it inserts emoji instead of shortcodes

If we use eac, emojiMap will be similar to its current structure, but if we use gemoji it'll look like this:

const emojiMap = {
    "😀": ["grinning", "smile", "happy"],
    "😃": ["smiley", "happy", "joy", "haha"],
    // ...
};
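
With a map in this shape, the autocomplete search raised earlier in the thread (typing :happ or :smil and getting the same results) could be sketched roughly as follows. The `searchEmoji` function and its `limit` parameter are hypothetical names for illustration, and the two entries are just the sample data from above, not the real gemoji list.

```javascript
// Sample data in the gemoji-style shape proposed above (emoji -> names).
const emojiMap = {
  "😀": ["grinning", "smile", "happy"],
  "😃": ["smiley", "happy", "joy", "haha"],
};

// Return every emoji that has at least one name starting with the typed text,
// stopping once `limit` results have been collected.
function searchEmoji(map, typed, limit = 5) {
  const query = typed.toLowerCase();
  const results = [];
  for (const [emoji, names] of Object.entries(map)) {
    if (names.some((name) => name.startsWith(query))) {
      results.push(emoji);
      if (results.length === limit) break;
    }
  }
  return results;
}
```

Selecting a result would then insert the map key (the Unicode character) into the post, so the names never need to be stored.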

We can also categorize emojis so we'll be able to reuse the map for other purposes (e.g. for an emoji picker):

const emojiMap = {
    "People": {
        "😀": ["grinning", "smile", "happy"],
        "😃": ["smiley", "happy", "joy", "haha"],
        // ...
    },
    "Symbols": {
        "❤️": ["heart", "love"],
        // ...
    },
    // ...
};
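
If the categorized shape were adopted, the flat emoji-to-names lookup could still be derived from it when needed, so only one data file has to be maintained. A small sketch under that assumption follows; `flattenEmojiMap` and `categorizedMap` are hypothetical names, and the entries are only the sample data from this comment.

```javascript
// Sample data in the categorized shape proposed above.
const categorizedMap = {
  People: {
    "😀": ["grinning", "smile", "happy"],
    "😃": ["smiley", "happy", "joy", "haha"],
  },
  Symbols: {
    // Heart written as escapes: U+2764 followed by the emoji variation selector U+FE0F.
    "\u2764\uFE0F": ["heart", "love"],
  },
};

// Merge every category into a single emoji -> names object.
function flattenEmojiMap(categorized) {
  const flat = {};
  for (const emojis of Object.values(categorized)) {
    Object.assign(flat, emojis);
  }
  return flat;
}

// Category names, in definition order, e.g. for tabs in a picker.
const categories = Object.keys(categorizedMap);
```

The flat result has the same shape as the uncategorized emojiMap, so code written against that shape would not need to know about categories.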

Yes please, all sounds good! Let's use gemoji.

Sorry, I forgot the third:

  3. We need to migrate old posts that contain shortcodes and replace the shortcodes with emoji (I'm not sure how hard it's going to be).

The other option would be a TextFormatter plugin that converts shortcodes to Unicode emoji.

Ah yep, good point. Preferably we'll do this in a migration, without depending on TextFormatter.
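
The core replacement step of such a migration could look something like the sketch below. It is written in JavaScript only to stay consistent with the other snippets in this thread; a real Flarum migration would run on the server, and this sketch says nothing about how posts are actually stored or re-parsed. The alias table, `SHORTCODE` pattern, and `replaceShortcodes` function are all hypothetical.

```javascript
// Hypothetical alias table: the real one would come from the old shortcode list.
const aliasToEmoji = new Map([
  ["smile", "\u{1F604}"],
  ["+1", "\u{1F44D}"],
]);

// Matches :alias: where alias consists of lowercase letters, digits, "_", "+" or "-".
const SHORTCODE = /:([a-z0-9_+-]+):/g;

// Replace known shortcodes with their Unicode emoji and leave anything
// unrecognized (including things like the ":30:" in "10:30:45") untouched.
function replaceShortcodes(text) {
  return text.replace(SHORTCODE, (match, alias) =>
    aliasToEmoji.has(alias) ? aliasToEmoji.get(alias) : match
  );
}
```

Running it a second time on already-converted text changes nothing, which is a useful property for a migration that might be interrupted and restarted.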

Hmm, reopening because we still need a migration.

I can try to write that migration. As it's a big migration, I'll try to figure out how it should be implemented by looking at other migrations, but please let me know your suggestions.

@sijad Wouldn't it just be easier to convert the shortcodes to unicode on the JS side at run-time instead of converting every post?

@CDK2020 In that case we'd have to add the old emoji shortcodes on the client side, and I don't think that's a good long-term solution if we don't want to support shortcodes in the future.

@CDK2020 Easier for whom? Developers, sure. But this would come at the cost of code that would have to be run on all clients every time a post is rendered. Compare that with a (somewhat hard to write) migration that is run once on the server...

Done, now that flarum/flarum-ext-emoji#18 is merged. Thanks, @sijad!
