Julia: Normalize Emoji variant selectors in identifiers?

Created on 2 Apr 2019 · 8 Comments · Source: JuliaLang/julia

Today I came across the existence of emoji variant selectors. Basically, the Unicode standard already had a bunch of symbols that were a little like emoji but not colorful, so it added a combining character (a variation selector) to make those colorful, and similarly one to make emoji non-colorful. At the moment, we consider these significant in identifiers, so we allow things like:
[Screenshot taken 2019-04-02; image not available]
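A minimal sketch of the distinction, using the telephone character that comes up later in the thread. The variable names are made up for illustration, and `Symbol` is used as a stand-in for an identifier rather than going through the parser:

```julia
using Unicode

# U+260E BLACK TELEPHONE on its own, and followed by U+FE0F
# (VARIATION SELECTOR-16, which requests emoji presentation).
plain_phone = "\u260e"
emoji_phone = "\u260e\ufe0f"

# The two strings differ by exactly one code point.
println(length(plain_phone), " vs. ", length(emoji_phone))

# NFC does not remove the selector, so the strings stay distinct after normalization.
println(Unicode.normalize(plain_phone, :NFC) == Unicode.normalize(emoji_phone, :NFC))

# Symbols compare by content, so these are two different names.
println(Symbol(plain_phone) == Symbol(emoji_phone))
```

Both comparisons come out `false`, which is the situation described above: two names that can look identical but are not the same.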

We should make a decision on whether we want to normalize out this distinction in our identifier normalization.
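For concreteness, here is a rough sketch of what normalizing out the distinction could mean, assuming the simplest possible rule: drop U+FE0E and U+FE0F wherever they appear. `strip_presentation_selectors` is a hypothetical helper, not an existing Julia function, and this is not how the parser's identifier normalization is implemented:

```julia
# Hypothetical helper, not part of Julia: drop the two presentation
# selectors, U+FE0E (text presentation) and U+FE0F (emoji presentation).
strip_presentation_selectors(s::AbstractString) =
    filter(c -> c != '\ufe0e' && c != '\ufe0f', s)

plain_phone = "\u260e"          # bare U+260E
emoji_phone = "\u260e\ufe0f"    # emoji presentation requested
text_phone  = "\u260e\ufe0e"    # text presentation requested

# After stripping, all three spellings collapse to the bare character.
println(strip_presentation_selectors(emoji_phone) == plain_phone)
println(strip_presentation_selectors(text_phone) == plain_phone)
```

A real rule might need to be more careful, for example by only touching selectors that follow a character with defined emoji variation sequences. This sketch ignores that.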


All 8 comments

A related issue: Should we change the \:emoji: completions to include the emoji variant selectors? E.g. \:phone: completes to the non-emoji variant.

I feel like emoji normalization is something we should leave to the Unicode Consortium. It seems like they should really fix this in NFC, and it's not worth the effort for us to use a custom normalization here.

For tab completion we can do whatever we want, of course.

I agree. The main concern for identifier normalization is when two different identifiers are both easy to input and hard to distinguish. That doesn't seem to be the case here. If the Unicode Consortium decides to normalize these then we can follow suit.

> That doesn't seem to be the case here.

Isn't it? iTerm2 has decent Unicode support, but half the other software I tried (including all editors we support) either renders them the same or renders one or the other as a replacement character. As for inputting them, if I google "telephone emoji" I get to https://emojipedia.org/black-telephone/ and if I copy that, I get the emoji variant, which is different from what you get by doing \:phone:<tab>. Worse, if I accidentally backspace on one of the emoji variant identifiers in Sublime, I get the non-emoji variant (which may or may not be rendered the same).
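Since rendering can't be trusted to tell the two apart, one way to check which variant ended up in a file is to print the code points. A small sketch, where `codepoint_labels` is a hypothetical helper name:

```julia
# Hypothetical helper: list the code points of a string in U+XXXX notation.
codepoint_labels(s::AbstractString) =
    ["U+" * uppercase(string(UInt32(c), base = 16, pad = 4)) for c in s]

println(codepoint_labels("\u260e"))        # bare character
println(codepoint_labels("\u260e\ufe0f"))  # same character plus the emoji selector
```

Pasting a copied identifier into the string literal in place of the escapes shows whether a trailing U+FE0F (or U+FE0E) came along with it.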

The most conservative option would be to reject modified emoji altogether.

True, but for a number of emoji (☎️ being an example), the rendering that people identify with the emoji is the one that has the variant selector.

These are hardly the confusable characters that I would worry about most (as opposed to, say, vs. A), given that the emoji tab completion was added as an April Fool's joke (#10709) and emoji variables are mostly a party trick in Julia rather than a practical programming style. Let the Unicode Consortium Emoji Emporium worry about this.

xkcd comic

> Emoji Emporium

😂 too true
