Julia: Unicode reference version

Created on 27 Nov 2019  ·  8 comments  ·  Source: JuliaLang/julia

This might be a silly question :) but ... I noticed that the 1.3 release notes mention Unicode version 12.1.0:

Support for Unicode 12.1.0

here

but the only Unicode reference I can find in this repo is version 9.0.0:

$(SRCCACHE)/UnicodeData.txt:
    @mkdir -p "$(SRCCACHE)"
    $(JLDOWNLOAD) "$@" http://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt

from here

Are there two numbering systems? I'm happy to be educated in these arcane Unicode matters, such as what determines which Unicode symbols are in and which aren't...

All 8 comments

Unicode is handled by this dependency: https://github.com/JuliaStrings/utf8proc, but maybe not by it alone, in which case we might only partially support anything beyond 9.0 (and thus, really, only 9.0)? Should the version number in that file simply be updated? There is a http://www.unicode.org/Public/12.1.0/ucd/UnicodeData.txt file; it is about 2% longer, with a change to LATIN SMALL LETTER S WITH HOOK and many additions, such as HEBREW YOD TRIANGLE.

See at: https://github.com/JuliaLang/julia/blob/master/deps/utf8proc.mk

PCRE also handles Unicode, and maybe that's the only other dependency (there are also packages such as ICU.jl, and possibly others, e.g. Scott's).

Possibly that file is only for "tab completion of LaTeX-like abbreviations in the Julia REPL", see here (I didn't check carefully):
https://github.com/JuliaLang/julia/blob/06fed56ea3ac3bd73ca3448f002b0c521eeb1765/doc/src/manual/unicode-input.md

Hi @PallHaraldsson - thanks for the explanation and links...!

So, if I understand correctly, the REPL completions files (emoji_symbols.jl and latex_symbols.jl) determine which Unicode symbols are looked for in the v9 UnicodeData.txt file, so I suppose that that must currently restrict the Julia 1.3 REPL's Emoji support to a few versions behind the current version? So there are quite a few 'current' emojis missing from Julia 1.3 because they were introduced after v9, such as:

🦝🦙🦛🦘🦡🦢🦚🦜🦟🦠🥭🥬🥯🧂🥮🦞🧁🧭🧱🛹🧳🧨🧧🥎🥏🥍🧿🧩🧸♟🧵🧶🥽🥼🥾🥿🧮🧾🧰🧲🧪🧫🧬🧴🧷🧹🧺🧻🧼🧽🧯♾🏴‍☠️🧘🏾‍♀️-🧘🏿‍♀️🦓🦒🦔🦕🦖🦗🥥🥦🥨🥩🥪🥣🥫🥟🥠🥡🥧🥤🥢🛸🛷🥌🧣🧤🧥🧦🧢🏴󠁧󠁢󠁥󠁮󠁧󠁿🏴󠁧󠁢󠁳󠁣󠁴󠁿🏴󠁧󠁢󠁷󠁬󠁳󠁿🦖 ...

not to mention all the diversity-oriented emojis recently released with v12.0.0 ... (😱)

The v12 file is 2000 lines longer than v9 (many of the additions are new or archaic languages).

The emoji_symbols.jl file appears to use a data file from https://github.com/iamcal/emoji-data; the current version of that (from earlier this year) supports Unicode v11 so that file would also need to be updated, along with your PR to v12, before the REPL completions can be updated to the current standard.

I don't know whether it's the official Julia policy to continuously support all the emojis in the current standard in the REPL, or whether there's any selection process... 🏚🚴‍♀️🖌

It's fun to name your Julia variables 🦖 or 🤔 though... :)

I happened to be using emoji_symbols.jl when building a fun Slack app. It would be nice to update to the latest version from https://github.com/iamcal/emoji-data as I was not able to parse 🦃😆

There's a package that supports the additional emoji symbols in the REPL:
https://github.com/wookay/EmojiSymbols.jl

@wookay Nice package. Can you use any of your code there to update Julia 1.3 to the latest version?

@cormullion Well, that package uses the same code as the emoji_symbols.jl file. You could take its generator.jl.

I can confirm the package does work, adding emojis to the REPL, but I wouldn't say that lacking them (or the latest Unicode) in the REPL means lacking Unicode 12.1 support (C has no REPL by default, and Perl, with "good" UTF-8 support, has a bad REPL). You can still copy and paste these characters in (or use the package).

I would be most worried about runtime support, e.g. lowercase and uppercase (and I don't think they apply to emojis).
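To make the runtime concern concrete, here is a minimal sketch of what case mapping looks like from Julia. The characters chosen are only illustrations, not taken from any Julia test suite; the point is that case mapping applies to letters, while an emoji is not a letter and is returned unchanged.

```julia
# Case mapping is a runtime (text-processing) feature,
# independent of which tab completions the REPL offers.
greek = 'ω'   # GREEK SMALL LETTER OMEGA
dino  = '🦖'  # an emoji: not a letter, so it has no upper or lower case

println(isletter(greek), " ", uppercase(greek))
println(isletter(dino), " ", uppercase(dino) == dino)
```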

The UnicodeData.txt file in the Makefile is only there to look up the names of the characters produced by LaTeX-like tab-completions in the REPL in order to generate this section of the documentation. All of the current tab-completion characters are present in Unicode 9, so no one bothered to update this data file to a newer version.
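As a rough illustration of that kind of name lookup, the sketch below parses two records in the semicolon-separated UnicodeData.txt format and maps each code point to its name. The helper `unicode_names` and the inline `sample` string are hypothetical and only stand in for the downloaded file; this is not the code Julia's documentation build actually uses.

```julia
# Two sample records in the UnicodeData.txt format (fields separated by ';').
# Field 1 is the code point in hex, field 2 is the character name.
sample = """
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
03B1;GREEK SMALL LETTER ALPHA;Ll;0;L;;;;;N;;;0391;;0391
"""

# Hypothetical helper: build a Char => name table from UnicodeData.txt text.
function unicode_names(data::String)
    table = Dict{Char,String}()
    for line in eachline(IOBuffer(data))
        isempty(line) && continue
        fields = split(line, ';')
        codepoint = parse(UInt32, fields[1]; base = 16)
        table[Char(codepoint)] = String(fields[2])
    end
    return table
end

charnames = unicode_names(sample)
println(charnames['α'])
```

A character missing from the table would simply have no name in the generated documentation; that says nothing about whether Julia can process it.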

This has nothing to do with the version of Unicode supported by Julia (e.g. for parsing or text processing), which is determined by utf8proc.
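One way to check this yourself, sketched under the assumption of a Julia build with Unicode 10 or later tables (such as the 1.3 release discussed here): ask whether a post-Unicode-9 code point is assigned and whether the parser accepts it as an identifier. Note that `Base.isidentifier` is an unexported function, so it may change between versions.

```julia
using Unicode

c = '\U1F996'  # 🦖, one of the emojis the thread lists as introduced after Unicode 9

# Is the code point assigned in the Unicode tables this Julia build ships with?
println(Unicode.isassigned(c))

# Is it accepted as an identifier? (Base.isidentifier is unexported.)
println(Base.isidentifier(string(c)))

# Parse an assignment that uses it as a variable name.
ex = Meta.parse("🦖 = 42")
println(ex)
```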

The emoji tab completions were added on April Fool's Day in #10709, and I don't know whether they have been updated since. (Note that the :foo: tab completions for emoji come from GitHub shortcuts, as I understand it, not from the Unicode standard.) It wouldn't hurt to add more recent emoji shortcuts to Base, I guess, though it's hardly essential; just because we don't have a tab completion for a character doesn't mean it's not "supported".

(Most Unicode characters will never have tab completions in the REPL. They are still supported.)
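For anyone who wants to see what the REPL can actually complete, the tables can be inspected directly. This is a sketch that assumes the internal, unexported dictionaries `REPL.REPLCompletions.latex_symbols` and `REPL.REPLCompletions.emoji_symbols`; they are not public API, and their names or layout may differ between Julia versions. The helper `has_completion` is hypothetical.

```julia
using REPL

# Internal, unexported tables: shortcut string => character(s).
latex = REPL.REPLCompletions.latex_symbols
emoji = REPL.REPLCompletions.emoji_symbols

println(get(latex, "\\alpha", "no completion"))
println(get(emoji, "\\:smile:", "no completion"))

# Hypothetical helper: does any shortcut produce the string `s`?
has_completion(s) = any(==(s), values(latex)) || any(==(s), values(emoji))
println(has_completion("α"))
```

A `false` result from `has_completion` only means there is no shortcut for that character; it can still be pasted into the REPL and used.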

I would suggest closing this issue, as it's really not about Unicode support in Julia. If you want to open another issue to add more emoji tab completions, please go ahead.
