Two threads on this topic appeared recently:
https://cybre.space/@LogicalDash/100656564494037379
https://chitter.xyz/@quine/100652452345102792
This deals with status content like:
:hacker_h: :hacker_i:  :hacker_y: :hacker_o: :hacker_u:
used to spell out "hi you" in emojo.
Note that there is a space between each letter of a word, and two spaces between words.
This feature request is to add <span aria-label="hacker: hi you"> ... </span> around it. I believe this would be a modification to the Formatter#encode_custom_emojis method, though on a closer look that might be too late a stage, because the spaces between the emoji are significant.
The algorithm is to find shortcodes matching /(.+)_([a-z])/, separated by one or two spaces, where \1 (the first capture) is the same for every shortcode in the run. \2 (the second capture) spells out the text. In the resulting label, one space is replaced with zero spaces and two spaces are replaced with one.
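As a rough illustration of the algorithm above, here is a minimal Python sketch (not Mastodon's actual Ruby code). The function name `label_runs`, the stricter prefix pattern `[^:\s]+` in place of `.+`, and the choice to only label runs of two or more emoji are all assumptions made for the example.

```python
import re

# First shortcode of a run, e.g. ":hacker_h:" -> prefix "hacker", letter "h".
FIRST = re.compile(r":([^:\s]+)_([a-z]):")
# A following shortcode, preceded by exactly one or two plain spaces.
NEXT = re.compile(r"( {1,2}):([^:\s]+)_([a-z]):")


def label_runs(text):
    """Return an aria-label string for each run of same-prefix letter emoji."""
    labels = []
    pos = 0
    while True:
        first = FIRST.search(text, pos)
        if first is None:
            break
        prefix = first.group(1)
        spelled = first.group(2)
        end = first.end()
        count = 1
        while True:
            nxt = NEXT.match(text, end)
            if nxt is None or nxt.group(2) != prefix:
                break
            # One space is dropped; two spaces become one word break.
            if len(nxt.group(1)) == 2:
                spelled += " "
            spelled += nxt.group(3)
            end = nxt.end()
            count += 1
        if count >= 2:  # assumption: a lone emoji is not worth labelling
            labels.append(f"{prefix}: {spelled}")
        pos = end
    return labels


status = ":hacker_h: :hacker_i:  :hacker_y: :hacker_o: :hacker_u:"
print(label_runs(status))  # ['hacker: hi you']
```

This only handles plain ASCII spaces and a single shared prefix per run, so it would not cope with other separators or with posts that mix prefixes.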
Oh, let me state the problems this is trying to solve:
it's important to note that many other characters (such as zero width spaces) can be, and often are, used to separate emoji alphabets (on sleeping town, zero width spaces are inserted after custom emoji by default)
so this should take into account any characters used as space separators.
i think a more ideal solution would be making it possible for an admin to mark emojis as part of an alphabet or something, but maybe the regex hack would be good enough as long as people always follow the format of something_a, something_b, etc
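One possible way to account for other separators is to normalize them before applying the one-space/two-space rule. The sketch below is only an assumption about how that could look: the helper name `normalize_separators` and the particular set of zero-width characters are made up for the example, and nothing here reflects how any instance actually inserts separators.

```python
# Characters treated as invisible separators (an assumed, non-exhaustive set):
# zero width space, zero width non-joiner, zero width joiner,
# word joiner, and zero width no-break space.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"


def normalize_separators(text):
    """Drop zero-width characters and map other whitespace to a plain space."""
    out = []
    for ch in text:
        if ch in ZERO_WIDTH:
            continue  # invisible, so it should not count as a gap
        if ch.isspace() and ch != "\n":
            out.append(" ")  # e.g. no-break space, ideographic space, tab
        else:
            out.append(ch)
    return "".join(out)
```

If emoji are separated only by zero-width characters, normalization leaves no gap at all between shortcodes, so a matcher built on this would also need to accept an empty gap as a within-word separator.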
I had been worried about an outlier, and I've found one: https://bofa.lol/@succubus/100665710929415112
It alternates between :rh_ and :hacker_ emoji.
Definitely agree about being able to mark emojis as part of an alphabet, I think that will be the only foolproof way to detect this.
I think we should be able to define custom alt-text/title/summary for emoji (in the admin interface at least), but it indeed doesn't solve this problem completely. At least having the word spelled out would be slightly better than having the whole “hacker_…” etc read out for each letter. This could be done by adding a summary attribute to the Emoji type, I think.
This tool considers a set of emojis to be a font if 26 or more emoji exist with the same prefix and a lowercase or uppercase ASCII letter (or both). There are a few special cases for symbol characters, but those could probably get away with being undescribed since they're very rarely included in these "fonts". In the aria-label, it might be best to just ignore unknown emoji that share a prefix with emoji known to be a font.
https://benlubar.github.io/mstdn/fonts.html
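The heuristic described for that tool can be approximated as follows. This is a guess at the logic from the description alone, not the tool's actual code: the function name `detect_fonts`, the underscore between prefix and letter, and the literal "count 26 or more letter emoji" reading are assumptions.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<prefix>_<single ASCII letter>".
LETTER_EMOJI = re.compile(r"(.+)_([A-Za-z])")


def detect_fonts(shortcodes, threshold=26):
    """Return the prefixes that have at least `threshold` single-letter emoji."""
    letters_by_prefix = defaultdict(set)
    for code in shortcodes:
        match = LETTER_EMOJI.fullmatch(code)
        if match:
            letters_by_prefix[match.group(1)].add(match.group(2))
    return {
        prefix
        for prefix, letters in letters_by_prefix.items()
        if len(letters) >= threshold
    }
```

Under this reading, a prefix with only half the alphabet in both cases would also reach 26, so a stricter variant might lowercase the letter before counting.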
I would advise against changing the alt attribute to anything other than the emoji code between colons because that would make copying a post with emoji in it copy the alt text instead.