Renovate: Replace emoji short codes with unicode representation

Created on 22 May 2019  ·  32Comments  ·  Source: renovatebot/renovate

Emoji shortcodes don't render in all instances, specifically in emails. Additionally, they're mostly to help humans in typing emoji, without having to find the exact Unicode character.

I would like to replace the emoji shortcodes, with the actual Unicode code point. so :vertical_traffic_light: becomes 🚦, :recycle: becomes ♻️.

Thoughts?

priority-3-normal

All 32 comments

Sounds ok to me if it renders everywhere

@JamieMagee I noticed on GitHub that emoji shortcodes look best in commit messages (because they're text) but unicode would be better for PR titles, because otherwise GitHub doesn't render them. I was considering an option like "Replace emoji with unicode in PR titles" to solve this, but would this be compatible with Azure DevOps too?

@rarkins I think it would be simplest to just use unicode everywhere. Git is designed with UTF-8 support everywhere. From https://git-scm.com/docs/git-commit

Commit log messages are typically encoded in UTF-8, but other extended ASCII encodings are also supported.

Additionally most terminals support unicode, so displaying emoji as a developer should be supported.

The main issues I had with using emoji shortcodes is that they weren't supported everywhere. The 2 cases that affected me were in emails, and in Microsoft Teams notifications.

:tada: This issue has been resolved in version 18.21.8 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

@rarkins would you be open for a PR that makes replacing emoji shortcodes with their unicode equivalent configurable? In older versions of bitbucket server (together with older version of MysQL) there is a problem with 4 byte Unicode Characters, which results in a 500 error when creating the PR.

@nvdbleek This seems like a reasonable request.

Out of interest, do you know which versions of Bitbucket Server/MySQL this affects? I can't find any detailed information for Bitbucket Server specifically, but I found this for Fisheye, which suggests MySQL 5.7.7, and this page suggesting that uft8mb4 was supported in MySQL 5.5

@JamieMagee uft8mb4 is supported in MySQL 5.5, but you get into trouble because of the max key length of 767 bytes. Bitbucket has keys of 255 characters and 1020 bytes (4x255) is greater than the maximum of 767... So you can't use uft8mb4, or at least I couldn't get it to work ...

@nvdbleek Thanks for the explanation. I see that the limit was raised to 3072 bytes in MySQL 5.7, which I assume is why Fisheye requires 5.7.7 or higher.

@JamieMagee should we revert in the meantime?

@rarkins Yeah, a revert is the quickest fix. I'm heading on vacation today, so I definitely won't have time to implement the configuration.

I’m on vacation too so I think I’ll go with a revert today and then we reimplement with some kind of mapping later.

@rarkins and @JamieMagee I would be happy to help and create a PR, but I will need some guidance on a config option name. Then I could make the change, create a test-case and create the PR.

I think it might be named something like “emojiType”, default to value “shortcode” and have a second option “utf8”.

Starting with a revert seems best for now though.

@rarkins are you going to re-open this issue? Then I can make the PR to rollback and attach it to this issue. And this evening I will start working on the "emojiType " config option (don't have a lot of time this evening, I might finish it this weekend)

sorry, I have to do more work for my day-job this weekend then I was anticipating... Will try to work on this next week, or otherwise the week after next week when I'm on vacation then I will have for sure more time available...

We probably don't need to make this configurable, at least until we're presented with a compelling reason. For now it should be OK to use unicode for all except Bitbucket Server. So the question is - what's the best way to do that?

A nice way to be to use Bitbucket Server's getPrBody to massage unicode chars to emojis, including possibly with this library: https://github.com/iamcal/js-emoji

However I think for that to work then we need to ourselves determine and extract the emoji characters, e.g. something like this:

let output = '';
prBody.split('').forEach(char => {
  if (isEmoji(char)) {
    output += emoji.replace_unified(char);
  } else {
    output += char;
  }
});

Bitbucket server is working fine with unicode here. so i would prefer a config switch.

BBS: 6.4.0
SQL: MS SQL Server 2014 SP3 CU2 (SQL_Latin1_General_CP437_CI_AI)

as the bitbucket-server docu says, there is no unicode support with mysql for all versions:

image

https://confluence.atlassian.com/bitbucketserver/connecting-bitbucket-server-to-mysql-776640382.html

@nvdbleek any chance to switch the database to postgres or msssql or anything else with unicode support? This is a really bad blocker because we need a unicode replacement for the \| char in markdown table

@ViceIce I don't think that's an issue. utf8 data type supports up to 3 bytes per character, and | is a single byte character in utf8 (0x7C). The problem is all emoji, like 🚦 are 4 bytes (In this case 0xF0 0x9F 0x9A 0xA6) and that requires use of utf8mb4, which is apparently not supported by Bitbucket Server.

@JamieMagee They are supported by Bitbucket Server, but not with MySQL.

I'm using emojis without problems on MS SQL Server.
I think the problem is, that Bitbucket supports the ancient MySQL 5.5 which has the index limit.
Also they only support the Connector/J 5.1, which is very old too, but supports utf8mb4

So it should be possible to use emojis, if you use Connector/J 5.1.47 with MySQL 5.7 / MariaDB 10.2 or later. (This versions comes with InnoDB 5.7 with expaned index length)

I know that the pipe is a single byte char. I've found another bug with markdown tables.

| head | test | e |
| --- | --- | --- |
| unescaped | `^7.0.0 | ^8.0.0` | test it |
| escaped | `^7.0.0 \| ^8.0.0` | test it |

| head | test | e |
| --- | --- | --- |
| unescaped | ^7.0.0 | ^8.0.0 | test it |
| escaped | ^7.0.0 \| ^8.0.0 | test it |

So we have to escape the pipe.

As simple solution @rarkins and i thougt that we can simply replace the pipe with an unicode char, which looks like a pipe. But all alternatives require 4 bytes.

I will build some docker images to test emojis with newer mysql/mariadb support.

/edit: added issue link

Our idea is:

  • Massage the markdown to make sure we don't break table columns in each platform
  • For platforms that display the escaped pipe (incorrectly), replace the \| with a unicode pipe lookalike

@ViceIce I tried changing the collation of an old MySQL DB to mb4 UTF-8 and I can confirm that there is a problem with the max key length of 767 bytes. Bitbucket has keys of 255 characters and 1020 bytes (4x255) is greater than the maximum of 767. This limit is indeed bumped in MySQL 5.7. Atlassian even documents how you can change the collation to a 4-byte UTF-8 Unicode encoding on https://confluence.atlassian.com/kb/how-to-fix-the-collation-and-character-set-of-a-mysql-database-744326173.html

An upgrade to MySQL 5.7 for my database is on our roadmap. But I suspect that I'm not the only person that is running Bitbucket on an older MySQL. So the question is does Renovate wants to support those installations? If the answer is yes, we have to decide how we want to handle this. I like the proposal of @rarkins to massage Unicode chars to emoji Bitbucket Server's getPrBody. I don't see any substantial downsides of not using Unicode chars for Prs in Bitbucket (it is not that much extra code and it is also only a couple of bytes extra in the DB).

If we can agree on an approach I'm volunteering to create a GitHub PR.

Given that Bitbucket Server (now) recommends against MySQL, and MySQL 5.7 was apparently released in 2013, I definitely have some doubts about how widespread this combination is amongst current or future Renovate users.

The main downsides for supporting unicode massaging are:

  • Having to maintain an extra config option for this (assuming we can't autodetect the DB in use)
  • Always having to think twice before using unicode anywhere else, e.g. the pipe substitution issue, which is not related to emoji

The escape bug in bitbucket server will be fixed in near future: https://github.com/atlassian/commonmark-java/issues/149
hopefully not so long future 😄

Using initial db with utf8mb4_bin not supported, trying to convert later from utf8_bin
image

I've tried to create a sample with mysql 5.7: https://github.com/ViceIce/renovate-docker

I can't create a repository 😱 used utf8_bin as db collation

image

@ViceIce and what if you change it to CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ? The 4 byte encoding version?

@ViceIce sorry, you first tried utf8mb4_bin, I missed that comment...

@nvdbleek install not possible, after install i can't convert column collation ecause of a key contraint

the error could be from docker for windows local path mapping, i will try a docker-volume later.

:tada: This issue has been resolved in version 19.51.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

Was this page helpful?
0 / 5 - 0 ratings

Related issues

OmgImAlexis picture OmgImAlexis  ·  4Comments

OmgImAlexis picture OmgImAlexis  ·  4Comments

ZauberNerd picture ZauberNerd  ·  4Comments

zephraph picture zephraph  ·  3Comments

rarkins picture rarkins  ·  3Comments