Sylius: Make codes utf8mb4-great again

Created on 20 Dec 2016  路  9Comments  路  Source: Sylius/Sylius

Hi,

There is currently a limitation / nasty bug because we are forced to use utf8 as the table charset to accomodate for the code's unique indexes that are stored in a length of 255 chars.

The utf8 charset means that we cannot store in the DB a lot of characters such as emojis, or special characters outside the BMP.

The problem is that a large part of the unicode characters would then break the functionality with a nasty and silent error truncating stored text.

Switching to the recommended utf8mb4 character and relevant collation is not possible since the vast majority of code properties are mapped to VARCHAR 255 UNIQUE that is too big for the max InnoDB index length which is 767 bytes > 4*255 = 1020 bytes

  [PDOException]
  SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes

The solution would be to lower the field size to get into that limit. It would be something like 191 instead of 255.

See https://mathiasbynens.be/notes/mysql-utf8mb4

Potential Bug Stale

Most helpful comment

Hi @pamil

As @sweoggy pointed out, Sylius doesn't require MySQL as I understand, so the only improvement we could easily do without touching the code is add a few lines in the docs stating that if one uses the MySQL DBMS, one should make sure to use the barracuda file format and the utf8mb4 table charset to avoid any _potential_ text truncating issues with out-of-range characters (namely four bytes chars, like 馃挬).

If you want to support utf8mb4 directly, I guess the only things to do are :

  • update config.yml with utf8mb4 as the Doctrine charset
  • state in the docs that if MySQL is used, it must be > 5.5.3 _(released in early 2010)_ to support utf8mb4
  • change all VARCHAR(255) to VARCHAR(191)

All 9 comments

Instead of lowering the key length you can just configure your database properly. Specifically use row format dynamic and the following InnoDB settings (my.cnf or my.ini):

innodb_file_format=barracuda
innodb_file_per_table=1
innodb_large_prefix=1

I use the following Doctrine settings (config.yml):

doctrine:
    dbal:
        # host, port etc.
        charset: utf8mb4
        default_table_options:
            charset: utf8mb4
            collate: utf8mb4_unicode_ci
            row_format: dynamic

Yes @sweoggy it's possible to change the InnoDB setting directly but then Sylius should then require the database to use barracuda as the file format explicitly.

Moreover, this is a configuration that is not always available for the administrator to change. I'm thinking about shared hosting for instance, or PaaS solutions : Heroku, Platform.sh, Sensio Cloud, etc etc .. so I guess this could really be a problem.

@tchapi, I am just not sure changing a key length is a good idea just because one DBMS under a certain configuration has problems with it. Sylius doesn't have to require the barracuda file format, because it doesn't 1. require MySQL/MariaDB 2. require you to use these DBMS with utf8mb4.

On the other hand, I can't see a 191 key length limit being too short for most codes.

I'm sharing the same concerns as @sweoggy, we should define what is the default stack for Sylius and configure the default distribution specifically for it (and supporting other stacks if possible).

@tchapi, can you provide a quick POC to see what changes are required for utf8mb4 support?

Hi @pamil

As @sweoggy pointed out, Sylius doesn't require MySQL as I understand, so the only improvement we could easily do without touching the code is add a few lines in the docs stating that if one uses the MySQL DBMS, one should make sure to use the barracuda file format and the utf8mb4 table charset to avoid any _potential_ text truncating issues with out-of-range characters (namely four bytes chars, like 馃挬).

If you want to support utf8mb4 directly, I guess the only things to do are :

  • update config.yml with utf8mb4 as the Doctrine charset
  • state in the docs that if MySQL is used, it must be > 5.5.3 _(released in early 2010)_ to support utf8mb4
  • change all VARCHAR(255) to VARCHAR(191)

Thanks @tchapi! @TheMadeleine what will be the best place to mention it in the docs?

Cookbook?

still no solution?

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

inssein picture inssein  路  3Comments

mezoni picture mezoni  路  3Comments

mikemix picture mikemix  路  3Comments

stefandoorn picture stefandoorn  路  3Comments

xleliberty picture xleliberty  路  3Comments