Gitea: Github migration fails with error 500 when issue comment is too long

Created on 18 Aug 2019  路  10Comments  路  Source: go-gitea/gitea

  • Gitea version (or commit ref): g2d0b90c96
  • Git version: 2.20.1
  • Operating system: Debian 10
  • Database (use [x]):

    • [ ] PostgreSQL

    • [X] MySQL

    • [ ] MSSQL

    • [ ] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [ ] No

    • [ ] Not relevant

  • Log gist:

Description

Trying to migrate https://github.com/Freeyourgadget/Gadgetbridge to codeberg-test.org running gitea master yields error 500 with the following in log:

2019/08/18 11:50:58 routers/repo/repo.go:320:MigratePost() [E] MigratePost: Error 1406: Data too long for column 'content' at row 1

kinbug revieweconfirmed

All 10 comments

Comment content have ~64KB size. Isn't it enough for almost all the situation?

@lunny
Yes, 64KB ought to be enough for everyone.
Anyway, I got that error above and that should not happen.
I will investigate which post exactly triggers the problem.
In any case 500 should be prevented by cutting off the text at a valid utf-8 boundary.

I agree. The 64K limit sounds reasonable. But whatever the limit, we should check the available size before inserting. A post-migration list of warnings explaining what compromises had to be made during migration would be ideal, so the admin will be aware of the problems and can take corrective/compensatory actions.

Besides a warning in the log, We can maybe reserve a little space to insert a line like ----- cut here, comment length exceeding 64KB -----\n when the text is too long.

I'm pretty sure Github comment restriction is the same 65536 characters as a TEXT column, so it shouldn't be possible to fill it up when migrating.

Perhaps some other bug is happening in this case and this is missing the bigger picture.

I tried to migrate the example repo to see what comment might have been causing it but the migration was successful and I wasn't able to reproduce (on sqlite3).

Perhaps this could be relevant:

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

I believe blob types (e.g. TEXT) are stored in a separate location and don't add up to row size.

Also, character encoding could be set to an improper value and increase the size of the encoded values (e.g. UCS-2).

OK, I think found out why this does not fail hard on try.gitea.io but on my test instance.
sql mode is strict on my server, so exceeding column size fails hard, but is just truncated when sql mode is more relaxed (I assume this is the case on try.gitea.io)

Nevertheless, comment size is bigger on github, so we should somehow handle this.

I added a quick hack to slice the string in go, and it worked. But doing this properly on a Rune boundary is the way to I assume.

We need a PR to check comment size before insert it to database.

@lunny
This is what I did locally, but I think someone with more proficiency in go should do the PR which checks the Rune boundary. Just cutting bytes is not nice.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

internalfx picture internalfx  路  3Comments

lunny picture lunny  路  3Comments

kifirkin picture kifirkin  路  3Comments

adpande picture adpande  路  3Comments

mirhec picture mirhec  路  3Comments