Mattermost-server: Consider modifying gorp to consistently create TEXT columns between MySQL and Postgres

Created on 20 Feb 2019  路  12Comments  路  Source: mattermost/mattermost-server

If you're interested please comment here and come join our "Contributors" community channel on our daily build server, where you can discuss questions with community members and the Mattermost core team. For technical advice or questions, please join our "Developers" community channel.

New contributors please see our Developer's Guide.


Notes: Jira ticket

In gorp, dialect_postgres.go

func (d PostgresDialect) ToSqlType(val reflect.Type, maxsize int, isAutoIncr bool) string {
    // ...
    if maxsize > 0 {
        return fmt.Sprintf("varchar(%d)", maxsize)
    } else {
        return "text"
    }

By contrast, dialect_mysql.go:

func (d MySQLDialect) ToSqlType(val reflect.Type, maxsize int, isAutoIncr bool) string {
    if maxsize < 1 {
        maxsize = 255
    }

    if maxsize < 256 {
        return fmt.Sprintf("varchar(%d)", maxsize)
    } else {
        return "text"
    }

Put another way, when we set a max size >= 256 on a column via gorp, Postgres enforces the limit using a varchar but MySQL does not and uses text. Fortunately text and varchar are very similar in Postgres in terms of being outside the table, but if we set a max size of 0, we end up with a VARCHAR with max length 256 compared to the MySQL column with a much longer limit.

Changes should be made to our fork of gorp here (and potentially propagated upstream to gorp if they're interested in taking them): https://github.com/mattermost/gorp

If you have any questions or want to discuss the ticket, feel free to ask @jesse.hallam or @jesus.espino in the ~Developers channel on community.mattermost.com

AreDB Hard Help Wanted PR Exists TecGo

All 12 comments

@jwilander Hi. When you say:

but if we set a max size of 0, we end up with a VARCHAR with max length 256 compared to the MySQL column with a much longer limit.

I'm a little unsure what the desired behaviour is. Do you want the driver to set MySQL to text when the maxsize is 0 (like how gorp sets Postgres)?

Yes, I think that's exactly what we'd want to do. The main goal is we want to be consistent across MySQL and Postgres as much as possible

cc @jespino @lieut-data for any thoughts

(You may know this, but just in case...) One issue to consider, is that (in MySQL) text acts like a BLOB (but a character string instead of binary). So there are some restrictions. Eg,

  • if you want to index on it, you have to specify an index prefix length
  • you cannot have a DEFAULT value
  • you have to set a max_sort_length when sorting (or be fine with the default of 1024)
  • (maybe most important) if a query results in a temporary table, the server is forced to use the table on disk rather than in memory
    from the docs

As you mentioned above, Postgres doesn't have this problem. text and varchar are represented the same in Postgres's C (varlena).

@lieut-data any thoughts on the above?

My apologies for the original write up of this ticket: it was definitely ambiguous as to the desired outcome. First, let me summarize the cases here for my own recollection:

  • for maxsize = 0, gorp creates wildly different constraints between Postgres (unbounded TEXT) and MySQL (VARCHAR(255)).
  • for maxsize > 0 && maxsize < 256, gorp correctly unifies Postgres and MySQL on a varchar(maxsize).
  • for maxsize >= 256, gorp deviates again between Postgres (VARCHAR(maxsize)) and MySQL (TEXT).

Rewriting this as a table:

| maxsize | MySQL | PostgreSQL |
| - | - | - |
| 0 | VARCHAR(255) | TEXT |
| < 256 | VARCHAR(maxsize) | VARCHAR(maxsize) |
| >= 256 | TEXT | VARCHAR(maxsize) |

The practical pain points are, of course, two-fold:

  • gorp fails to consistently enforce column length constraints for maxsize = 0 and maxsize >= 256
  • gorp fails to uniformly pick column types for maxsize = 0 and maxsize >= 256

My original thinking was to change gorp's semantics as follows:

| maxsize | MySQL | PostgreSQL |
| - | - | - |
| 0 | TEXT | TEXT |
| < 256 | VARCHAR(maxsize) | VARCHAR(maxsize) |
| >= 256 | VARCHAR(maxsize) | VARCHAR(maxsize) |

I think there's a general consensus among the team that we dislike delegating control of the schema like this to Gorp in the first place, but as a step towards wresting ownership back, we could at least make it consistent. This proposal needs more than just an implementation, though: we need to make sure these changes make sense for the columns they would affect.

On the issue of TEXT in MySQL: from the above, this would only impact columns without a maximum length. It /may/ be the case that we don't have such columns in the first place, but as you correctly note, this would be a significant change from their existing definition. It /might/ make sense to switch to VARCHAR(255) for Postgres in this case instead to bring about consistency.

Lastly, if we agree to change the default here, we'll be left with a schema mismatch between a fully upgraded instance and a newly installed instance: we will want to discuss writing a schema migration to unify this, but we'll have to think through the ramifications. For example, we can never hope to change a column definition on the Posts table: it's just too big for too many customers. Channels might be in a similar boat (though not nearly as big). Most other tables should be "reasonable". All of these decisions should backfeed into the original plan for what to actually change.

Phew. @cpoile, I'd love your eyes on the above -- let me know if the current situation matches your understanding, and if you think it makes sense to move on from here as proposed. I'm bumping the difficulty on this one, not so much in the coding sense, but in the "deep knowledge" required about what's going on sense. :)

I'd like to take this on.
The 255 varchar limit dates back to mysql < 5.03 (released in 2005). I think it's safe to assume that nobody will be using mattermost with that! :)

I want to follow @lieut-data suggestion but use "varchar(16384)" instead of "text" for maxsize = 0.
It's still arbitratry but it's safer than forcing the use of a mysql text/blob, which has consequences as @cpoile mentioned.

See https://github.com/mattermost/gorp/pull/5 for a more detailed breakdown, but the plan is to converge on consistency for maxsize == 0. The practical impact to the mattermost-server is small, since we don't actually want to have unbounded columns in our schema.

We'll leave maxsize > 255 alone, since it's not likely we want to migrate existing customer data for the affected tables. When we get rid of gorp, we'll just formalize this by properly "writing down" the schema for each driver type instead of relying on gorp magic.

Thanks, @brunoenten for helping is drive this issue to completion! I'll leave this open a bit longer to track the required migration changes.

@lieut-data I'm not sure what the next steps on this issue are. Can you help clarify this?

@lieut-data Gentle reminder about the question above

I've filed https://mattermost.atlassian.net/browse/MM-16888 to track the requisite migration changes. I think we can resolve this issue.

(Thanks again, @brunoenten!)

Was this page helpful?
0 / 5 - 0 ratings