Efcore: Specify collation on columns

Created on 11 Dec 2019  路  20Comments  路  Source: dotnet/efcore

Hi,

This has been raised before and closed, but I would like to emphasis how useful it is.

I want to be able to specify the collation on a column, perhaps through an annotation and have EF Core respect and generate the column as such in the migration.

I have temporarily worked around this by adding some SQL to ALTER the column manually in the migration, but this is a slightly tenuous solution as any subsequent ALTER on the column will reset it back to default.

I posit that setting a column as case-sensitive in the DB is a usual and valid use-case and ideally would like it to be supported in EFCore.

Thank you for considering it.

Ark

closed-fixed type-enhancement

All 20 comments

@arrkaye Can you point to where this was closed before?

~Duplicate of #19275?~

@roji This is 19275. :-)

Oops, I was referring to #6577 but on second look, that issue is about specifying collation (and other things) on the database at creation time, rather than as on the column here. So not at a dup..

@roji Agreed, but my conclusion from thinking about this last night, backed up by the discussions on the other issue today, is that we need to take a holistic approach to collations.

Ideally we should be able to run efficient queries when the collation and query are compatible, and throw good, provider-specific exception messages for other cases indicating how to get to efficient queries for other cases.

I'll pull some issues out of the milestone and we'll discuss with the team on Friday.

You already did!

EF.Collate!

EF.Collate!

Thinking about this in general, per-operation collation support (as opposed to column/database collation settings) is a bit scary... Assuming we add this to SqlBinaryExpression (in the extreme case we can make SqlExpression IAnnotatable :fearful:), preserving these across all transformations is not going to be easy, can see lots of bugs coming.

EF.Collate is quite similar to EF.StoreType.

Right, but we already have type mapping on SqlExpression, and a whole inference infrastructure for composing expressions with regards to type mapping... But if you think this is simple :smiley:

I am not saying it is simple but we would need to do it regardless ;)

Then possibly related: https://github.com/dotnet/efcore/issues/6717#issuecomment-390110534 :trollface:

Tracking this via #19866 and linked issues.

Reopening to track column collation specification.

Column specs with both collation and generated:

MySQL

Both orderings work:

CREATE TABLE BothWork
(
    name1 VARCHAR(32) COLLATE utf8mb4_german2_ci AS ('hello'),
    name2 VARCHAR(32) AS ('hello') COLLATE utf8mb4_german2_ci
);

Sqlite

Both orderings work:

CREATE TABLE works (
    id INT,
    name1 TEXT COLLATE NOCASE AS ('hello') STORED,
    name2 TEXT AS ('hello') STORED COLLATE NOCASE
);

PostgreSQL

Both orderings work:

CREATE TABLE works
(
    name1 TEXT COLLATE "en-US-x-icu" GENERATED ALWAYS AS ('hello') STORED,
    name2 TEXT GENERATED ALWAYS AS ('hello') STORED COLLATE "en-US-x-icu"
);

SQL Server

... wait for it... wait for it... Only one ordering works (likely because of SQL Server's peculiar syntax around generated columns).

CREATE TABLE works
(
    id INT,
    name AS 'hello' COLLATE German_PhoneBook_CI_AS
);

CREATE TABLE doesntwork
(
    id INT,
    name1 COLLATE German_PhoneBook_CI_AS AS 'hello'
);

@ajcvickers

Can you point to where this was closed before?

https://github.com/dotnet/efcore/issues/2779

We're also lacking this feature for quite some time now and almost ran into some serious security issues due to this. We have columns containing cryptographic hashes and IDs encoded in Base64. Due to the lack of support for column collations by EF we modified affected columns within CodeFirst migrations. At a later point in time we applied some other modification to the column and created another CodeFirst migration, this time without manual modification. As EF didn't know about the collation, it created migration code that didn't consider this aspect and we almost moved back to case insensitive collation, opening vulnerabilities for collision attacks.

I think, having base64-encoded fields or other columns where collation matters, is not so uncommon. Please add this feature!

@minichma thanks for the issue link.

This feature is being added for 5.0 - a PR is already out, so you should be able to try this in an upcoming preview.

Out of sheer curiosity, any specific reason to store BASE64 in a database text field rather than the binary data directly in a binary/varbinary? That would avoid any text-processing issues (such as collation, but there are others), and also be a more efficient.

@roji In some cases we see IDs that happen to be composed from base64 string. But regardless of the way they have been composed, they are just plain strings but in contrast to (e.g.) GUIDs, they require case sensitive treatment. The shape of the IDs often has historic reasons and is not something up to the DB designer.

In other cases we see key material where the standard way of storing it is base64 encoded, like it's the case with PEM-encoded keys.

But even when talking about plain byte arrays, storing them string-encoded often has benefits over storing as varbinary, mainly because on C# side string is easier to handle than byte[]. One aspect is, that comparison of byte[] behaves differently when part of an EF query than when executed locally. While a reference comparison locally, it's a value comparison when run against the DB. That's great on the one hand and the reason for this is obvious, but it also makes code less readable, because the reader always has to keep the semantic differences in mind and consider the context of the read code. When dealing with strings, this aspect is somewhat relaxed, but only if collation is treated appropriately for each field.

Thanks for the added info @minichma, makes sense!

Here's my solution:

public static PropertyBuilder<string> AsVarchar(this PropertyBuilder<string> propertyBuilder, int? maxLength = null, COLLATE? collate = null)
        {
            var size = maxLength?.ToString() ?? "max";
            var collateType = collate.HasValue ? " COLLATE " + collate.ToString() : "";
            var builder = propertyBuilder.HasColumnType($"varchar({size}){collateType}");

            return builder;
        }

The enum type:

public enum COLLATE
    {
        SQL_LATIN1_GENERAL_CP1_CI_AI
    }

Usage example:

builder.Property(x => x.Name).AsVarchar(200, COLLATE.SQL_LATIN1_GENERAL_CP1_CI_AI)

@marmagni in case you're not aware, EF Core 5.0 has 1st-class support for collations, so you don't have to resort to the above; see the docs.

Was this page helpful?
0 / 5 - 0 ratings