Efcore.pg: Implement collation support following upstream EF Core changes

Created on 15 May 2018  路  13Comments  路  Source: npgsql/efcore.pg

https://www.postgresql.org/docs/current/static/collation.html

Relevant upstream changes:

PostgreSQL-specific considerations:

  • Collations can be created and dropped like other database objects - implement support for that too. #1327
  • We already had index collation support. Here we just need to align the API (naming) and use the upstream annotation. ea92b1e517237334e4cd6a2188cf001ee273db57
  • Database-level collation specification is currently very limited: no non-determinstic collations, hard to use ICU, and no support for altering. As a result, we should have an Npgsql-specific feature to define a model-global collation which will be propagated by the provider instead of by the database. In other words, we'd explicitly configure the collation for all columns we create. See #1365.

Some prototyping specifically in Npgsql was done in #1329, but based on that work, collation support will go upstream. ~Waiting for that to be merged and to sync before completing PG-side work.~

enhancement

Most helpful comment

@mhosman yes, I still need to release preview5 of the PG driver, and it will support collations. I've also done some PG-specific work for collations in the driver and will publish some documentation as well. It should all be out in a few days.

All 13 comments

This would be a great feature to have - we use odata on top of efcore - and in the mysql/mssql world we are migrating from, you could count on things like startswith, endswith, contains, etc. being case insensitive. Now that postgresql supports ICU collations natively - having a global ForNpgsqlDefaultCollation("name") and a corresponding per property fluent option would be useful at least to me. If this sounds like your desired approach, i might have some cycles to look into adding it and doing a PR.

@MolallaComm I originally opened this more for culture-sensitive collation, not necessarily for case-sensitivity. If you're looking for a way to do case-insensitive comparisons, you may want to look at citext, which is the more native way to do case-insensitive string comparisons in PG. In fact I've just merged a pack of fixes to make citext work well with StartsWith, EndsWith and the others - this will come out in 3.1 (a couple of days away).

Thanks - citext seems to work well with the little testing I've done after regenerating the DB - just added something like this to my OnModelCreating.

if (Database.ProviderName == "Npgsql.EntityFrameworkCore.PostgreSQL")
{
  foreach (var entity in modelBuilder.Model.GetEntityTypes())
    foreach (var prop in entity.GetDeclaredProperties().Where(p=>p.ClrType == typeof(string)))                      
      prop.AddAnnotation("Relational:ColumnType", "citext");

  modelBuilder.HasPostgresExtension("citext");
}

@MolallaComm good to hear that citext works for you. As I wrote above, 3.1.0 (which I'll probably release tomorrow) should contain fixes for some string matching functions.

Note that you shouldn't need to manipulate the annotations directly in your code above - just use HasColumnType as described in the EF docs.

Note that the Postgres docs states that citext isn't always the best solution:

Consider using nondeterministic collations (see Section 23.2.2.4) instead of this module. They can be used for case-insensitive comparisons, accent-insensitive comparisons, and other combinations, and they handle more Unicode special cases correctly.

@sandord yeah, the recently-added non-deterministic collations do seem to make citext obsolete. I'll try to get around to adding better general support for collation to the EF Core provider.

With #1365, I've done everything I wanted to do for collations in 5.0 - we have the EF Core upstream support, and in addition to index collation support and default column collations (to compensate for poor PostgreSQL support in CREATE DATABASE).

Leaving this open only for a conceptual doc page on collations to explain the PostgreSQL-specific stuff.

This is already in some of the two 5.0 previews to test it?

@mhosman part of it was released in 5.0.0-preview4 (everything but EF.Functions.Collation), with the full support coming in 5.0.0-preview5. See this doc page for the general EF Core parts - there are some PG-specific parts as well but that's not yet documented (will do for preview5).

Great @roji thank you very much for your response! In addition to case unsensitive, this feature will also allow to make "unaccent" queries without having to add "unaccent" in every query?

Yes - that's exactly what the new collation support is supposed to do. For PostgreSQL, read up about non-deterministic ICU collations to understand how that would work.

Hey @roji ! .NET 5 Preview 5 has been released. The preview 5 of this Postgre package will support this? It could be great to have something like UseNpgsql with case sensitive/insensitive and accent sensitive/insensitive.

@mhosman yes, I still need to release preview5 of the PG driver, and it will support collations. I've also done some PG-specific work for collations in the driver and will publish some documentation as well. It should all be out in a few days.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bugproof picture bugproof  路  4Comments

rakeshkotha picture rakeshkotha  路  3Comments

win32nipuh picture win32nipuh  路  4Comments

roji picture roji  路  4Comments

pgrm picture pgrm  路  4Comments