Efcore: Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison

Created on 4 Dec 2014  ·  86Comments  ·  Source: dotnet/efcore

Currently we support the == operator and the String.Equals(String) methods which both have the same behavior in-memory as String.Equals(String, StringComparison.Ordinal). We could also support the latter without making significant changes and possibility other values of StringComparsion such as OrdinalIgnoreCase by applying LOWER() on both operands.

area-query needs-design type-enhancement

Most helpful comment

@xperiandri Still does not work in 3.0
I was shocked that it doesn't! 😂😂

All 86 comments

Actually I just got a warning about String.Equals (without the third parameter) not being able to be translated and being evaluated locally ...

After upgrade to 1.1.0, I use String.Equals(String, StringComparison.Ordinal) got this error :

InvalidOperationException: variable '__name_0' of type 'System.String' referenced from scope '', but it is not defined

@Jeffiy can you please create a new issue and provide a repro? If this is a regression in 1.1 we may want to fix it in 1.1.1.

Could this be implemented using Collation on SQL Server? Using LOWER on indexed columns throws the index away for querying, as far as I know, which could hurt performance for people using this.

We wouldn't know this is happening since this would be translated by EF internally and would get us by surprise. Since I would have no warning about "local evaluation" on this case, I would expect that my ORM is doing the right thing and not creating a performance problem. Doing a LOWER should be an explicit decision, in my opinion.

@divega Please, it is a must have.

Still does not work in 2.0
I was shocked that it doesn't!

I used string.Equals(string, string)

Any update on this?

Migrating to EFCore is becoming a show stopper for us because this issue. Any update @divega @rowanmiller ?

@mgolois This issue is about String.Equals(String, StringComparison). could you please elaborate about what exactly the query looks like that you are trying to port to EF Core?

@mgolois to be more precise:

  1. EF6 completely ignores the StringComparison argument. Is this the behavior that you expect from EF Core? If yes, why?
  2. Also, are you aware that ==, string.Equals(String), and String.Equals(string, string) work? If yes, why is lack of support for String.Equals(String, StringComparison) blocking you?

Does equals usage with StringComparison at least produces a warning?

@xperiandri - It does evaluation on client to produce correct results.

@xperiandri and it produces a warning like this in the log:

warn: Microsoft.EntityFrameworkCore.Query[20500]
      => Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor
      The LINQ expression 'where [p].Name.Equals(__name_0, CurrentCulture)' could not be translated and will be evaluated locally.

Which can optionally be turned into an error as explained in the documentation.

Maybe you would add Roslyn analyzer that will display info about that in design time?
Because having this warning in runtime is not an excellent experience.

Which can optionally be turned into an error as explained in the documentation.

It is very good!

so what exactly happens right now if I use Equals with OrdinalIgnoreCase ?
does it perform an Equals on the db and then evaluate the ignore case locally or does it gets all records of the db and do the filtering locally ?

Is there any plan to do this in the near future? I wasn't aware of the fact that it's not supported, saw the warning when I already used this pattern in several queries. Do you recommend changing it to Where(x => x.SomeProperty.ToLower() == parameter.ToLower()) - is string.ToLower translated correctly?

@divega - This issue suggest that for OrdinalIgnoreCase we could apply Lower on both sides, but that would mean index cannot be used (as pointed out by @akamud ), but there was no definitive answer which direction we would be taking. Can you clarify?

@smitpatel let’s bring this to a design meeting. I would be ok with the overload with StringComparison started to throw in the default implementation in absence of client eval, but I would also be open to grandfathering it (via ignoring the StringComparison argument) if the general feeling is that it is preferable to mitigate the impact of the breaking change.

I think it is preferable not to do the ToLower trick behind the users’ backs. They can still do that themselves explicitly, and there is still a chance that some of them will miss the fact that it isn’t sargable, but at least we didn’t do it without them asking.

@smitpatel Besides, I believe many databases will default to case insensitive comparisons anyway, so what is harder to achieve is a case sensitive one.

@BalassaMarton if you control the database schema it is preferable to specify a column collation that will compare string according to your preference.

If you have a case insensitive collation and can’t change it but need a comparison that is case sensitive, you can do a regular (==) comparison in the database and then filter out false positives on the client with a case sensitive comparison.

I have tried other workarounds using functions that encapsulate comparisons with explicit collations, but SQL Server actually doesn’t support inline scalar functions, and non-inline functions aren’t sargable, so you end up needing to write a TVF. It is possible but complicated.

@divega I didn't even think about setting collation, but it could solve my short-term problems, thanks for the tip.

I don't think anyone pointed it out in this issue, but the main reason this is important in my use case is because string equality is [usually] case-insensitive when executed by the database, but when running in unit tests it will not be. Unless there's some configuration for EF Core that intercepts string equality comparisons and makes them case-insensitive then it's necessary for this to be supported in the query abstraction.

@smarts - This feature is to influence case sensitivity on server side. It is not for testing. There are always going to be differences between how database works & how linq works. If you want to unit test same query as database, then you need to write unit test which accounts for difference. Or test using a testing database rather than linq.

@smitpatel You're assuming the data in question is returned from the database, rather than used in the query to compute some other value, or simply used in code that is unavailable to the test.

Unless there's some configuration for EF Core that intercepts string equality comparisons and makes them case-insensitive

Why do you need to intercept anything and change case sensitivity. Let server handle that. The different arises only when you are computing same equality condition at different places and simple solution is account for it.
You really need to give more details why any such interception mechanism should be part of EF query abstraction?

@smitpatel I don't think you understand, so here's a concrete example:

from f in Foos
select new
{
  Status = String.Equals(f.Bar, "bad", StringComparison.OrdinalIgnoreCase) ?
    Status.NotGood :
    ((String.Equals(f.Bar, "ok", StringComparison.OrdinalIgnoreCase) ?
      Status.Ok :
      Status.Good) :
    Status.Unknown)
};

Status is an enum
In case it is not clear, I want the equality check to happen server side. Some data in the database for the Bar property/column is UPPERCASE & some is LOWERCASE, so data in the test also should be setup as such. As written it will execute client side, which is undesirable. Changing to use the == operator will work server side, but will cause the test to fail because of the difference in behavior of == for in-memory execution. I understand that for you the main purpose of the feature is to control server side behavior, but it also serves to provide parity between server side (IQueryable<T>) and client side (IEnumerable<T>) execution, which is also very important and one of the things that made EF so good.

I am using following for years now (since EF4 or so) and now parting to EF Core

query.Where(n => filter.Name.Equals(n.Name, StringComparison.InvariantCultureIgnoreCase))
````
Of course it does not work anymore in EF Core.

What i can remember is that the query got resolved depends on the used db-provider. 
if the server is case-insensitive, it resolved server-side
if the server is case-sensitive, it resolved client-side

A .net case-sensitive String.Equals in the LINQ query was a deterministic command. 

query.Where(n => filter.Name.Equals(n.Name))
````
This query was server-side if the server was case-sensitive
and ALSO client-side (extra filter) if the server was case-insensitve.

I don't know when and why this behavior got removed for EF.
But it should be deterministic, the same linq-filter should
produce the same result executed on all db-providers
or throw an exception.

I don't think it worked as you think. If that were the case, depending on the dbprovider you will retrieve the whole table causing potentially bad performance on huge tables. Not something I personally want to deal with. I believe on 3.0, this client side stuff will throw and I'm happy with it.

@Zetanova I don't remember ever having that behavior in any version of EF. As @Bartmax, client evaluation is not going to be the way we handle this going forward.

@ajcvickers My point was not "client and/or server side"
I wanted to say that it should be deterministic.

Here some old post from 2009
http://justgeeks.blogspot.com/2009/05/making-linq-to-entities-do-case.html

Because i know that i am using CI comparison server side,
i just switched to the single paramter String.Equals method.

Hi, could someone please provide an update? I can't tell whether the proposed functionality is being considered or not…

@smarts This issue is in the Backlog milestone. This means that it is not going to happen for the 3.0 release. We will re-assess the backlog following the 3.0 release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

Ok. I guess as long as the Z.EntityFramework.Classic NuGet package exists there's a viable alternative anyway 🤷‍♂

@xperiandri Still does not work in 3.0
I was shocked that it doesn't! 😂😂

I don't understand the "failure mode" of this in 3.0.
It used to be that it just evaluated this locally, and performance was the issue.
But NOW, for me at least, it does something that I find very detrimental.

With an entity called Currency with a string field called AlphaCode and a record where AlphaCode is "EUR".

If I do the following:

.SingleOrDefault(a =>
                    a.AlphaCode.Equals("EUR", StringComparison.InvariantCultureIgnoreCase)
                );

It just returns null (no error or anything like that)!

If I do:

.SingleOrDefault(a =>
                    a.AlphaCode.ToUpper() == "EUR".ToUpper()
                );

I get the expected entity

And it's doing, what I would consider, strange things on the Database:

With Equals

info: Microsoft.EntityFrameworkCore.Database.Command[20100]
      Executing DbCommand [Parameters=[], CommandType='Text', CommandTimeout='30']
      SELECT TOP(2) [c].[CurrencyId], [c].[AlphaCode], [c].[Code], [c].[Description], [c].[SignificantDecimals]
      FROM [Currencies] AS [c]
      WHERE CAST(0 AS bit) = CAST(1 AS bit)

That purposeful 0=1 seems peculiar; why make the round-trip to the database for that? And if it is known that this can never return anything useful why not emit an error?

Without Equals

info: Microsoft.EntityFrameworkCore.Database.Command[20100]
      Executing DbCommand [Parameters=[@__ToUpper_0='EUR' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SELECT TOP(2) [c].[CurrencyId], [c].[AlphaCode], [c].[Code], [c].[Description], [c].[SignificantDecimals]
      FROM [Currencies] AS [c]
      WHERE (UPPER([c].[AlphaCode]) = @__ToUpper_0) AND @__ToUpper_0 IS NOT NULL

I would really have liked some kind of error in the Equals case.
And I would really have like for the ToUpper to have been translated to UPPER on both sides of the = on the database.

@AllTaken the issue you're describing with Equals is a bug, filed separately as #18472. Right now the expected behavior would be to throw, since we don't translate that overload of string.Equals (this is what this issue is about). Use the == operator with ToUpper as needed to get the SQL you want.

Note that writing "EUR".ToUpper() isn't necessary since EUR is already an uppercase constant.

@roji OK. Great to see that a fix has already been merged.

Note that the call to TopUpper() on a const was merely for illustrative purposes, this is of course in reality a variable.

msedge_2019-11-24_01-54-27

I think there must be a configurable collation (fluent and/or attribute) that makes it sure the collation is whatever case you want to be.
Executing raw sql commands at migrations doesn't seem like good idea.
Ef can see changes in collation an make the need for a new migration and everybody's happy.

Something like:

.SetCollation(Collations.SQL_Latin1_General_CP1_CI_AS)

or

SetCollation(charset, codepage, case, accent)

I think there must be a configurable collation (fluent and/or attribute) that makes it sure the collation is whatever case you want to be.
Executing raw sql commands at migrations doesn't seem like good idea.
Ef can see changes in collation an make the need for a new migration and everybody's happy.
Something like:
.SetCollation(Collations.SQL_Latin1_General_CP1_CI_AS)

or
SetCollation(charset, codepage, case, accent)

I didn't know about Collation until I asked on stackoverflow.
Yes it would be great to see something like SetCollation to make it friendly for the ones like me that never used this until 3 weeks ago.

@wstaelens you really don't want this - your query won't be able to use any indexes that cover that column. Collations are nothing new or unknown either. They affect comparisons and hence, indexing. If you try to use a different collation, the server won't be able to use any existing indexes, scanning all the data instead.

Besides, if the database designer specified that A1 and a1 are different values, why does the client try to treat them as if they were the same? Either the designer got it wrong, or the client got it wrong. In a Name, there's probably no reason for case-sensitive matching. In a column holding Base64 text, casing matters

@pkanavos he said during a migration, he IS designing the database...

This needs to be implemented, I am using .Equals("test", StringComparison.InvariantCultureIgnoreCase) and it doesn't work anymore. It's putting a wrench porting my EF Framework app is.

Please prioritize this and get it put in ASAP.

@upsilondynamics putting a wrench migrating? This wasn't working in ef6 too.. it was compared locally... Put a ToArray() before the comparison and you get the same behaviour... At least this is what I remember... Not in front of a pc right now...

x => x.Foo.Equals("bar", StringComparison.OrdinalIgnoreCase)
or maybe x => StringComparer.OrdinalIgnoreCase.Equals(x.Foo, "bar")
was working in some version of entity framework prior to EF Core, because it was while converting to EF Core that this problem manifested for me

@upsilondynamics putting a wrench migrating? This wasn't working in ef6 too.. it was compared locally... Put a ToArray() before the comparison and you get the same behaviour... At least this is what I remember... Not in front of a pc right now...

If it was working in EF6 then why did my LINQ queries work before I went to EF CORE? anyway I got around the issue by simple doing a toLower() and == now seems to work without issue.

@dotnet/efcore - We should review if EF6 did any of these translation to server and discuss if we should do it. And then close the issue. Ignoring argument is just a slippery slope which will always be wrong in some cases.

Using Lower create index miss so let users do that explicitly.
Collations... let's not do this!

@smitpatel Agree we should review EF6. Agree we should probably not do automatic changes collations.

However, I've been considering that if the behavior matches the default behavior (or at least case sensitivity) of the database (or possibly even the collation configured in the model and used by Migrations, if such a thing is created) then we allow the translation to proceed.

We throw if you specify a kind of comparison that would require the query to do something different than is the default for the column.

This doesn't add any new functionality, but allows the C# code to express an intention that the database then must conform to.

Default behavior of database can be changed by collation. A database can be created with either sensitivity collation. So in order to do it accurately, it has to be configured in EF Model. If we add model/migration support for collations, then we can certainly pass through methods which does the same thing. It should be tracked by collation issue.

Reading up, it seems like the expectations of users is more for EF Core to add LOWER() or whatever is necessary to make the query work. I agree with @smitpatel that EF Core should not do this - we should not be implicitly doing things that prevent index usage (big pit of failure), and users can easily opt into this by explicitly adding ToLower themselves (i.e. we add very little value).

Adding support for the comparison type that happens to correspond to the database behavior would have the advantage of failing the query if that type would not be the right one (e.g. if a user accidentally says they want case-insensitive but the column is configured otherwise). But that presupposes that the user actually specify that enum value, with no other benefit than that verification and exception... I can see value there but it doesn't seem huge...

Agree with both of you. It may not add enough value, and it can be tracked as part of supporting collations in Migrations. I just don't want us to completely dismiss the idea of allowing people to write code like this in their queries.

Those arguments may be valid for the IgnoreCase comparisons but not for the rest. In MSSQL (and most other RDBMS except for postgres AFAIK), the default collation is case insensitive, you must collate if you want compare case sensitively. This operation can even use an index if you're clever, e.g. you compare both CI and then CS, it will use the index for the CI check which should be selective enough and then evaluate the CS check as predicate.

SELECT *
FROM Table
WHERE Col = @Value
AND Col = @Value COLLATE Latin1_General_CS_AS 

@Suchiman FYI PostgreSQL and Sqlite both do case-sensitive comparisons by default (but MSSQL and MySQL are indeed case-insensitive).

Interesting, I can see how what you say could make sense (using a CI index for a CS lookup, and then applying CS predicates on the results). However, I can't see this actually happening on MSSQL (but I'm a bit of an MSSQL newb). Here's the code I'm using:

-- Create the schema
CREATE TABLE data (id INT, name NVARCHAR(256));
CREATE INDEX IX_foo ON data(name);

-- Insert 100000 rows - having lots of data is important as it can affect planner decisions on index usage etc., at least for some databases
BEGIN TRANSACTION;

DECLARE @i INT = 0;
WHILE @i < 50000
    BEGIN
        INSERT INTO data (name) VALUES ('hello' + CAST(@i AS NVARCHAR(256)));
        INSERT INTO data (name) VALUES ('HeLLo' + CAST(@i AS NVARCHAR(256)));
        SET @i = @i + 1;
    END;

COMMIT;

SET SHOWPLAN_ALL ON;

SELECT * FROM data WHERE name = 'hello10000';  -- Does an Index Seek, TotalSubtreeCost=0.008294476
SELECT * FROM data WHERE name = 'hello10000' COLLATE Latin1_General_CS_AS; -- Does a Table Scan, TotalSubtreeCost=0.59920794

SET SHOWPLAN_ALL OFF;

Is there something else I need to do to make this work?

@roji yes, you need to change the second select to do both checks 😉

SELECT * FROM data WHERE name = 'hello10000' and name = 'hello10000' COLLATE Latin1_General_CS_AS;

MSSQL isn't smart enough to do that trick automatically.

@Suchiman ah yes, I missed that part in your original sample - thanks! I can see the index being utilized now. Interesting, at least in theory SQL doesn't guarantee left-to-right evaluation and short circuiting (e.g. link) but in this case that seems to be working well.

So there seems to be value in providing a translation for StringComparison.Ordinal on MSSQL specifically, producing the above SQL. As @smitpatel this would depend on collation support - #8813 seems to be the appropriate issue for tracking that (added a comment there).

It isn't reliant on left-to-right evaluation, e.g.

SELECT * FROM data WHERE name = 'hello10000' COLLATE Latin1_General_CS_AS and name = 'hello10000';

works just as well, but since the equality predicate is very selective according to table stats, it's an obvious choice for MSSQL to utilize that index. Since there's no index seek opportunity for the collated comparison, that needs to be evaluated as a predicate, in which case, there are less rows to evaluate after the CI index seek than when using an index scan or table scan.

Good point @Suchiman, that makes sense and is a good example of non-left-to-right evaluation in SQL (and how it leads to better perf).

FWIW PG 12 introduced "non-determinstic" collations, which allow case-insensitivity - so this may actually be relevant there. The traditional pre-PG12 way to do case-insensitivity in PG is CITEXT, which AFAIK doesn't allow case-sensitive comparisons (but more research may be required).

One last note: a potential difficulty is exactly which case-insensitive collation EF would generate here (i.e. why Latin1_General_CS_AS rather than something else). That might be a good reason to leave it to the user to construct this themselves and specify the collation they want.

Indeed, in my case, it works because Latin1_General_CI_AS is the default collation and Latin1_General_CS_AS the case sensitive counterpart.
Unless EF would allow configuring the default CS and default CI collations, accidentally picking the wrong collation here would lead to incorrect results.

@smarts

x => x.Foo.Equals("bar", StringComparison.OrdinalIgnoreCase)
or maybe x => StringComparer.OrdinalIgnoreCase.Equals(x.Foo, "bar")
was working in some version of entity framework prior to EF Core, because it was while converting to EF Core that this problem manifested for me

Yeah, the part you are missing is: you where actually downloading the whole table from the db, deserialize it into entities and THEN comparing it, so the EFCore team made you a favour, and translated that to an actual error because that is what it is: one big nasty error of yours, you just gonna notice later, when you are in production and it's late.

@Suchiman @roji Great conversation!

Also see high-level collations issue #19866

@TrabacchinLuigi you are just plain wrong…
Given the following code:

using (var db = new TestDbContext(connectionString))
{
    var f = db.TestItems.AsNoTracking().FirstOrDefault(x => x.Status.Equals("blah", StringComparison.OrdinalIgnoreCase));
}

the generated SQL is:

SELECT TOP (1) 
    [Extent1].[my_column_id] AS [my_column_id], 
    [Extent1].[my_other_column] AS [my_other_column]
    FROM [dbo].[my_table] AS [Extent1]
    WHERE N'blah' = [Extent1].[my_other_column]

@smarts which exact EF are you using? On 2.2 that query is client-evaluated (on 3.x it triggers translation failure).

Yeah sure @smarts i'm so plain wrong let's do another test:

using (var db = new TestDbContext())
{
    var query = db.TestItems.AsNoTracking().Where(x => x.Status.Equals("blah", StringComparison.Ordinal));
    var f = query.FirstOrDefault();
}

f is {Id: 1, Name: "BlAh"}

Worked like a charm!

P.s. @roji looks like at some point ef6 team decided to just ignore the case option to avoid downloading the whole table or everything that matched the case to compare it later.

@smarts I missed the [Extent1] in your SQL - that indeed points towards EF6 rather than EF Core. In any case we'd definitely rather throw an exception than translate incorrectly, as seems to be the case here.

@TrabacchinLuigi I think you are missing my point. I'm not (and never was) describing how EF Core works. I'm aware that if using EF Core with the code snippet I referenced then the case-insensitive property comparison would be evaluated on the client side (i.e., not in SQL).
What I and others in this thread are arguing for is that with EF (*not EF Core) the code did not evaluate client side. It evaluated on the server. So, we want EF Core to be able to work the same way, translating the case-insensitive string equality expression into SQL.
Does that make sense?
Thanks @roji. Yes, this was using EF6 (whatever the latest version is), not EF Core

@smarts

What I and others in this thread are arguing for is that with EF (*not EF Core) the code did not evaluate client side. It evaluated on the server. So, we want EF Core to be able to work the same way, translating the case-insensitive string equality expression into SQL.
Does that make sense?

As a general rule, with EF Core we translate something when we can be reasonably sure that it will run on the server on the same way that it runs on the client (i.e. same results). Databases vary a great deal with regards to case-sensitivity, and even with the same database collation can be defined on a per-column basis, affecting comparison behavior. So if we translated comparisons with StringComparison.Ordinal to a simple equality in SQL (which is what you seem to be requesting), we'd return incorrect results if the column in question happens to use a case-insensitive collation (which happens to be the SQL Server default).

We also want to avoid doing anything implicitly that will prevent the database from using indexes - this is what would happen if we added LOWER on both sides of the equality, or various other solution. Users are free to do this themselves, but implicitly killing performance this way without an explicit opt-in seems like a bad idea.

I hope the above makes it clear why we can't just translate the various StringComparison options. If you still think you want a translation here, think about exactly what SQL you'd like to see, whether it would always be correct (across column collation configurations!), and whether it would be efficient (i.e. allow index use).

@smarts again... It does not work on ef6 too, showed you the result the other day, that was on entity Framework 6 not core. Also, if you had read this thread there is plenty of reason why efcore should not mimic what ef6 does, first of all because it's database agnostic.

@roji I just want something that _can_ work with the case-insensitivity checks rather than throwing an exception or trying to evaluate on the client, so I'm definitely not against an opt-in behavior rather than it being the default/implicit.
I'm not familiar w/ EF Core internals so forgive me if I use the wrong terminology, but could there be some [extension] method on the DbContext options class (or options builder?) that enables an expression visitor -- which knows about some of the various string equality expressions -- and can translate them as it would a normal x.Foo == "bar" expression?

I'm preparing for my 70-487 exam in a couple of weeks, so to make sure that I'm fully up to speed, I decided to finally take some time to learn EF. Of course, I find it best to use the latest and greatest, so opted for EF Core, especially as I'm planning on running the code on a Raspberry PI at some point.

I've just hit the same issue as described above, using EF Core 3.1.2. I think this can be solve quite easily, by allowing things to work something like this....

  1. use the == or string.Equals(a,b) syntax to do things how the database naturally wants to work (the current implementation)

SQL Instances can have a defect collation set at install time, so the collation cannot be guaranteed without knowing in advance what the sensitivity for that database instance is. In addition, individual columns can have their collation set when the creating a table, so again, the behavior isn't guaranteed.

That said, executing a query like SELECT * FROM MyTable WHERE MyColumn = 'Hello' will give consistent results on a given database, in that the behavour is defined by the underlying metadata for that specific instance.
i.e. a case sensitive column will match case sensitively, and vice versa.

  1. Now consider the more explicit example of string.Equals( a, b, StringComparison.OrdinalIgnoreCase ) as an example.
    now, I'm specifying that it's case insensitive, so the query can easily be modified to include the collation in the where clause, which defines the behavior explicitly i.e. SELECT * FROM MyTable WHERE MyColumn = 'Hello' COLLATE Latin1_General_CI_AS

  2. in the case of StringComparison.Ordinal a case sensetive collation can be applied instead

SELECT * FROM MyTable WHERE MyColumn = 'Hello' COLLATE Latin1_General_CS_AS

The exact form of the SQL will depend on the RDBMS in use as the syntax does vary between systems.

I do feel that this implementation will provide a good enhancement to something that I feel is rather hamstrung at the moment.
p.s. I've been working with databases from multiple vendors for the over 25 years and have resisted switching to EF for reasons of missing features like this. I can see why EF could be a fantastic way of developing database agnostic software, however omissions like not being able to perform an explicit case sensitive lookup leave me once again wondering if giving up control of the query language is a good idea afterall.

@smarts

I just want something that can work with the case-insensitivity checks rather than throwing an exception or trying to evaluate on the client

I may be missing something, but if you know your column is case-insensitive, than you can just use the regular C# equality operator, EF Core will generate a regular SQL equality operator, and everything will just work... If I'm misunderstanding you, can you please clearly state what C# you want to be translated into what SQL?

I'm not familiar w/ EF Core internals so forgive me if I use the wrong terminology, but could there be some [extension] method on the DbContext options class (or options builder?) that enables an expression visitor -- which knows about some of the various string equality expressions -- and can translate them as it would a normal x.Foo == "bar" expression?

Here you seem to be asking for string.Equals with all different StringComparison values to be translated to a simple equality expression in the database. That means that if I specify StringComparison.Ordinal on SQL Server (which is case-insensitive by default) or StringComparison.OrdinalIgnoresCase on PostgreSQL (which is case-sensitive by default), I get very unexpected (read: wrong) results...

@colindawson this was already proposed by @Suchiman above (https://github.com/dotnet/efcore/issues/1222#issuecomment-582786868). The main difficulty here would be knowing which case-sensitive (or insensitive) collation to use, as I wrote here: https://github.com/dotnet/efcore/issues/1222#issuecomment-582883721.

After further discussion in https://github.com/npgsql/efcore.pg/pull/1329, we've decided not to implement this in 5.0 - but may revisit later based on request. We are already going to bring some substantial collation support in 5.0 (#6577, #19275, #8813), and we believe most needs would be addressed by those.

Reasons why this feature is problematic:

  • We'd need to know which case-insensitive/sensitive collation to use, there are many.
  • We could allow users to define in advance which collation to use when an explicitly case-insensitive/sensitive operation is used. Allowing this on the column raises various issues (see https://github.com/npgsql/efcore.pg/pull/1329#pullrequestreview-387064323) and it doesn't seem possible to do something both accurate and not too restricted.
  • Allowing the user to do define collations for explicitly case-insensitive/sensitive operations only at the global database level. This doesn't have the issues of column definitions.
  • However, we think that database/column collations and EF.Functions.Collate() should address the needs of the vast majority of people. We can revisit this in the future if those are insufficient.

In addition, as discussed, we should identify these untranslatable overloads (with StringComparison), and fail with guidance on how to do, linking to docs (https://github.com/dotnet/EntityFramework.Docs/issues/2273).

@roji

I may be missing something, but if you know your column is case-insensitive, than you can just use the regular C# equality operator, EF Core will generate a regular SQL equality operator, and everything will just work... If I'm misunderstanding you, can you please clearly state what C# you want to be translated into what SQL?

Yes, but I also want this to work in unit test scenarios that mimic the real data (i.e., use the same query w/ an in-memory IQueryable<T> instance where the data in the in-memory collection might have some lowercase, some mixed case, some uppercase). Being able to specify the expression as a case-insensitive comparison enables this scenario, but we still don't want the entire query to be evaluated client-side, so we need support for the aforementioned expression at the query translation level).

Here you seem to be asking for string.Equals with all different StringComparison values to be translated to a simple equality expression in the database. That means that if I specify StringComparison.Ordinal on SQL Server (which is case-insensitive by default) or StringComparison.OrdinalIgnoresCase on PostgreSQL (which is case-sensitive by default), I get very unexpected (read: wrong) results...

Regarding your first statement: yes, you are understanding my ask correctly. You would get different results based on the underlying SQL engine, and I agree this results in unexpected results. However, I think this can be mitigated with naming & documentation, and also by way of being an opt-in feature.

Even if this isn't making the cut for something that goes into the library, would it be possible to receive some guidance on how to create such a feature? This is one of the last things for us that is preventing a switch from the EF to EF Core.

@smarts we definitely won't be translating StringComparison.Ordinal to a simple SQL equality operator, since that would return wrong results on case-insensitive databases. If a user specifies Ordinal comparison, they need to get back results for ordinal (case-sensitive) comparisons; I definitely don't think it makes sense to compromise on that because of a testing need.

First, I'd suggest considering switching SQL Server to be case-sensitive by specifying a case-sensitive database collation (see #6577). Assuming this fits your actual production needs, it seems like it would be the ideal solution: have both the database and InMemory in case-sensitive mode, and use the simple equality operator which would behave the same way with regards to case-sensitivity.

Second, this is another way in which InMemory is not really suitable for testing apps that use EF Core - this is something we discourage - there are many ways in which InMemory behavior simply does not match that of a real database (e.g. lack of transactions). We generally recommend testing against your real production database, and have recently docs on how to do that efficiently, so that tests don't take too long. FWIW EF Core itself has a huge number of tests which execute against the actual database - this is reasonably efficient. Another option is to use Sqlite.

Finally, if you insist on using InMemory and do not want to change your SQL Server collation, you could try to make InMemory case-insensitive by adding a query preprocessing visitor which replaces simple string equality with StringComparison.OrdinalsIgnoreCase on InMemory. To do this completely, it needs to be done for all string operations, not just equality (e.g. StartsWith, Contains...).

As for the first option: No, that's not desirable. I want the test side to act like SQL Server, not the other way around.
So, yes, the 3rd option is the one I'd want, and that's what I was asking for guidance on implementing. You referred to "preprocessing visitors". @roji are there any existing classes/examples to which you can point me? Also, you specifically said "make InMemory case-insensitive". Are you saying that implementing a preprocessing visitor would only affect the InMemory database? Can you please elaborate a bit on this?

could you two go chat in a private room ? exchange your phone numbers and call each other, this thread isn't about what smarts want and can't achieve.

@smarts you basically need to replace the preprocessor factory service, and add your own expression visitor which locates strings comparisons and replaces them. For example, you can use this to convert string comparisons with InMemory string.Compare(a, b, StringComparison.OrdinalsIgnoreCase). There are various resources out there on how to write expression visitors in .NET, it shouldn't be too hard.

But I'll say again that I think this isn't a good approach in the long term - you're trying to twist InMemory to behave like your real database, which it will never fully do.


Wiring up a custom visitor at the beginning of preprocessing

```c#
public class BlogContext : DbContext
{
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(...)
.ReplaceService();
}

public class MyQueryTranslationPreprocessorFactory : IQueryTranslationPreprocessorFactory
{
private readonly QueryTranslationPreprocessorDependencies _dependencies;
private readonly RelationalQueryTranslationPreprocessorDependencies _relationalDependencies;

public MyQueryTranslationPreprocessorFactory(
    [NotNull] QueryTranslationPreprocessorDependencies dependencies,
    [NotNull] RelationalQueryTranslationPreprocessorDependencies relationalDependencies)
{
    _dependencies = dependencies;
    _relationalDependencies = relationalDependencies;
}

public virtual QueryTranslationPreprocessor Create(QueryCompilationContext queryCompilationContext)
    => new MyQueryTranslationPreprocessor(_dependencies, _relationalDependencies, queryCompilationContext);

}

public class MyQueryTranslationPreprocessor : RelationalQueryTranslationPreprocessor
{
public MyQueryTranslationPreprocessor(QueryTranslationPreprocessorDependencies dependencies, RelationalQueryTranslationPreprocessorDependencies relationalDependencies, QueryCompilationContext queryCompilationContext)
: base(dependencies, relationalDependencies, queryCompilationContext)
{
}

public override Expression Process([NotNull] Expression query)
{
    query = new SomeQueryTransformingExpressionVisitor().Visit(query);  // You need to implement this visitor
    return base.Process(query);
}

}
```

@roji I totally understand your point. Changing the DB collation is not going to happen, but migrating to SQLite might be possible as a long term solution. In the meantime, thank you so much for your guidance on this [hopefully] short-term solution.

I have just found out that

await _database.Set<MyEntity>().AsNoTracking().Where(i => i.RegCis == someString).AnyAsync()
behaves as expected, EF Core logs:

SELECT CASE
    WHEN EXISTS (
        SELECT 1
        FROM [MyEntity] AS [i]
        WHERE [i].[RegCis] = @__someString)
    THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT)
END 

But folowing:
await _database.Set<MyEntity>().AsNoTracking().Where(i => string.Equals(i.RegCis, someString)).AnyAsync()
surprisingly generates foillowing log!
SELECT [i].[RegCis] FROM [MyEntity] AS [i]
And also, execution takes very long time (300 times longer than the correct query), so the SQL is definitely incorrect !

my csproj:

    <PackageReference Include="Microsoft.AspNetCore" Version="2.0.2" />
    <PackageReference Include="Microsoft.AspNetCore.Identity.EntityFrameworkCore" Version="2.0.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="2.0.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Tools" Version="2.0.2" />

@vihorlat EF Core 2.0 is out-of-support. Please update to EF Core 3.1 and file a new issue if it still fails attaching a small, runnable project or posting a small, runnable code listing that reproduces what you are seeing so that we can investigate.

Was this page helpful?
0 / 5 - 0 ratings