Currently we translate foobar.StartsWith(foo) into:
(foobar like foo +'%' and left(foobar, len(foo)) = foo) or foo = '' -- last term is removed if foo is non-null constant
however @rmacfadyen pointed out that for some scenarios the last term actually makes the query non-sargable.
If the term is removed completely, we return (arguably) incorrect data for case: null.StartsWith("").
We could however use the following translation:
(foobar like foo +'%' and left(foobar, len(foo)) = foo) or (foo = '' and foobar is null)
(and if we know that foobar can't be null we could drop the term as well.
We need to do some more investigation on which scenarios are getting better with this translation (i.e. if its worth complicating the sql)
Triage: Do the second translation.
If the term is removed completely, we return (arguably) incorrect data for case: null.StartsWith("").
In SQL any comparison against NULL is NULL. So a NULL value can never StartsWith(..) anything (empty string, NULL, or an actual value).
In SQL LEFT(NULL, 0) returns NULL (not an empty string). Also SELECT 1 WHERE NULL LIKE '%' returns no rows.
All of which means that currently StartsWith('') is currently returning incorrect results when the column is NULL.
Edit: My thinking on this was from the standpoint of if a database engine where to create a StartsWith(Source, Expression) function how would it treat NULLs? It would return True, False and NULL, with NULL being returned if either the Source or Expression where NULL.
In SQL LEFT(NULL, 0) returns NULL (not an empty string). Also SELECT 1 WHERE NULL LIKE '%' returns no rows.
That is true but another point of view is that the behavior of these functions/operators is simplistic and not consistent with what SQL does with NULL in other places, e.g.:
NULL OR true --> true
NULL AND false --> false
In these cases NULL clearly means an UNKNOWN value. In the case we are discussing, the fact that you don't know what a string value is doesn't change the fact that such string will start with an empty string: all strings do. There are more ways to reason about it as well:
In general translating .NET Boolean expressions that cannot be null to SQL that can be evaluated as NULL is problematic and breaks composability when used as part of a composite predicate, so we try to avoid it within reason. Of course for all other cases in which the translation of StartsWith when one or both of the arguments are null will result in NULL, but this one could be easily avoided, and as we have found so far, with the proposed translation, it does not get in the way of sargability.
We are talking about scenarios in which evaluating StartWith in-memory in a regular program would throw an exception, so we really get to choose how it behaves.
Changing the translation to remove this term could break any application that uses StartWith in a predicate without any additional short-circuiting logic to prevent rows with NULL values from being incorrectly filtered out.
c#
class SqlServerStartsWithOptimizedTranslator
{
//...
new SqlFunctionExpression("LEN", typeof(int), new[] { patternExpression }
//...
}
If patternExpression contains 'char' or 'const string' (ConstantExpression), you may detect length locally and pass it to SQL.
My observation/ 2 cents:
For me:
[Foo] LIKE @foo + '%' AND (LEFT([Foo], LEN(@foo)) = @foo)) OR (@foo = '')
is 100X slower than [Foo] LIKE @foo + '%'
This seems to be because the the query needs to perform a index scan rather than an index seek on IX_TableName_Foo
Note that with #14657, when the pattern is constant we no longer translate the long/inefficient version, but escape the escape characters client-side (if needed) and simply send LIKE %pattern, which is the most efficient possible.
We could still look into improvements for non-constant patterns but I'm not really sure it's worth it.
Currently..
where wo.WorkOrderNumber.EndsWith("D") == false
Translates to..
WHERE (([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
Did I do something wrong, or is that expected until this issue is resolve?
Gets even crazier when two are used together:
where wo.WorkOrderNumber.EndsWith("D") == false
where wo.WorkOrderNumber.EndsWith("T") == false
Translates to..
WHERE ((([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%T')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%T') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END IS NOT NULL) AND ((([t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D')) OR (CAST(0 AS bit) = CAST(1 AS bit))) AND CASE
WHEN [t].[work_order_number] IS NOT NULL AND ([t].[work_order_number] LIKE '%D') THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END IS NOT NULL)
Which is conflicting enough to stop returning records.
I'd expect this to translate to..
WHERE [t].[work_order_number] NOT LIKE '%T' AND WHERE [t].[work_order_number] NOT LIKE '%D'
Seems like StartsWith() == false and EndsWith() == false could use some work as well.
I'm being punished for writing more legible code. 馃槀
where !wo.WorkOrderNumber.EndsWith("D")
where !wo.WorkOrderNumber.EndsWith("T")
Translates to..
WHERE ([t].[work_order_number] IS NOT NULL AND NOT ([t].[work_order_number] LIKE '%D')) AND ([t].[work_order_number] IS NOT NULL AND NOT ([t].[work_order_number] LIKE '%T'))
Is there going to be a fix for that? Not using LIKE for StartsWith is a killer for optimization. It seems to me like we really need to be able to use the index in large tables.
Hi,
I am encountering major performance issues as well using StartsWith().
It seems like there must be a better solution to be able to get the query translated using the "LIKE" method. I am running on big tables with a lot of data.
I am not able to use the EF.Functions.Like solution because I am working with Generic Repository Pattern. Is there a workaround for this?
@langdonx please try the latest 5.0.0-rc2 - for constant patterns (e.g. EndsWith("D")) we translate to a simple and efficient LIKE %D; same with StartsWith.
Non-constant patterns are trickier, though we have some plans for how to improve that as well (client-side parameter transformation).
@roji Thank you for the quick response!
We actually need this for non-constant array parameters(joined with a query)
Is there any workaround replacing the current StartsWith implementation with a custom one that uses Like? Till there is a fix? I mean overriding the current implementation.
The current implementation is a killer for us with a non usable product.
Using LIKE takes 30ms on a large table (since it uses the index).
Using current implementation take around 3000ms with arround 3000ms cpu time.
Thanks!
You can always explicitly use LIKE by using EF.Functions.Like instead of StartsWith/EndsWith. But be careful - if your the pattern your matching contains special characters (%, _), you'll get incorrect results (that's why we don't currently translate non-constant StartsWith with LIKE).
Most helpful comment
My observation/ 2 cents:
For me:
[Foo] LIKE @foo + '%' AND (LEFT([Foo], LEN(@foo)) = @foo)) OR (@foo = '')is 100X slower than
[Foo] LIKE @foo + '%'This seems to be because the the query needs to perform a index scan rather than an index seek on
IX_TableName_Foo