Efcore: Questions about parameterization for expression constant values

Created on 7 Sep 2020  路  5Comments  路  Source: dotnet/efcore

Hi,

For context: We have an API which receives OData-like query strings as filters on our results. We translate these into expression trees and pass them to EF core as .Where(expr) LINQ statements.
In these expression trees the filter values are stored as Expression.Constant values.

This results in perfectly valid SQL translations, but something about it recently caught my eye:
I noticed that the filter values are passed directly into the SQL instead of being parameterized. I realise this topic has been discussed before in several threads, but I couldn't find an answer for my questions in them despite quite a bit of searching.

  • Is there potentially a risk for strings being used this way, i.e. a risk for potential SQL injection? Is there a risk of the input "breaking out" of the SQL string? (I'm not an expert in this area, so I don't know if this risk also exists when the string is parameterized or if parameterization negates/mitigates the risk)
  • In a production environment, will this (potentially) negatively impact SQL query plan caching and is this something that can be avoided? (I would like to avoid frustrating our DBAs ;))
  • Will internal EF core query plan caching work as expected or will every query with different constants be seen as a new query to cache?

We can add an expression visitor to visit constants and create temporary variables to use in the expression trees instead, but would prefer to avoid the overhead if no risks or negative impact are foreseen.
Would love to hear your advice.

Steps to reproduce

Using an expression such as:

EntityQueryable<Foo>
    .Where(x => x.Id == id)
    .Where(
        foo => ((foo.Bar != null) && (foo.Bar.Name == "a")) || 
        ((foo.Bar != null) && (foo.Bar.Name == "b")))

results in the following SQL:

exec sp_executesql
N'SELECT COUNT(*)
FROM [dbo].[Foo] AS [f]
    INNER JOIN [dbo].[Bar] AS [b] ON [f].[Id] = [b].[FooId]
WHERE ([f].[Id] = @__id_0) AND 
    (([f].[Name] = N''a'') OR ([f].[Name] = N''b''))',
N'@__id_0 uniqueidentifier,
@__id_0='0760397D-DBE9-43B9-A27A-AEBCB991201D'

whereas using expressions such as:

EntityQueryable<Foo>
    .Where(x => x.Id == id)
    .Where(
        foo => ((foo.Bar != null) && (foo.Bar.Name == ((string)node1.Value))) || 
        ((foo.Bar != null) && (foo.Bar.Name == ((string)node2.Value))))

results in parameterized SQL:

exec sp_executesql
N'SELECT COUNT(*)
FROM [dbo].[Foo] AS [f]
    INNER JOIN [dbo].[Bar] AS [b] ON [f].[Id] = [b].[FooId]
WHERE ([f].[Id] = @__id_0) AND 
    (([f].[Name] = @__p_1) OR ([f].[Name] = @__p_2))',
N'@__id_0 uniqueidentifier,
@__p_1 nvarchar(4000),
@__p_2 nvarchar(4000)',
@__id_0='0760397D-DBE9-43B9-A27A-AEBCB991201D',
@__p_1=N'a',
@__p_2=N'b'

Further technical details

EF Core version: EF Core 3.1.7
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET Core 3
Operating system: Windows 10
IDE: Visual Studio 2019 16.8

closed-question customer-reported

Most helpful comment

I don't know EF6 very well, but it's hard to imagine that things really work differently with respect to constants... When you integrate a constant node in the expression tree, you're asking for, well, a constant to be integrated into the SQL - at the very least this already causes the query cache pollution issues.

All 5 comments

Is there potentially a risk for strings being used this way, i.e. a risk for potential SQL injection? Is there a risk of the input "breaking out" of the SQL string? (I'm not an expert in this area, so I don't know if this risk also exists when the string is parameterized or if parameterization negates/mitigates the risk)

No. Any query constants you provide will be properly generated and escaped in the final SQL. For example, constant strings will have single-quotes escaped.

In a production environment, will this (potentially) negatively impact SQL query plan caching and is this something that can be avoided? (I would like to avoid frustrating our DBAs ;))

Yes. Since you are integrating different constants instead of parameterizing, the final output will be two different SQL queries, and so your query plan cache will be polluted - this can be very bad for perf.

Will internal EF core query plan caching work as expected or will every query with different constants be seen as a new query to cache?

Queries with different constants will have to be compiled separately in EF Core, once again degrading perf in a potentially significant way.

Bottom line: constants should only be integrated in the expression tree when they're actually constants; generating expression trees with different constants effectively creates different queries both inside EF Core itself, and at your database. Definitely look into parameterizing instead.

Thank you very much for your reply.

I'll get to work on our workaround, definitely something to avoid.

I don't believe we had this issue in our EF6 implementation, is this something you are potentially considering for change in a future iteration, or is the design decision for constants pretty much etched in stone? I might've missed the backlog item with this request.

Kind regards,

Steven

I don't know EF6 very well, but it's hard to imagine that things really work differently with respect to constants... When you integrate a constant node in the expression tree, you're asking for, well, a constant to be integrated into the SQL - at the very least this already causes the query cache pollution issues.

I checked our EF6 implementation, it does the same thing. All the more reason to get things fixed up on our side asap.
Thanks again for taking the time to respond to my questions!

Thanks for checking and confirming on the EF6 side!

Was this page helpful?
0 / 5 - 0 ratings