Efcore: Separate methods for raw SQL that leverage interpolated strings

Created on 16 Feb 2018  路  28Comments  路  Source: dotnet/efcore

Currently the behavior is different if you call FromSql or ExecuteSqlCommand with an inline interpolated string (parameterization happens) or if you assign the interpolated string to a variable and then pass it to the method (the string is formatted with the arguments).

In some cases customers expect parameterization but it doesn't happen: #10080

In other cases customers expect parameterization not to happen, but it does happen: #10956

This is very confusing and hard to explain. Interpolated strings in C# were simply not designed in a way that works well with having two overloads of the same method in which one works with interpolated strings and the other one with a regular string.

I think we should separate the methods, but we can only do this in major release because it is a breaking change.

This would also imply removing the RawSqlString type.

breaking-change closed-fixed type-enhancement

Most helpful comment

To elaborate on what @smitpatel wrote above:

We discussed this extensively in the EF design meeting today. For a while I have been inclined to adopt option A, which would keep FromSql as the version that takes FormattableString and would require a new name that conveys _danger_ for the version that works directly with strings.

But after reading many of the comments on this thread, I began to realize that this approach would not address a couple of important concerns:

  1. What the interpolated string version of the method is doing is actually very special and can be pretty surprising if you come with the expectation that you are using the language feature just to build a string

  2. There are actually many ways to use the string version of the API that are perfectly safe. In other words, this version of the method is not really a more dangerous API than DbCommand.CommandText (for which there is an off-the-shelf fxcop rule) or similar APIs on Dapper. Things only became dangerous when we conflated the interpolated string version and the compiler could switch between the two versions without you realizing it

So, after discussing, we agreed to go with option B:

  1. Deprecate both flavors of the APIs
  2. Come up with the new naming pattern. As @smitpatel mentioned, we are going with FromRawSql, FromInterpolatedSql, ExecuteRawSql, ExecuteInterpolatedSql.

The (relatively) big disadvantage of this approach is that we can no longer use the simplest name, "FromSql", but we believe we can live with this.

All 28 comments

Absolutely agree with this move. If it helps, I wrote up some more test cases on how this is confusing (and sometimes dangerous) when it first came up here: https://github.com/NickCraver/EFCoreInjectionSample/blob/master/Program.cs

@NickCraver thanks. We are going to try to address #10080 in the shorter term with a Roslyn analyzer. I think we can probably leverage your tests as tests cases for the analyzer.

cc @anpete

And remove or modify the analyzer?

I believe modify. I haven鈥檛 thought through all the details but I believe it can still provide value on the string overload.

Please take into consideration cases like this:

var tablename = dbcontext.Model.FindEntityType(typeof(ABC)).Relational().TableName;
delete from {tableName} where a=123;

How to create a raw sql in such case? Is there currently any workaround?

I'm glad this is to be improved. One request is being able to opt-into raw strings when using the interpolated string arguments/methods, something like:

```c#
var tablename = dbcontext.Model.FindEntityType(typeof(ABC)).Relational().TableName;
$"select * from {Sql.Raw(tablename)} where date between {from} and {to}"

`from` and `to` would be parameterized and `tablename` would not. This idea could be reused if `Sql.Raw` returns an implementation of `IRawSql` and if something implements `IRawSql` then its `IRawSql.Value` will be inserted and not parameterized. Taken further, the above example could be shortened to the following C# if `IRelationalEntityTypeAnnotations` extends `IRawSql` which when evaluated would return a string like "[someSchema].[SomeTable]".

```c#
var table = dbcontext.Model.FindEntityType(typeof(ABC)).Relational();
$"select * from {table} where date between {from} and {to}"

IRelationalPropertyAnnotations could also extend IRawSql and its IRawSql.Value in this example could return "[date]":

c# var table = dbcontext.Model.FindEntityType(typeof(ABC)).Relational(); var column = dbcontext.Model.FindEntityType(typeof(ABC)).FindProperty("Date").Relational(); $"select * from {table} where {column} between {from} and {to}"

Implementations of IRawSql for entity types and properties would need to be specific to each database provider so that identifiers are properly escaped with brackets, double-quotes, back ticks, etc.

I would also suggest providing arguments that take an IEnumerable<FormattableStrings> to enable composing interpolated strings without losing parameterization. Here's a slightly contrived example:

c# public IQueryable<Post> PublishedPosts(FormattableString filter = null) { return this.Posts.InterpolatedSql(new FormattableString[] { $"select * from dbo.Posts where Published = 1 and ", filter ?? $"1=1" }); }

You could also imagine building a List<FormattableString> and passing that in.

But perhaps there is a better way I am not thinking of.

@divega to provide API shapes we want to separated overloads.
Suggestion: FromRawSql & FromInterpolatedSql

A bit of a brain dump on this issue, since we need to discuss it soon:

_Note: when I refer to FromSql, ExecuteSqlCommand is also implied_

  1. If we made FromSql the plain string version of the method, this would cause existing correct usages of the interpolated string functionality to start concatenating parameter values into the SQL. This would likely cause database errors (e.g. concatenating strings literals without the quotes) but could even re-open the SQL injection vulnerabilities the method is trying to protect against

  2. This leaves us with two options: (a) make FromSql the method that only takes interpolated strings and come up with a new name for the method that accepts plain strings, or (b) obsolete or remove all existing overloads of FromSql and come up with two entirely new method names

  3. IMO, the FromSql method name is very nice and it would be a shame to lose it

  4. From some point of view, the version of FromSql that works with plain strings is the basic one, and the one that works with interpolated strings is the sophisticated one, so my first intuition is that the one that works with plain strings should have the less sophisticated name...

  5. However, it is significantly harder for me to come up with a non-awkward name that conveys that the method works with interpolated strings than to come up with a non-awkward name that conveys that it only works with plain strings (i. e. I don't think the words interpolated or format are very user friendly)

  6. Based on these points, my first inclination is to create a new method (let's say for now that it is called FromSqlString) which works with plain strings and leave the existing FromSql overload that works with interpolated strings alone.

  7. With plan (a), we could choose to leave the overload that works with plain strings (it actually accepts RawSqlString structs) in place, but with ObsoleteAttribute indicating what to do: pass a FormattableString or use FromSqlString

  8. If we decided that we want to go with plan (b), I don't have any compelling ideas. FromSqlQuery is not completely awful IMO, but ExecuteSqlCommand is already formed like this. Raw, Plain are possible words to qualify the version that takes a regular string.

@divega Adding FromRawSql() would allow the existing FromSql to continue working with interpolated strings and indicate to users of FromRawSql that it should be used with care. My opinion is that breaking FromSql on regular strings at compile time is worth it since this will be 3.0. I'm in favor of plan a.

@jeremycook thanks! I am inclined to do this as well. Any opinion on whether FromRawSql would be a good enough name for the method that only takes normal strings? Any other suggestions for names?

Jumping in really late on this, as I just stumbled into this nuance of EF today. I personally would prefer your "plan (a)" as well. It would nudge developers who rely on the library into the right path. The current API is quite dangerous, as you can go from safe parameterization straight to dangerous concatenated SQL by a simple refactoring.

A suggestion on naming might be "FromUnsafeSql", to make it clear that anyone using that overload needs to be careful about its use, and possibly point out the SQL Injection dangers right in the XML comments (I'm surprised in 2019 that the majority of devs I run into still don't know what SQL Injection is, or why it can be dangerous).

@KeithJRome I'm not completely sure FromUnsafeSql would be accurate since certain uses of the string interpolation API can also be unsafe. I do think it is helpful to make people aware of the risk through naming conventions, if at all possible. FromUnsafeSql just seems like to strong of an assertion in the other direction. The security of your code very much depends on how you use each API and understanding that. That can't easily be summed up in a name.

@divega +1 for FromCodingHorrorSql or FromParameterizedSql if I'm being somewhat serious. I still think FromRawSql is the best name thus far. Although FromParameterizedSql might lead folks to read the intellisense docs in the if they don't know what parameterized means.

@jeremycook FromParameterizedSql is a cool name. I just don't know which version of the method it would be a better fit for :smile:

One pretty extreme option is to not have any method at all that accepts regular strings, under any name - this would force people to only specify raw SQL via interpolated strings. This would make it really difficult to introduce SQL injection, but on the other hand it makes SQL composition/concatenation impossible (I think?), which is probably too much.

Otherwise, +1 for removing the overload of FromSql(string) (leaving that name for interpolated only) and adding FromSqlUnsafe(string) - this should at least give users pause before using it.

@jeremycook

I'm not completely sure FromUnsafeSql would be accurate since certain uses of the string interpolation API can also be unsafe.

Can you be more specific/provide an example? It seems like a method accepting FormattableString should be as safe as reasonably possible, which is why it seems right to me to label the other one "unsafe"...

One possible objection to having Unsafe in the name is that it's associated with C# unsafe (i.e. pointers). There are also some places in the .NET API where the word "dangerous" is used in method names; this may be a bit extreme, but to be honest I'm leaning towards this kind alarmist naming in the context of SQL injection.

My thinking around the "unsafe" naming is that it conveys a sense of warning to the API consumer. Essentially letting them know that the training wheels are off if they choose that option, and they need to exercise additional caution.

There are indeed some cases where you don't want an interpolated string to automagically convert into parameterized SQL. Here is a perfect example (which is actually what led me to this discussion thread in the first place):
context.Database.ExecuteSqlCommandAsync($"set IDENTITY_INSERT {tableName} ON")

It looks like reasonable code, but in reality, that statement converts to "set IDENTITY_INSERT @p0 ON", which fails because the set statement doesn't play nice with parameters that identify db objects.

To be totally honest, I kind of feel that this API should not convert FormattableString to parameterized SQL at all. It is doing so in a way that isn't obvious, and which has real consequences - including behavior that breaks otherwise legitimate code such as the above example. We already could use parameterized SQL very explicitly and clearly, so this feature feels like mostly a gratuitous use of a technical capability. If anything, the overload which supports FormattableString should have been designed as its own signature, so we could opt-in when it makes sense. But alas, this can of worms is already open now.

@divega Would it be possible for you to bring an API proposal to the design meeting next week?

@roji I think you are right. I can't think of a use of FromInterpolatedSql being unsafe, generating invalid SQL that will error, yes, but being unsafe, no. Below is what I was thinking would be a problem, but of course the entire sql parameter would get parameterized, and so my assumption was wrong.

var sql = $"SELECT * FROM dbo.Records WHERE Name = {name}";
// Safe but invalid SQL
Database.FromInterpolatedSql($"{sql} AND Id = {id}")
// And this won't compile
Database.FromInterpolatedSql(sql + $" AND Id = {id}")

To that end I am fine with UnsafeSql especially if debating RawSql vs UnsafeSql vs SomeOtherSql would some how prevent this change being made to the API. I'd rather see this change made because it will at least prevent erroneous use of string interpolation where you think you are safe but in fact nothing gets parameterized:

var sql = $"SELECT * FROM dbo.Records WHERE Name = {name}";
Database.FromSql(sql + $"AND Id = {id}")

Design meeting decision (short version):
FromRawSql & FromInterpolatedSql
ExecuteRawSql & ExecuteInterpolatedSql

To elaborate on what @smitpatel wrote above:

We discussed this extensively in the EF design meeting today. For a while I have been inclined to adopt option A, which would keep FromSql as the version that takes FormattableString and would require a new name that conveys _danger_ for the version that works directly with strings.

But after reading many of the comments on this thread, I began to realize that this approach would not address a couple of important concerns:

  1. What the interpolated string version of the method is doing is actually very special and can be pretty surprising if you come with the expectation that you are using the language feature just to build a string

  2. There are actually many ways to use the string version of the API that are perfectly safe. In other words, this version of the method is not really a more dangerous API than DbCommand.CommandText (for which there is an off-the-shelf fxcop rule) or similar APIs on Dapper. Things only became dangerous when we conflated the interpolated string version and the compiler could switch between the two versions without you realizing it

So, after discussing, we agreed to go with option B:

  1. Deprecate both flavors of the APIs
  2. Come up with the new naming pattern. As @smitpatel mentioned, we are going with FromRawSql, FromInterpolatedSql, ExecuteRawSql, ExecuteInterpolatedSql.

The (relatively) big disadvantage of this approach is that we can no longer use the simplest name, "FromSql", but we believe we can live with this.

I can't think of a use of FormattableString being unsafe, generating invalid SQL that will error, yes, but being unsafe, no

I just wanted to point out here for anyone else finding later: this isn't accurate...it's still unsafe. Strings are very dangerous and as long as they are in play anything that gets turned into a string in fair game for input exploitation. I added some ways this is exploitable in the repo at the start of the issue: https://github.com/NickCraver/EFCoreInjectionSample/blob/master/Program.cs

@NickCraver can you point to which example you are talking about here?

I think @jeremycook was referring to the method that only accepts FormattableString.

From the examples in https://github.com/NickCraver/EFCoreInjectionSample/blob/17fa99c2d1a46933755ba477f65090b6b4805e6d/Program.cs#L20-L50, I think the only one that would compile against such method is the first one. All the other ones use strings.

BTW, I am not pushing back on the fact that there is always going to be some way to do something unsafe with interpolated strings or without them. I just want to understand if you are referring to something that separating the methods doesn't address.

Agree with @divega, I'd be interested in seeing a case where a method accepting only a FormattableString could be used in an unsafe way. No casts are possible between string and FormattableString, you can't concatenate FormattableStrings... It basically seems like the only way to integrate user input is to go through parameterization, which is safe...

@NickCraver I can understand how future onlookers could misunderstand my point. So I reworded my comment, replacing "FormattableString" with "FromInterpolatedSql".

@jeremycook my apologies - you wrote FormattableString and my brain read it as IFormattableString even after a few passes. I completely agree on the concrete type: there's no way I can think of to break that approach. If we allowed the interface anywhere...oh yeah, lots of fun. Sorry about that!

Closing since this is now done and the break is documented.

Was this page helpful?
0 / 5 - 0 ratings