The situation is the following:
We are currently working in a new application that has to use a database with a quite old design (lots of CHAR(n) columns, even for keys and FK's).
Recently, we noticed that we were having quite poor/slow performance when inserting new data in tables.
After reviewing the resulting INSERT, we noticed that instead of using CHAR(n) parameters, VARCHAR(n) are used instead, that when used in a sp_executesql statement can result in bad performances.
Then we reviewed the model (generated via scaffolding), and we noticed that we had no _.IsFixedLength()_ for the CHAR(n) properties/columns involved in the inserts (is the scaffolding doing the right thing here?).
Once corrected, we tried again, but the INSERT's were performing still bad, reviewing the newly generated statement, VARCHAR parameters are still being used.
After this, I have decided to dive into the code and I think that I might have found the reason of this behaviour, in SqlServerStringTypeMapping I have found the following:
```c#
public SqlServerStringTypeMapping(
[CanBeNull] string storeType = null,
bool unicode = false,
int? size = null,
bool fixedLength = false,
StoreTypePostfix? storeTypePostfix = null)
: this(
new RelationalTypeMappingParameters(
new CoreTypeMappingParameters(typeof(string)),
storeType ?? GetStoreName(unicode, fixedLength),
storeTypePostfix ?? StoreTypePostfix.Size,
GetDbType(unicode, fixedLength),
unicode,
size,
fixedLength))
{
}
private static string GetStoreName(bool unicode, bool fixedLength) => unicode
? fixedLength ? "nchar" : "nvarchar"
: fixedLength
? "char"
: "varchar";
private static DbType? GetDbType(bool unicode, bool fixedLength) => unicode
? (fixedLength ? System.Data.DbType.String : (DbType?)null)
: System.Data.DbType.AnsiString;
```
The method GetDbType is unconditionally using _System.Data.DbType.AnsiString_ for non-unicode strings. Is there any reason why it shouldn't be using _System.Data.DbType.AnsiStringFixedLength_ for fixed length strings?
In case that there is a good reason for this behaviour, is there any way to trigger the usage of CHAR(n) parameters instead of VARCHAR(n)?
EF Core version: 2.2.1
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Notes from triage: I believe we are not using the fixed length DbTypes because it causes SQLClient to pad the strings passed to it, which then causes issues. However, I can't find where we did this explicitly. @AndriySvyryd can you find anything?
So, when fixed length DbTypes were used there were some issues with the strings being padded.
It would be informative to know what kind of issues were those.
In our case, we might probably trade not-very-good-performance of inserts (and queries, as they are also affected, but it wasn't evident at first) for this padding issue.
I understand that reenabling the usage of fixed length DbTypes via .IsFixedLength() usage might cause some problems as it changes current behaviour, but maybe we could have some kind of additional _"iKnowWhatImAskingFor"_ bool flag ?
@ajcvickers A test that shows the failure that this would cause is SimpleQueryTestBase.Comparing_to_fixed_string_parameter:
C#
from c in cs
where c.CustomerID.StartsWith(prefix)
select c.CustomerID;
CustomerID is mapped as char(5) if the value supplied in prefix is less than 5 characters it will be padded with whitespace and the query will return no results.
@elBranchi I believe you are hitting two issues here. The first looks like a duplicate of #14160, which is fixed for 3.0 already. The second is that the use of non-fixed length DbTypes is by design due to the padding that SqlClient does, which is a problem in scenarios like shown by @AndriySvyryd above. I doubt this is something we will change even though the perf characteristics are problematic. However, we will look into making it easier for an application to change the DbType if needed.
So, if I understand it correctly, the scaffolding issue could be avoided/solved by switching to version 3.0.
On the other side, for the scenario shown in the test I assume that problem would result by using a CHAR(5) parameter for prefix? Is there any reason that prevents from adjusting the size of the parameter to match the real length of the string in case of using fixed strings?
I mean, in the test case we are not doing a direct comparison, we would be probably using some function that might have prefix as parameter, I don't think that the parameter has to match exactly the column.
I can imagine that there might exist reasons to dismiss this approach, let me know. In any case I would be happy to have some way to adjust the DbType in certain scenarios.
@elBranchi Currently our mapping doesn't depend on the values, it would require significant infrastructure changes. Also in general we try to use the same parameter types for the same query to avoid cache fragmentation.
@AndriySvyryd that's more or less the answer that I was fearing/expecting and that also makes a lot of sense.
In this case, the only thing remaining I imagine that would be devising/designing the mechanism that would allow the applications to tell EFC that certain DbTypes are to be used instead of the selected by the model.
In any case it would need to be applied to queries and also insert/updates, so I don't have a clear picture of how I would include such feature in a clean and consistent way. One way would be enabling the change in the model, and the other would be applying some modification on an operation by operation basis.
@AndriySvyryd We already configure the parameter Size differently based on the length of the string. Maybe if the mapping is for fixed length, the length is specified, and the size of the string matches the length, then we can set the DbType to fixed. Thoughts?
@ajcvickers Good idea, we can actually set DbType to fixed and adjust Size to match the string length, this avoids padding.
@ajcvickers @AndriySvyryd
Ok, this starts to sound really good to me, it will make happy my colleagues also.
I'm already planning on trying the 3.0 preview on a different branch, and having this change on either 2.2.x or a later 3.0 preview would be perfect.
But will that not cause procedure cache pollution?
@ErikEJ Yes, but only if using fixed-length types with variable length strings, which seems like a reasonable compromise to make.
Clearing up milestone to look at it again in triage.
I am just wondering if we can fix the inference code to copy the fixed length facet (and let the padding happen), but only for certain operations like comparison and assignment. That should help with cases like SimpleQueryTestBase.Comparing_to_fixed_string_parameter, in which we are inferring the facets for the argument of a function, without introducing cache pollution.
BTW, re the current proposal, I am not sure that operations between fixed length parameters and fixed length columns when the length is different perform better than between variable length and fixed length. This is something that we can test.
@ErikEJ Yes, but only if using fixed-length types with variable length strings, which seems like a reasonable compromise to make.
Maybe I am missing something, but it sounds like this will happen pretty much any time we use string parameters with fixed length string columns, and that seems quite common.
Most helpful comment
@ajcvickers Good idea, we can actually set
DbTypeto fixed and adjustSizeto match the string length, this avoids padding.