Efcore: Query: Translate byteArray.Length on SQL Server and SQLite to avoid reading complete blobs

Created on 3 Oct 2018  路  17Comments  路  Source: dotnet/efcore

when querying the length of a byte[] sql like this is produced

actual:
SELECT [Blob] From [MyTable]
expected:
SELECT DATALENGTH([Blob]) as [HowLong] From [MyTable]

Steps to reproduce

```c#
class MyTable {
public byte[] Blob { get; set; }
}

ctx.MyTables.Select(x => new { HowLong = x.Blob.Length }).ToList()
```

Further technical details

EF Core version: 2.2.0-preview2-35157
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows 10
IDE: Visual Studio 2017 15.9.0 Preview 3

area-query closed-fixed customer-reported punted-for-3.0 type-enhancement

Most helpful comment

I have a fix for this on SQL Server ready to go and satisfying a pre-existing test (that was skipped as part of the pipeline rewrite). I have yet to rerun all tests, and I need to work on this for SQLite.

This is my first contrib and I haven't grasped much any etiquette of contributing, so I'm going to dive into the guidelines on the AspNetCore repo. I just wanted to confirm whether or not I should wrap this up and PR it, or if help here is unnecessary.

Passing Test -

        [ConditionalFact()]
        public void Translate_array_length()
        {
            using var db = CreateContext();
            db.Set<MappedDataTypesWithIdentity>()
               .Where(p => p.BytesAsImage.Length == 0)
               .Select(p => p.BytesAsImage.Length)
               .FirstOrDefault();

            AssertSql(
                @"SELECT TOP(1) CAST(DATALENGTH([m].[BytesAsImage]) AS int)
FROM [MappedDataTypesWithIdentity] AS [m]
WHERE CAST(DATALENGTH([m].[BytesAsImage]) AS int) = 0");
        }

Cheers 馃嵒

All 17 comments

Probably need to add Array.Length translation for SqlServer.
Based on the query model shape, it would be easy to do so marking for up-for-grabs.
Let me know if anyone need more guidance.

According to the documentation from Microsoft (https://docs.microsoft.com/pt-br/sql/t-sql/functions/datalength-transact-sql?view=sql-server-2017) we have many possibilities for DataLength, it would not be better to meet all of them use something like this:
EF.Functions.GetDataLength(T)

// byte[]
ctx.Employees.Select(p => EF.Functions.GetDataLength(p.Photo))

// string
ctx.Employees.Select(p => EF.Functions.GetDataLength(p.LastName))

It is worth mentioning that it is a simple suggestion, and if you feel good about it I can send a PR.

Translate Length for byte[] and string. No need to add any function.

I would also say translate .Length to DATALENGTH.

But there is also the LEN function
https://docs.microsoft.com/en-us/sql/t-sql/functions/len-transact-sql?view=sql-server-2017

It's basically the same except it excludes trailing blanks (and has very bad performance because it has to go trough the entire data) i don't know how much use people have for that one tough...

@smitpatel, should we wait for this? https://github.com/aspnet/EntityFramework/issues/9242

I've tried doing something similar here with SqlServerStringLengthTranslator.cs

But I believe I have the need on something from the Remotion.Linq side.

@divega to make a call on LEN vs DATALENGTH for string.Length

@ralmsdeveloper - #9242 only blocks in case of EF.Property. If you use direct member access, array types work.

Notice: If processing a unicode string, DATALENGTH will return twice the number of characters.

Using DATALENGTH(array) sounds good for byteArray.Length. I don't think there is a perfect translation in SQL Server for string.Length:

  • DATALENGTH(string) not only needs to be divided by 2 for Unicode strings, it is also inaccurate when the string contains Unicode Surrogates, which take more than 2 bytes.

  • LEN(string) may very well be slower, and it doesn't count trailing blanks. That is surprising and can also be a pain if you are mixing server-side and client-side evaluation (I remember LINQ to SQL had issues reported about this). Years ago we came up with the following workaround: LEN(string + '.') - 1. This not only looks more complicated but also fails to return correct results if the string is of length 4000 (or 8000 for non-Unicode). We can fix that if we cast the string to a MAX-sized string, e.g. LEN(CAST(string as nvarchar(max)) + '.')-1, but is it worth it? There are probably more complicated solutions involving conditional expressions.

Last time I remember we were faced with this decision (today 11 years ago, my first job as the LINQ to Entities PM with @katiiceva :smile:) we went for LEN(string) because we didn't want to fight SQL Server semantics at the expense of simplicity and predictability (it is really the simplest and most expected SQL translation!). I wasn't 100% convinced then and I am not now, that DATALENGHT(string) [/ 2] wouldn't work as well.

We could translate to LEN(string) (I thought that we were already doing it :blush: we already do this!), and in addition expose EF.Functions.DataLength() that can take any object as @ralmsdeveloper suggested, for people that care.

Besides all of this, I assume we probably don't want to do anything special about NULL strings...

What do others think?

SQLite already translates string.Length, but we should add a translation for byteArray.Length (to length(byteArray))

Marked this issue as ReLinq-dependent since array's are broken into subquery by relinq. Which works with member access but fails for EF.Property as of now. (See #9242). The code is in place, tests need to be added and verified before we close this issue. Also we need byte array property in functional tests model.

It's an ArrayLength UnaryExpression in tree.

In case it's useful, here's how Npgsql translates does this:

  1. A fragment in NpgsqlSqlTranslatingExpressionVisitor which intercepts SubQueryExpression and returns an ArrayLength expression when appropriate.
  2. Overridden VisitUnary() in NpgsqlQuerySqlGenerator to pick up that expression and generate the appropriate PostgreSQL function.

I have a fix for this on SQL Server ready to go and satisfying a pre-existing test (that was skipped as part of the pipeline rewrite). I have yet to rerun all tests, and I need to work on this for SQLite.

This is my first contrib and I haven't grasped much any etiquette of contributing, so I'm going to dive into the guidelines on the AspNetCore repo. I just wanted to confirm whether or not I should wrap this up and PR it, or if help here is unnecessary.

Passing Test -

        [ConditionalFact()]
        public void Translate_array_length()
        {
            using var db = CreateContext();
            db.Set<MappedDataTypesWithIdentity>()
               .Where(p => p.BytesAsImage.Length == 0)
               .Select(p => p.BytesAsImage.Length)
               .FirstOrDefault();

            AssertSql(
                @"SELECT TOP(1) CAST(DATALENGTH([m].[BytesAsImage]) AS int)
FROM [MappedDataTypesWithIdentity] AS [m]
WHERE CAST(DATALENGTH([m].[BytesAsImage]) AS int) = 0");
        }

Cheers 馃嵒

@svengeance your contribution would be very much appreciated. We've recently added a byte array field on the GearsOfWar Squad entity type, so an added test in that suite would be better than in BuiltInDataTypes (check out Byte_array_contains_parameter as a starting point). Note that you would need one test in GearsOfWarQueryTestBase, and an override in GearsOfWarQuerySqlServerTest where the SQL assert would belong.

And yes, all this would simply be submitted as a PR against our master branch (ideally it would also translate for Sqlite as well - should be quite easy).

This has already been done by @svengeance and merged in #19189.

Awesome, that this was implemented. It's the little things like this one that make migrating from LINQ to SQL so much easier.

Was this page helpful?
0 / 5 - 0 ratings