Efcore: Enabling string.Join support

Created on 2 Sep 2015  路  30Comments  路  Source: dotnet/efcore

string.Join would be a cool feature to have now and then. Some database engines support this with an aggregate function, others such as SQL Server need a little hand-holding. Pluses include terser code and less data over the wire when projecting, since the alternative to concatenating on the server would be returning a child string collection for each row and concatenating the collection after the query. (If I understand correctly, child collections are usually transferred by joining each row of the child collection to the original row set and duplicating a lot of data.)

I expect that this would be low priority for you to add to EF, but I'm curious if anything is being done to allow extensions to the LINQ to query translation. We would be thrilled to implement string.Join support ourselves if it didn't require modifying EF and provider source code every release and recompiling.

I wasn't able to find anything about extending LINQ to query translation so I apologize if I missed it, but I know you are dramatically refactoring for EF Core and I can imagine many cool scenarios would be possible if there was an extension point in LINQ querying.

area-query type-enhancement

Most helpful comment

FWIW starting with SQL Server 2017, STRING_AGG() should be a good option for the translation of string.Join() in the cases that the input comes from a table.

All 30 comments

We have extensibility, we don't have any docs yet but you want to look at MethodCallTranslator.

Leaving this open to track String.Join. We would definitely be interested in accepting a contribution for this.

Notes: For SQL Server, you must use for xml path(''), type, otherwise XML characters like <, > and & will be escaped into &gt; etc.

for xml path(''), type).value('(./text())[1]', 'nvarchar(max)') has much greater performance than for xml path(''), type).value('.', 'nvarchar(max)') (13.2x in my tests).

(see this thread)

Also: string.Concat

I'm planning to do a contribution when I get time. I'm waiting for the standard dust to settle after 1.0.0 before I use EF Core in production because I see a fair number of bugs.

I understand this can be extended adding a new MethodCallTranslator, but what is the best way to add it to EF? The classes I found are all in the internal namespace.

Look at SqlServerStringSubstringTranslator. That will give you general pattern how those translators work. They are in Internal namespace but that is ok. Internal namespace allows us to make changes without breaking users since they shouldn't be directly interacting with those APIs.

I understand that, but isn't there a way to extend the functionality without using Internal namespaces? That isn't really "extensible" from my point of view.

I don't wanna produce code that may break with every EF update if I'm just "extending" the framework.

This issue talks about implementing functionality in EF core itself, which is not exactly "extending" framework. Rather extending (or improving to be more precise) Linq-to-SQL translation.
At present we don't have extension point for user to plug in custom method call translator. The current extension point are for provider writers to provide their custom translators.

Is there a specific reason you want to "extend" the framework?

I was under the impression that EF supported extensibility from Rowan's comment.

The ability to crate my own translators for common queries looks like a great feature. Similar to what nein-linq does.

Let me revive this thread @rowanmiller :
My plan for the SQL-Server implementation is to map string.Join(...) to T-SQLs CONCAT or CONCAT_WS via implementing an IMethodCallTranslator.

My first question is, if there is a general mistake in my thought? Imho calling CONCAT/CONCAT_WS would lead to a quite similar behavior as string.Join offers.

The second question refers to a gerneral strategy regarding the support of DBMS versions. As already mentioned there are two T-SQL functions to handle string concatenation - CONCAT and CONCAT_WS. While CONCAT is already available since SQLServer 2012 CONCAT_WS will be introduced with 2017.
I guess CONCAT would then be the way to go to support older systems/applications also..?

I'm pretty sure concat only works when the number of strings is known when building the query. Each string has to be passed in as an argument. It's not like you could do CONCAT(SELECT Str FROM X).

Last time I researched this, this was the state of the art, and it does require attention to avoid some tricky encoding corner cases: http://sqlblog.com/blogs/rob_farley/archive/2010/04/15/handling-special-characters-with-for-xml-path.aspx

I still don't understand how we can add an IMethodCallTranslator to the engine in a way that isn't hackish, everything is private or internal.

Still no documentation on this?

FWIW starting with SQL Server 2017, STRING_AGG() should be a good option for the translation of string.Join() in the cases that the input comes from a table.

@jnm2 You're absolutely right, I confused relations with scalar values. SQL-Server won't execute a script like

SELECT CONCAT(SELECT 'a' UNION SELECT 'b')

Maybe I can find the time to work into the FOR XML PATH - thing and adopt my implementation accordingly.

@divega: STRING_AGG seems to do the trick CONCAT isn't able to. But it will only work for SQLServer Versions >= 2017 and since there is just one general EFCore-package for SQLServer one would break the compatibility with the majority of SQLServer instances.

since there is just one general EFCore-package for SQLServer one would break the compatibility with the majority of SQLServer instances

Until now it just happens that we haven't added enough features that are version specific for it to become really compelling but we could certainly add API to enable groups of features for a version of the database server, e.g. the most granular version:

``` C#
optionsBuilder.UseSqlServer(connectionString, opt=>opt.Use2017Functions());

A bit more general:

``` C#
optionsBuilder.UseSqlServer(connectionString, opt=>opt.Use2017Features());

or even shorter:

C# optionsBuilder.UseSqlServer2017(connectionString);

We have discussed this before, e.g. at https://github.com/aspnet/EntityFramework/issues/9000#issuecomment-311867375 and https://github.com/aspnet/EntityFramework/pull/3011#r38792929.

@divega I personally like this approach, but its concrete implementation goes far beyond the borders of a normal contribution since it would (at least) touch all SQLServer specific mappers/translators/younameit.

@akamud - The thing you want to achieve is custom translation. Check #7368 which adds user defined scalar functions. Then you will have a local function which would do string.Join for you and provide a SQL translation for that (using .HasTranslation). It will not be hacky but you would still need to understand Expression tree etc to understand what should the translation look like.

This issue talks about adding translation in-built in EF. Adding another MethodCallTranslator which derives from IMethodCallTranslator inside EF is not hack-ish. Its internal but its code in same assembly.

@peterwurzinger - For implementation, you can start with very basic approach taking a translation which would work on all SqlServer editions. Then it would be localized enough to particular Translator being added.

In 2nd phase we need to flow information about SqlServer2017 functions to methodcall translators. There would be 2 approaches to that.

  • Either do normal translation as is and modify generated SQL-tree to change the methods. (e.g. Look at RowNumberPagingExpressionVisitor which changes Fetch/Offset to rownumber())
  • Or modify SqlServerCompositeMethodCallTranslator to take additional dependency on ISqlServerOptions (check SqlServerQueryGeneratorFactory class on how it takes dependency and flow the information about rownumber paging to SqlGenerator) and based on the options set to use SqlServer 2017 functions add different set of translators. So translators being added to the list will be conditional based on the options set.

As much as I was happy to discover STRING_AGG(), I cannot help but feel that it would have been better had sql server added support for an array datatype instead. I suspect that would solve quite a few things.

@9Rune5 isn't array types supported by memory optimized tables/types? You can create memory optimized type and pass it as parameter I believe. I didn't not checked but I think this is supported.

@9Rune5 You can create memory optimized type and pass it as parameter I believe.

Sure, but in this case we want to have a column in the result set contain a subtable.

context.MyTable.Select(t=>new{t.Id, mysubs = t.Subtable.Select(sub=>new{sub.Id, sub.Description}});

When I do stuff like that, EF generates queries with some mighty impressive joins and I am left with the feeling that those queries could be further optimized had SQL Server been able to return a table column type. Instead I'm left with some funky behaviour if I'm foolish enough to employ some .Skip() and .Take().

For me, string.Join() support is a desperate measure to get halfway to properly supporting the type of EF query in my example. I have a problematic join like this in some code I wrote recently, and STRING_AGG() seemed to go a long way of resolving the performance issues I witnessed. In that particular case I was able to live with the compromise of not having the data returned in an orderly fashion. A somewhat messy-looking string would have sufficed.

But many of my assumptions could turn out to be dead wrong and the only real solution is lazy-loading (or something completely different). After almost a decade of using EF, my team still suffer from "maybe hand-written sql can get us out of this performance jam"-ities and that is unfortunate. (the result isn't nice)

Not just string.join...mssql supports returning xml and json strings("FOR JSON AUTO" and "FOR JSON PATH" ).
There has been talk about basic string.join support since EF v 1 and nothing has been done about it. Even when you manage to write 99% of the SQL queries using linq, the second you realize these 3 functions return smaller and faster results than anything EF can come up with, you are stuck with using EF's raw sql query support, which kind of sucks because you're stuck with non-typed string datas representing actual T-SQL. And if you hand that magic piece of code to team members who don't know T-SQL, you're stuck with maintaining that piece of code for all eternity. I honestly cannot wait for the day we get basic functionality of any one of these functions. Even if the Expression needed to be used is complex as hell. Anything to get rid of raw T-SQL queries.

So, can string.Join be translated to STRING_AGG()?

@ihor-stavytskyi STRING_AGG is only available with SQL Server 2017 and later.

@ihor-stavytskyi STRING_AGG is only available with SQL Server 2017 and later.

I would really appreciate if you can help me with my problem.

I've got a table:
EmployeeId | JobPositionId | Salary

I want to write a query that groups employees by job position, calculates number of employess that have high salary (by some magic rule, for this example it does not matter) and return ids of all employees that have the same job position.

In SQL I can achieve it by this query:

SELECT JobPositionId
    , COUNT(*) as TotalEmployeesCount
    , SUM(CASE WHEN Salary > 42 THEN 1 ELSE 0 END) as HighSalaryEmployeesCount
    , STRING_AGG(EmployeeId, ', ') as Employees
FROM Employees
GROUP BY JobPositionId
ORDER BY HighSalaryEmployeesCount DESC

I'd like to get the same query from C# code using EF Core:

var byJob = await _dbContext.Employees
    .GroupBy(x => x.JobPositionId)
    .Select(g => new
    {
        JobPositionId = g.Key,
        TotalCount = g.Count(),
        HighSalaryEmployeeCount = g.Sum(x => x.Salary > 42 ? 1 : 0),
        Employees = string.Join(", ", g.Select(x => x.EmployeeId)) // string that produces an error
    })
    .Skip(offset)
    .Take(limit)
    .ToArrayAsync();

When I execute this query I get the following error:

System.InvalidOperationException: 'The LINQ expression 'Select<Employee, string>(
    source: GroupByShaperExpression:
    KeySelector: e.JobPositionId, 
    ElementSelector:EntityShaperExpression: 
        EntityType: Employee
        ValueBufferExpression: 
            ProjectionBindingExpression: EmptyProjectionMember
        IsNullable: False
    , 
    selector: (x) => x.EmployeeId)' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync().'

Executing this in-memory by calling AsEnumerable() or ToList() is not an option because I'd like to apply paging (limit and offset) and get limited number of the resulting records.

Is it doable with EF Core?
I'm using MS SQL Server 2017 and EF Core 3.0.2.

The only way is to write the actual t-sql query and use EF's raw sql funcrionality to cast the result to a predefined class type. As you can see from the comments there is no String.Join support yet, unless you use ToList, but as you've figured out that is not a solution since it would defeat the purpose of what you're trying to achieve.

The only way is to write the actual t-sql query and use EF's raw sql funcrionality to cast the result to a predefined class type. As you can see from the comments there is no String.Join support yet, unless you use ToList, but as you've figured out that is not a solution since it would defeat the purpose of what you're trying to achieve.

Thank you, @Looooooka

@ErikEJ isn't mainstream support for SQL Server 2016 scheduled to end a year from now?

@9Rune5 - That is not true. See https://docs.microsoft.com/en-us/sql/sql-server/end-of-support/sql-server-end-of-life-overview?view=sql-server-ver15#lifecycle-dates

And also end of support is considered end of extended support not mainstream support. People can continue to use SqlServer after mainstream supported has ended. It is not unsupported scenario.

@smitpatel The web page you pointed to says 2021. It is currently 2020. 2021 - 2020 = 1 year from now. Is it a rounding thing? (I believe it is July 2021, so I was off by two months)

Either way it is a moot point when mainstream support ends. We still support SqlServer 2012 since it has not reached end of life yet.

Weird. EF just does returns ALL data onto client and blow ups my query results because except concat have another sum join. Thank a lot....

I would prefer it to fail.

Was this page helpful?
0 / 5 - 0 ratings