Efcore: Please support multiple parallel queries if NoTracking passed

Created on 9 Aug 2018 · 10 Comments · Source: dotnet/efcore

Describe what is not working as expected.

Right now it's impossible to run multiple async calls against a database in parallel to build up results quickly without having multiple contexts, which is discouraged in ASP.NET Core because of DI. This is a pretty standard requirement that would greatly speed up a number of database operations.

Please consider allowing this to happen if the user passes AsNoTracking() on the query. That way there should be no issues with change state tracking and the user is explicitly telling you that it's ok to run these in parallel.

Obviously this can't be done automatically even with projections, but .AsNoTracking should be fine.

This would be a major win for speed with EF.

closed-by-design customer-reported

All 10 comments

This is a pretty standard requirement

And one that breaks on SQL Server, because whatever you do, you can only ever execute one command at a time on a connection, even with MARS. Multithreading over a single connection is not possible on, hm, every database I know. NoTracking does not enter into it; the limitation is more fundamental, at the database level.

Which is why every single iteration of this requirement I have seen uses multiple connections, which requires multiple contexts.

@NetTecture Then either EF needs to handle that itself and open multiple connections to do so, or there needs to be clear guidance on best practices for doing this with ASP.NET Core and DI. With a scoped data context it's impossible to get a second context, and scoped is the recommended and basically only way you can set up a data context.

One way or another the current state of affairs isn't good and needs to be fixed.

@JohnGalt1717 Registering a DbContext with a transient lifetime is fine; there is a parameter in AddDbContext for this.
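As a minimal sketch of the registration mentioned above: `AddDbContext` takes an optional `ServiceLifetime` argument after the options callback. The names `AppDbContext`, `DataRegistration` and `AddTransientAppDbContext` are hypothetical, and the SQL Server provider (`UseSqlServer`) is only an assumption for illustration; any provider would be configured the same way.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical context used only for illustration.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}

public static class DataRegistration
{
    // Registers the context as transient, so every resolution from the
    // container produces a new AppDbContext instance instead of one
    // shared per request scope.
    public static IServiceCollection AddTransientAppDbContext(
        this IServiceCollection services, string connectionString)
    {
        return services.AddDbContext<AppDbContext>(
            options => options.UseSqlServer(connectionString), // assumed provider
            ServiceLifetime.Transient);
    }
}
```

With this registration, two services in the same request that each take an `AppDbContext` in their constructor receive different instances, which is the trade-off debated in the replies that follow.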

At this time, we don't plan to add any support for multiple parallel queries in EF Core. Compelling perf numbers could change this in our long-term plans, but this may be something that happens at a lower level anyway.

As you're no doubt aware, switching to transient will cause major bugs, because entities could easily be added or updated on separate contexts at different layers of DI.

My solution to the problem allows for dealing with a specific case without creating a complete mess in your code or defeating DI.

The case that makes this obviously an excellent idea is one where you need to pull data from multiple disparate datasets and accumulate it into a single response, in an API for example. It happens all of the time.

In my solution the slow queries will run in parallel, and the SQL Server (for example) will be able to execute them simultaneously and return at 90+% of the speed they would individually, so the whole call takes only as long as the longest-running query.

As it stands right now, that same API call runs the queries sequentially, taking the sum of N query times instead, which is essentially always slower than my way. Otherwise you get to defeat DI and instantiate your own data contexts for each query, creating a mess, or you get to break scope and risk random bugs in change tracking.

My solution is elegant and obvious (especially with the error message telling you what to do if you don't know), and every other way is UGLY or slow.

The only other way around this is to break up your single call into multiple API calls that each have their own DB context and run those asynchronously, effectively doing the same thing but introducing the overhead of HTTP requests over the wire and extra complexity in your API for no reason at all.

Please reconsider, because this would be a major win for performance AND improves code readability and reliability.

Hope EF supports this one.

@huang-tianwen - We do not plan to support this scenario.

Can anybody explain to me what kind of major bugs I may encounter while using two transient contexts, as @JohnGalt1717 mentioned in his post above? I am currently simultaneously using two contexts to get parallel select and count on potentially large datasets in a PostgreSQL database. Is there any danger lurking in that scenario? Isn't transaction isolation supposed to protect from the effects of concurrent changes? Thanks in advance for any responses.

There is none. What JohnGalt1717 proposes is firing-level abuse. In general, unless documented otherwise (which he seems not to know), no .NET class is supposed to be thread safe. The reason is that thread safety comes with a cost, and you rarely want to pay it; most scenarios are single threaded, even in multithreaded applications. EF Core has never been documented as being thread safe. Neither, again, are most classes.

Using multiple threads for parallel operations is how EF is supposed to be used. The database, in the end, has to maintain ACID guarantees: it is the CORE requirement of a database to be the ultimate authority for locking and to guarantee correct data (or an error) to all connections. This is what transaction isolation is for.

If you do pure read operations, you may want to use isolation level ReadCommitted; it is less resource intensive than the higher levels and perfectly fine for things like count(). Using one central EF instance (or worse, as I have seen, one static database connection) is abusing the database and creates a TON of downstream problems, starting with memory (you never release the cache) and going on to undefined transaction boundaries, which are also very hard to debug because you never know who holds a reference and is sending the commit that breaks code at the other end of the application.

The given use case (with no tracking) is a fringe case, but even then it comes with a high cost: EF would have to handle multiple parallel DB connections, because even with MARS on SQL Server a DB connection is single threaded internally. SQL Server only handles one request at a time on a connection, and all MARS gives you is async-style interleaving of multiple operations, not multiple threads (still good, e.g., for nested loops). That is not worth doing for a fringe benefit when an EF context, properly used with NoTracking, is extremely lightweight. The model (which is actually most of the memory if you leave out caching) is shared between instances. And again, you need multiple DB connections anyway.
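The ReadCommitted suggestion could look roughly like the sketch below. It assumes a relational provider (the `BeginTransactionAsync` overload taking an `IsolationLevel` comes from the relational EF Core package); `Order`, `AppDbContext` and `ReadOnlyQueries` are hypothetical names used only for illustration.

```csharp
using System.Data;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context used only for illustration.
public class Order
{
    public int Id { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Order> Orders => Set<Order>();
}

public static class ReadOnlyQueries
{
    // Runs a pure read inside an explicit ReadCommitted transaction.
    public static async Task<int> CountOrdersAsync(AppDbContext context)
    {
        using var transaction =
            await context.Database.BeginTransactionAsync(IsolationLevel.ReadCommitted);

        var count = await context.Orders.CountAsync();

        // Nothing was written; committing simply ends the transaction cleanly.
        transaction.Commit();
        return count;
    }
}
```

Many providers already default to ReadCommitted, so the explicit transaction mostly matters when a different default or an ambient transaction is in play.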

The only danger I can see is that you must handle the resulting load on the back end. It is quite easy to have 100+ open EF instances sending requests at the same time, all doing table scans for a count, which means the server may be REALLY busy and overloaded. But that, then again, goes hand in hand with bad indices or ultimately just too-small hardware. Just be prepared.

In your particular case (parallel select and count) the numbers may be slightly different: there is a chance that a row was added between the start of the select and the start of the count. But running both over one context would not really help here at all, and the price for helping would be high. You basically would have to use the repeatable read isolation level (which is EXPENSIVE) and issue both commands in one transaction context (removing multithreading in most cases, unless you can merge transaction contexts in your database of choice) in order to guarantee consistency. Not worth it.
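The parallel select-and-count case discussed above can be sketched as follows, with one context per query so that each query gets its own connection. This is an illustration under assumptions, not code from the thread: `Item`, `AppDbContext`, `PagedQuery` and `GetPageAsync` are hypothetical names, and the contexts are constructed directly from shared options rather than resolved from DI.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context used only for illustration.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Item> Items => Set<Item>();
}

public static class PagedQuery
{
    public static async Task<(List<Item> Page, int Total)> GetPageAsync(
        DbContextOptions<AppDbContext> options, int skip, int take)
    {
        // Two separate contexts: a single DbContext instance must not
        // run two queries at the same time.
        using var pageContext = new AppDbContext(options);
        using var countContext = new AppDbContext(options);

        // Start both queries before awaiting either, so they overlap.
        Task<List<Item>> pageTask = pageContext.Items
            .AsNoTracking()
            .OrderBy(i => i.Id)
            .Skip(skip)
            .Take(take)
            .ToListAsync();
        Task<int> countTask = countContext.Items.CountAsync();

        await Task.WhenAll(pageTask, countTask);
        return (await pageTask, await countTask);
    }
}
```

Because the two queries run on different connections and in different transactions, the total can be slightly out of step with the page contents, which is the inconsistency described above.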

Thank you very much for the exhaustive explanation, @NetTecture. My case is the standard case for listing pagination. I think it's not a big deal if the count can get slightly off sometimes.

On the topic of using one context for multiple queries running in parallel, I actually ran into that and was wondering what was happening until I noticed @ajcvickers's note that service lifetime must be set during DbContext registration (props to him for pointing that out).

Thanks again. :slightly_smiling_face:

@NetTecture It isn't an abuse at all. Note that I said AsNoTracking. That means that every materialization would be unique, so no objects would be reused other than the functions (not the writable properties) of the DataContext.

Compartmentalizing those functions shouldn't be a big deal at all, and it solves the major issue with using multiple contexts (other than the cost of building the context), which is that they don't work well with DI in ASP.NET Core (which really, really, really wants you to use scoped instances).

Plus the code is horrendous... slightly better now in .NET Core 3, where you can use scoped usings, but still not at all pretty.
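For reference, one way to obtain additional contexts while keeping the default scoped registration is to open a child DI scope per parallel query. This approach is not proposed anywhere in the thread; it is an assumption shown only to illustrate what the multiple-context code looks like with using declarations. `Item`, `AppDbContext` and `ItemStats` are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical entity and context used only for illustration.
public class Item
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Item> Items => Set<Item>();
}

public class ItemStats
{
    private readonly IServiceScopeFactory _scopeFactory;

    public ItemStats(IServiceScopeFactory scopeFactory) => _scopeFactory = scopeFactory;

    public async Task<(int Total, int Active)> GetAsync()
    {
        // Each child scope resolves its own scoped AppDbContext, and
        // disposing the scope disposes that context.
        using var totalScope = _scopeFactory.CreateScope();
        using var activeScope = _scopeFactory.CreateScope();

        var totalDb = totalScope.ServiceProvider.GetRequiredService<AppDbContext>();
        var activeDb = activeScope.ServiceProvider.GetRequiredService<AppDbContext>();

        // Start both queries, then await them together.
        Task<int> totalTask = totalDb.Items.CountAsync();
        Task<int> activeTask = activeDb.Items.CountAsync(i => i.IsActive);

        await Task.WhenAll(totalTask, activeTask);
        return (await totalTask, await activeTask);
    }
}
```

The contexts created this way are independent of the request's own scoped context, so entities loaded through them are not tracked by it.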
