Efcore: What's the performance benefit of pooling DbContexts

Created on 20 Oct 2017  ·  17 Comments  ·  Source: dotnet/efcore

EF Core has supported pooling of DbContext instances since version 2.0. I can find statements from @anpete such as:

The new method introduces a few limitations on what can be done in the OnConfiguring() method of the DbContext but it can be adopted by many ASP.NET Core applications to obtain a performance boost.

Implying that pooling gives a performance boost, while @ajcvickers states:

The point of DbContext pooling is to allow reuse of DbContext instances from a pool, which for certain scenarios can result in a performance boost over creating a new instance each time. This is also the primary reason for connection pooling in ADO.NET, although the performance boost for connections will be more since connections are generally a more heavyweight resource.

Implying that DbContext pooling is not as significant as connection pooling.

I couldn't find any analysis or benchmarks of the actual performance improvement this brings. To my knowledge, the creation of DbContext instances is _really light_. A small local test showed that I could make over 600,000 instances a second on a single thread.
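A local test of this kind could look roughly like the sketch below. It is not the test that produced the figure above: `BenchContext`, `ContextTimings`, and the use of the in-memory provider (the `Microsoft.EntityFrameworkCore.InMemory` package) are assumptions made for illustration. It times construction alone and construction followed by a first touch of the context, so the two can be compared on your own machine.

```csharp
using System;
using System.Diagnostics;
using Microsoft.EntityFrameworkCore;

// Hypothetical context used only for this measurement.
public class BenchContext : DbContext
{
    public BenchContext(DbContextOptions<BenchContext> options) : base(options) { }
}

public static class ContextTimings
{
    // Times construction and disposal only; the context is never used.
    public static TimeSpan ConstructOnly(DbContextOptions<BenchContext> options, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            using (var context = new BenchContext(options)) { }
        }
        return watch.Elapsed;
    }

    // Times construction plus a first touch of the context. Reading Model is
    // assumed here to be enough to force the context to resolve its internal services.
    public static TimeSpan ConstructAndTouch(DbContextOptions<BenchContext> options, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            using (var context = new BenchContext(options))
            {
                _ = context.Model;
            }
        }
        return watch.Elapsed;
    }

    public static void PrintComparison(int iterations)
    {
        var options = new DbContextOptionsBuilder<BenchContext>()
            .UseInMemoryDatabase("bench")
            .Options;

        // Warm up once so one-time work (such as model building) is not counted.
        ConstructAndTouch(options, 1);

        Console.WriteLine($"construct only:      {ConstructOnly(options, iterations)}");
        Console.WriteLine($"construct and touch: {ConstructAndTouch(options, iterations)}");
    }
}
```

A Stopwatch loop like this is only a rough indication; a benchmarking harness would give more trustworthy numbers.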

I found this question from @davidroth:

Out of curiosity: Can you share some benchmarks about this improvement?

Unfortunately, the question was ignored.

What causes a performance hit when DbContext instances are _not_ reused? Is there a publicly accessible benchmark that shows the benefit of pooling, and especially an analysis of what causes the bottleneck when DbContext instances are not reused?

Thanks in advance.

closed-question

Most helpful comment

@dotnetjunkie DbContext initialization is a two-part process. Creating the instance itself does minimal work--basically just initializing any DbSet properties. This means creating a DbContext that is not used does not add significant overhead, which is important for scenarios such as a controller that may need to access the database but often does not--just injecting/creating the instance should not be a perf issue.

The first time the context is used, for a query, to track something, etc., there is a second lazy initialization that happens. This involves loading some services from EF's internal D.I. that are scoped to the same lifetime as the context. Not all services are like this--many are singletons that are shared between all context instances. Also, not all services are initialized at this time--services that may not be needed can be loaded lazily--however, we have seen that this is sometimes more expensive than loading immediately due to the way the D.I. system compiles constructor calls.

The second initialization step is also not really slow, but DbContext pooling means that instead of doing this step, an existing pooled instance is used instead, with resetting of the service internals. This is less flexible--each context instance in the pool must be configured the same way so that the same set of services with the same configuration is correct each time.

Whether or not this really shows a measurable difference in your app depends a lot on how much else the context is doing. Where it works well is for simple, no-tracking queries where very little other work is being done. As soon as the context starts to do more work, the relative amount of time saved by pooling is much less.

It's also worth keeping in mind that there is additional initialization which is not bound to a particular context instance. So, for example, the first time a model is used by any context instance there will be a hit while the model is built and cached. Likewise for the first time a query is executed, and so on. Pooling will not have any impact on these things.

Hope that helps!
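For readers who have not used the feature, here is a minimal sketch of how a pooled context is typically registered. `Blog`, `BloggingContext`, and `AddBlogging` are hypothetical names, and the SQL Server provider (`UseSqlServer`) is an assumption; any provider would do. The point it illustrates is the constraint described above: every instance in the pool is built from the same options.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical entity and context, for illustration only.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class BloggingContext : DbContext
{
    // A pooled context is created from options passed to a public constructor,
    // so per-instance configuration in OnConfiguring is not an option here.
    public BloggingContext(DbContextOptions<BloggingContext> options) : base(options) { }

    public DbSet<Blog> Blogs => Set<Blog>();
}

public static class Registration
{
    public static IServiceCollection AddBlogging(
        this IServiceCollection services, string connectionString)
    {
        // The non-pooled equivalent would be services.AddDbContext<BloggingContext>(...).
        // With the pooled registration, the same options apply to every pooled instance.
        return services.AddDbContextPool<BloggingContext>(
            options => options.UseSqlServer(connectionString));
    }
}
```

Consumers keep injecting `BloggingContext` as before; only the registration changes, which makes it straightforward to measure both variants in the same application.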

All 17 comments

You can see it in action in the demo: https://github.com/anpete/EFDemos

Great, that does show a runnable example with the performance difference, but it still lacks an analysis of why this difference exists. Why is calling a property on a DbContext so much slower the first time? What is happening under the covers?

The Model is built/configured and cached in the AppDomain, I think.

@dotnetjunkie Very good question. After the first instantiation, when the metadata model is created, DbContext instances should be very cheap in practice, even if dependency injection is used to create supporting services.

The only explanation is that DbContexts don't share internal services and for each context a large graph of supporting services needs to be instantiated.

Apart from what @ajcvickers said, an additional benefit of pooling is that it can drastically reduce allocations. This can greatly benefit high-scale scenarios such as heavily loaded web servers: we saw a ~30% throughput improvement on some of the TechEmpower benchmarks. As always, measure for your application and then decide.

What if there are unsaved modifications on tracked entities after a DbContext is used? Will they be saved when the instance is reused from the pool and SaveChanges is called? Can/should I dispose of a DbContext when using pooling?

Won't the long-lived DbContext get slower and slower as it will track more and more entities over all reuses? Or is the tracker reset when the DbContext is returned to the pool?

@gius - All services, including the ChangeTracker, are reset when the DbContext is returned to the pool. So whenever you get a context from the pool, it is in the same state as a newly initialized one (except for a few limitations).
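The reset-on-return idea can be shown with a toy pool that has no EF Core dependency. This is not EF Core's implementation; `ToyContext` and `ToyPool` are hypothetical types that only demonstrate why a reused instance does not carry state over from its previous user.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for a context: it accumulates per-use state.
public sealed class ToyContext
{
    public List<string> Tracked { get; } = new List<string>();

    public void Track(string entity) => Tracked.Add(entity);

    // Clears per-use state so the next renter starts clean.
    public void ResetState() => Tracked.Clear();
}

public sealed class ToyPool
{
    private readonly ConcurrentQueue<ToyContext> _idle = new ConcurrentQueue<ToyContext>();
    private readonly int _maxSize;

    public ToyPool(int maxSize) => _maxSize = maxSize;

    // Reuse an idle instance when one exists, otherwise create a new one.
    public ToyContext Rent() =>
        _idle.TryDequeue(out var context) ? context : new ToyContext();

    // Reset before the instance becomes available again; drop it if the pool is full.
    public void Return(ToyContext context)
    {
        context.ResetState();
        if (_idle.Count < _maxSize)
        {
            _idle.Enqueue(context);
        }
    }
}
```

Renting, tracking something, returning, and renting again hands back the same object with an empty `Tracked` list, which mirrors the behavior described above for pooled contexts.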

@smitpatel Can this be done manually on a normal context? ChangeTracker.Reset() ? I can't seem to find anything like that.

I'm not sure how the EF team is resetting, but I think detaching all tracked entities is the equivalent of resetting the ChangeTracker.

approximate-code:
// Copy the entries to a list, then detach each one.
var entries = context.ChangeTracker.Entries().ToList();
entries.ForEach(e => e.State = EntityState.Detached);

@zanate4019 Detaching all entities is quite slow. Creating a new context instance would be much faster.

Thanks again for your timely input Arthur. Good to know.

So back to @smitpatel 's question. Is there a way for us to manually reset the ChangeTracker? I am thinking about how to manage the memory footprint of a long-lived context. Resetting the change tracker could be a decent alternative to disabling it altogether.

Jason

@zanate4019 Best practice is to not use a long-lived context. Instead, use a new context instance for each unit-of-work. Is there a reason you are using a long-lived context instance?

I understand that short-lived contexts are best.

We have a real-time workflow system with a mixture of WPF and WinForms clients and a WCF pub-sub back-end that pushes updates. We are moving to a unified ASP.NET Core / Angular 7 front-end where notifications are pushed through SignalR and initial data loads are handled through a REST API.

In this instance, the Windows service I am building is only needed during the transition. It will pick up updates once a second from the system database and send them out to clients through SignalR. We expect to need it for only 6 months. Since it only runs 4 selects and 4 corresponding groups of updates once a second, to pick up the updates and then mark them notified, I had planned to use the same implementation of our repository interface and just let Core inject the context at startup.

I guess I could create a special repository implementation that gets a transient context from the service provider rather than having it injected in the constructor. In the current version I just turned off change tracking.

I'm still curious about the best way to reset the ChangeTracker.

@zanate4019 Thanks for the additional info. It certainly sounds to me like a short-lived context instance--i.e. a new one each second to pick up and send updates--is the way to go here.

To answer your question directly, the best way to "reset the ChangeTracker" is to create a new context instance. (Yes, pooling does some internal stuff to reset the context, but it has limitations and works in the context (no pun intended) of pooling.)
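A minimal sketch of that advice for a once-a-second polling service follows. It assumes the context is registered in the container (pooled or not); `PendingUpdate`, `WorkflowContext`, and `UpdatePoller` are hypothetical names, not types from the system described above. Each poll opens its own scope, so the context and its change tracker never outlive one unit of work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical entity and context standing in for the real system database.
public class PendingUpdate
{
    public int Id { get; set; }
    public bool Notified { get; set; }
}

public class WorkflowContext : DbContext
{
    public WorkflowContext(DbContextOptions<WorkflowContext> options) : base(options) { }

    public DbSet<PendingUpdate> PendingUpdates => Set<PendingUpdate>();
}

public class UpdatePoller
{
    private readonly IServiceScopeFactory _scopeFactory;

    public UpdatePoller(IServiceScopeFactory scopeFactory) => _scopeFactory = scopeFactory;

    // One unit of work: a new scope per call, with a context resolved from that scope.
    public async Task<int> PollOnceAsync(CancellationToken cancellationToken)
    {
        using (var scope = _scopeFactory.CreateScope())
        {
            var db = scope.ServiceProvider.GetRequiredService<WorkflowContext>();

            var pending = await db.PendingUpdates
                .Where(u => !u.Notified)
                .ToListAsync(cancellationToken);

            // Pushing the updates to clients would happen here.
            foreach (var update in pending)
            {
                update.Notified = true;
            }

            await db.SaveChangesAsync(cancellationToken);
            return pending.Count;
        }
    }

    public async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            await PollOnceAsync(cancellationToken);

            // Task.Delay throws when the token is cancelled; the exception
            // propagates to the caller and ends the loop.
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}
```

On EF Core 5.0 and later, `ChangeTracker.Clear()` is also available for detaching everything in one call, though the advice above to prefer a fresh context per unit of work still applies.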

Ok -- new repository it is. Not a problem -- only need to actually implement the 8 methods the service needs.

Thanks again for your help

I hope this description will be added to the docs; this has puzzled me for a long time.