EF Core: Keeping a DbContext open over time eventually slows down SaveChanges()

Created on 29 Apr 2018 · 20 comments · Source: dotnet/efcore

As part of trying to re-engineer my application to combat GC collections, I've come across what I consider to be an EFCore performance degradation.

My scenario is a pretty much insert-only application, where I completely disable change tracking and set AutoDetectChangesEnabled = false.

Here's the code. I'm using the in-memory database to prove my point...

When I run the benchmark program I've written (dotnet run -c release fast), which constantly creates and disposes the DbContext objects, I get a very consistent time for SaveChanges() of around 4-6ms per 1000-entity save.

When I run the benchmark in the "slow" mode (dotnet run -c release slow), where a single DbContext is preallocated and used throughout the test, the time for SaveChanges() gradually increases and reaches tens of milliseconds within a few hundred cycles... The expectation is that reusing the same DbContext instance would be better for performance, but in fact it causes a pretty severe degradation. It would seem as though there's some sort of resource leakage even after SaveChanges() is called.

Steps to reproduce

Here's the small test I've come up with:

```c#
using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.EntityFrameworkCore;

class Program
{
    static void Main(string[] args)
    {
        if (args[0] == "fast")
            Fast(args);
        if (args[0] == "slow")
            Slow(args);
    }

    static void Slow(string[] args)
    {
        using (var ctx = new BlogContext())
        {
            ctx.Database.EnsureDeleted();
            ctx.Database.EnsureCreated();
            ctx.ChangeTracker.QueryTrackingBehavior    = QueryTrackingBehavior.NoTracking;
            ctx.ChangeTracker.AutoDetectChangesEnabled = false;

            var sw = Stopwatch.StartNew();
            var cycle = 1;
            while (true)
            {
                PumpIntoDB(ctx, sw, cycle++);
            }
        }
    }

    static void Fast(string[] args)
    {
        using (var ctx = new BlogContext())
        {
            ctx.Database.EnsureDeleted();
            ctx.Database.EnsureCreated();
        }

        var sw = Stopwatch.StartNew();
        var cycle = 1;
        while (true)
        {
            using (var ctx = new BlogContext())
            {
                ctx.ChangeTracker.AutoDetectChangesEnabled = false;
                ctx.ChangeTracker.QueryTrackingBehavior    = QueryTrackingBehavior.NoTracking;

                PumpIntoDB(ctx, sw, cycle++);
            }
        }
    }

    static void PumpIntoDB(BlogContext ctx, Stopwatch sw, int cycle)
    {
        foreach (var i in Enumerable.Range(0, 1000))
            ctx.Blogs.Add(new Blog() { Author = $"dans{i}" });

        sw.Restart();
        ctx.SaveChanges();
        var saveChangesAdd = sw.ElapsedMilliseconds;
        Console.WriteLine($"Cycle {cycle:D4}\tSave:{saveChangesAdd}ms");
    }
}

public class Blog
{
    public int Id { get; set; }
    public string Author { get; set; }
}

public class BlogContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //optionsBuilder.UseNpgsql("Host=localhost;Username=test;Password=test;Database=leakymcleakage");
        optionsBuilder.UseInMemoryDatabase("leakymcleakage");
    }

    public DbSet<Blog> Blogs { get; set; }
}

```

Further technical details

EFCore version: <PackageReference Include="Microsoft.EntityFrameworkCore.InMemory" Version="2.1.0-preview2-final" />

Operating system: ubuntu 18.04
IDE: console

closed-duplicate

All 20 comments

cc @roji

What happens when you run on Windows?

Looks pretty much the same.
I've rebooted into my Windows install; Windows may be a few ms faster than Linux right now, but I'm still seeing the same degradation.

If you want a more complete result set with (gasp) charts, I'll be more than happy to provide that...

@damageboy we have certainly not optimized the DbContext code for this usage. At the minimum every time you call SaveChanges in your test, we are iterating over an ever-growing dictionary to figure out what entities need saving.

In general we recommend using short-lived DbContext instances. If you care enough to avoid allocating a full DbContext instance for every request, we offer the alternative of DbContext pooling, which can reuse DbContext instances by resetting just enough state on them on each request.

If you really think optimizing for this usage could help your scenario, we can talk about possible optimizations, but since we don't consider it a recommended usage pattern, we should avoid any changes that make the change tracking code significantly more complex.
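For reference, the pooling alternative mentioned above can be sketched roughly like this (a sketch, not code from this thread; it assumes the Microsoft.EntityFrameworkCore and Microsoft.Extensions.DependencyInjection packages, and note that pooling requires the context to expose a constructor taking DbContextOptions<BlogContext> rather than relying on OnConfiguring as the repro does):

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

// Register a pooled context: instances are reset and reused across scopes
// instead of being constructed and garbage-collected per request.
var provider = new ServiceCollection()
    .AddDbContextPool<BlogContext>(
        options => options.UseInMemoryDatabase("leakymcleakage"))
    .BuildServiceProvider();

using (var scope = provider.CreateScope())
{
    // Resolving the context takes one from the pool; disposing the scope
    // resets its state and returns it to the pool.
    var ctx = scope.ServiceProvider.GetRequiredService<BlogContext>();
    ctx.Blogs.Add(new Blog { Author = "dans0" });
    ctx.SaveChanges();
}
```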

@divega thanks for looking at this. I know that @damageboy is working around this by instantiating a new DbContext for each batch of data to be inserted (we used to be colleagues :)), and that the idea of using the same instance was to reduce allocations.

While DbContext pooling may address that, it's surprising for a DbContext that turns off all change tracking and does inserts only to keep growing: my personal expectation is that there would be no accumulated state once SaveChanges() is called. I have no idea how complicated it would be to make that happen, but I wonder if there couldn't be other conceivable issues caused by this: it does mean that there is at least a theoretical difference in behavior before and after SaveChanges().

If the effort required here is large or you feel there really isn't enough value, then I don't think it's extremely critical (although we'll see what @damageboy says), but otherwise it seems like a good thing to maintain as a goal.

One more comment - one of the really nice improvements EF Core brought to the table over EF6 was batched updates, allowing efficient bulk insertion to happen within EF. This issue seems to be part of supporting that scenario as well (at least for some cases).

@roji, @damageboy more concretely, if you look at the code I pointed to before, we use a single dictionary to hold the identity map of the DbContext. Every time after we successfully save changes we set the state of all entries to Unchanged, but the next time you call SaveChanges we still have to iterate over all entries to figure out which contain modified or new entities, then iterate again to set the state to Unchanged. And this becomes more expensive every time. One possible optimization would be to maintain unchanged entries in a separate dictionary, to avoid revisiting them every time, but that would mean that in other places (e.g. fix-up) where we are currently querying just one dictionary we would need to start looking into more. State changes would need to move entries from one dictionary to another. In general I think it is possible, but I am not sure the scenario warrants the added complexity. I am also not sure how this would compare with just using DbContext pooling.
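The cost pattern described above can be illustrated with a plain C# toy model (illustrative only; this is not EF Core's actual state manager): a single map keeps every entry ever added, and entries are flipped to Unchanged after each save rather than removed, so every later save still scans them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for the context's identity map.
enum EntryState { Added, Unchanged }

class GrowingMapDemo
{
    static void Main()
    {
        var identityMap = new Dictionary<int, EntryState>();
        var nextKey = 0;
        var scannedPerSave = new List<int>();

        for (var cycle = 0; cycle < 3; cycle++)
        {
            // "Add" a batch of 1000 new entities.
            for (var i = 0; i < 1000; i++)
                identityMap[nextKey++] = EntryState.Added;

            // "SaveChanges" must scan every entry to find the Added ones...
            var toSave = identityMap
                .Where(e => e.Value == EntryState.Added)
                .Select(e => e.Key)
                .ToList();
            scannedPerSave.Add(identityMap.Count);

            // ...and then flip them to Unchanged, so the next save scans
            // them all over again.
            foreach (var key in toSave)
                identityMap[key] = EntryState.Unchanged;
        }

        // Work per save grows with the total number of entities ever added.
        Console.WriteLine(string.Join(",", scannedPerSave)); // 1000,2000,3000
    }
}
```

Each "save" visits every entry ever added, which is why the reported SaveChanges() time climbs cycle after cycle.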

@divega thanks for the explanation. I think I understand better.

My expectation was that if one turns off change tracking on the context (ctx.ChangeTracker.AutoDetectChangesEnabled = false; ctx.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;) there would be no need for the DbContext to remember anything about entities which were inserted. However, I understand that the above two properties allow one to turn off certain aspects of change tracking, but without turning the feature off as a whole.

I know that at least some users (such as @damageboy) really aren't interested in the change tracking part of EF, and prefer to manage that part themselves, but it would seem as though that isn't 100% possible at this point. As you already provide knobs for disabling some parts of change tracking, it may be worth thinking about allowing users to go all the way and disable it altogether.

Well, in @damageboy’s test, at the point where both auto-detect changes and tracking of queries are disabled, we are doing only enough “change tracking” to remember what calls to Add the program made. If there were a way to also disable that part, I don’t think the program would do what @damageboy wants it to do. So it is really a question of whether we can optimize for this test, or whether we add a feature so that all unchanged entries are automatically detached.

If there were a way to also disable that part, I don’t think the program would do what @damageboy wants it to do

I was thinking more about a way to make the context reset its tracking dictionary as part of SaveChanges(), bringing it more or less back to the same state as when it was instantiated - but that may be the same thing as what you said.

In any case, you obviously know best how much effort this would be and whether it's worth it...
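In the same toy model as above (again illustrative only, not EF Core internals), the proposed reset amounts to clearing the map after a successful save, which keeps the per-save scan flat:

```csharp
using System;
using System.Collections.Generic;

class ResettingMapDemo
{
    static void Main()
    {
        var identityMap = new Dictionary<int, bool>(); // key -> "is Added"
        var nextKey = 0;
        var scannedPerSave = new List<int>();

        for (var cycle = 0; cycle < 3; cycle++)
        {
            // "Add" a batch of 1000 new entities.
            for (var i = 0; i < 1000; i++)
                identityMap[nextKey++] = true;

            // "SaveChanges" scans whatever is currently in the map...
            scannedPerSave.Add(identityMap.Count);

            // ...and then forgets everything, as if the context had just
            // been constructed.
            identityMap.Clear();
        }

        // Per-save work stays flat no matter how many cycles run.
        Console.WriteLine(string.Join(",", scannedPerSave)); // 1000,1000,1000
    }
}
```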

bringing it more or less back to the same state as when it was instantiated

Yes, I think that would be possible. But it would be equivalent to what DbContext pooling already does, only with a different API.

Understood. IIRC DbContext pooling requires users to use dependency injection, which isn't always appropriate (@damageboy's application doesn't use it at all), so it may be interesting to provide this.

BTW this actually reminds me of a conversation I had with @anpete about how it would have been nice to have the perf advantages of DbContext pooling without having to go into dependency injection...

I am in no way recommending using IDbContextPoolable.ResetState for this :trollface:

Thanks @anpete, I'll see with @damageboy about trying this out, I think he can stomach an internal method call :)

@roji @damageboy @anpete @divega See also #9118. A mode that throws entities away once saved would have additional perf benefits when using store-generated keys.
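As a stop-gap until such a mode exists, one sketch using the public EF Core ChangeTracker API (my suggestion, not something endorsed in this thread) is to detach everything after each save; later EF Core releases also added ChangeTracker.Clear() for this purpose:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

static class DbContextExtensions
{
    // Sketch: save, then drop all tracked entries so the change tracker
    // does not keep growing across insert batches.
    public static void SaveAndForget(this DbContext ctx)
    {
        ctx.SaveChanges();
        foreach (var entry in ctx.ChangeTracker.Entries().ToList())
            entry.State = EntityState.Detached;
    }
}
```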

@ajcvickers agreed. I think we can close this issue as a duplicate of #9118 then. @damageboy @roji agreed?

Duplicate of #9118

Looks good, thanks!

Sure, closing...
