Efcore: Do not track after SaveChanges()

Created on 8 Jul 2017  路  17Comments  路  Source: dotnet/efcore

Based on https://stackoverflow.com/questions/44923147/how-to-avoid-inserted-entity-to-be-selected-after-insert-in-ef-core

Whenever we have generated columns (On insert or update), we do Select after insert/update to retrieve generated values to populate the entity in memory. In some cases, user may want just save the data and not care about updating them in memory any further (because not using them later).

At present there is no mechanism for this functionality.

area-perf type-enhancement

Most helpful comment

@Giancarlos This issue is not related to AsNoTracking--that will tell EF not to track entities returned from the query. This issue is about SaveChanges ignoring store-generated values for already tracked entities.

All 17 comments

Try AsNoTracking:

https://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx

@Giancarlos This issue is not related to AsNoTracking--that will tell EF not to track entities returned from the query. This issue is about SaveChanges ignoring store-generated values for already tracked entities.

__To the implementer__:

  • Disable fixup when detaching to avoid messing up the FKs and navigations
  • Once implemented this should be used for data migrations.

It might be nice to be able to pass an optional "get me the entity" flag on a save changes call. That way when you need it add it, when you don't don't.

Are there plans for this to be implemented? I'm using Autofac + EF Core in a console application and this is really biting me.

@leus this is our backlog milestone, meaning that we don't have specific plans to implement it. Could you elaborate on how this is biting you?

Sure - I have a .NET Core console application that uses Autofac to create a big list of Tasks that receive a DbContext to do stuff. Everything works beautifully except I get a big hit on memory, and most of it is used by entities inserted. In the few places where we do queries we make sure of using AsNoTracking() (but we also use QueryTrackingBehavior.NoTracking when creating, so this is probably overkill).

image

Here the SqlChecklist.Processor.Writers.Sql.Model.TextData object is just a Entity that contains a big string (ntext).

My contexts are created as follows:

var dbOptions = new DbContextOptionsBuilder<ProcessorContext>();
dbOptions.UseSqlServer(connectionString);
dbOptions.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);

Regarding this issue and also this one: https://github.com/aspnet/EntityFrameworkCore/issues/11852

As noted in previous comments and as i have already observed, using AsNoTracking queries or turning Tracking off for the whole context only prevents entities being added to the change tracker as a result of Query operations, SaveChanges still maintains inserted/updated entities in the change tracker.

@divega @ajcvickers Suggestion for a new configuration option:

I think that having a SaveChanges option (at context level and method level) which removes newly inserted entities from the change tracker after saving to the database would be useful. As with the original opening comment on this issue, when there are database generated columns such as auto-incrementing primary keys, with an option such as this enabled there would then be no need for EF to perform a select to retrieve the newly generated key value because the entity would effectively be discarded by EF after the save.

I use the description: _"option which removes newly inserted entities from the change tracker after saving to the database"_ rather than: _"option which prevents newly inserted entities from being added to the change tracker after saving to the database"_ because as far as i know the only way to insert an entity using SaveChanges is to first attach it to the change tracker using Context.Add(entity) so the entity will always be in an attached state before SaveChanges is called.

I think an option such as this would be less relevant when database generated columns are not used as there is no select query performed behind the scenes by EF so there would be no performance benefit to be provided by the option. Instead the entity can just be detached from the change tracker by using Entry(entity).State = EntityState.Detached; after calling SaveChanges, achieving the same result. Of course doing this does not prevent the behind the scenes select query from taking place right after the SaveChanges to retrieve any database generated key values that are then never used.

Maybe the option should instead be targeted at the SaveChanges behaviour when database generated columns are used, ie. don't go and get the updated values, just fire and forget, but this will cause issues with the change tracker unless the saved entity is detached? Having a side effect of detaching the entity only when database generated columns are in use is going to be really bad. It sounds much better as an an option to detach entities after saving, which has the performance improving side affect of preventing an unnecessary select query when database generated columns are used.

So rather than just detaching newly inserted entities, it would be better if the new option made SaveChanges detach any entity this it has just persisted to the database, both inserts and updates, similar to what @roji suggested here: https://github.com/aspnet/EntityFrameworkCore/issues/11852#issuecomment-385953097

So what i am suggesting is an option which changes the behaviour of Save operations so that they result in just fire and forget database writes and then detach the affected entities from the change tracker.

Use Cases:

I think an option such as this would be useful in scenarios where it is not possible or desirable to use BulkCopy tools. For example when bulk importing data into a database where depending on each record being imported, there sometimes needs to be query-then-update operations performed, or lookup-and-populate-related-entities, rather than just plain inserts.

In the above scenario it can be desirable to use EF for importing data, especially if it is already being used in the project and we already have the model to work with, rather than bringing in some additional framework or tools etc. The issue with using EF for inserting/updating large numbers of records is that the change tracker causes serious performance issues due to build up of entities unless the context is regularly disposed and re-created during the import cycle. This is never much of an issue for normal use cases such as Api and Mvc controllers where request scopes and DI make sure that the context is short lived.

To get the most performance benefit in most cases the context will need to be recreated after each record is imported, although the overhead of doing this is tiny compared to the performance benefit provided by starting with a fresh context and empty change tracker, it still seems wasteful. Another issue is that it does not always make sense or look nice to have to structure the import code such that the context is recreated after each record import.

To solve this kind of issue in the past i have used a mixture of context recreation and looping through the change tracker to detach all entities after each save. This is fine when using Guid's as keys which are generated by EF, but if using database generated keys there will still be wasteful select queries happening.

Here is a method on the context that i have been using to clear out the change tracker:

public static void DetachAllEntities(this DbContext context) {
            var entries = context.ChangeTracker.Entries()
                .Where(e => e.State != EntityState.Detached)
                .ToList();

            foreach (var entry in entries) {
                if (entry.Entity != null) {
                    entry.State = EntityState.Detached;
                }
            }         
        }

But recently, i have tried using the suggestion here: https://github.com/aspnet/EntityFrameworkCore/issues/11852#issuecomment-386045894

public void ResetState() {
            ((IDbContextPoolable)this).ResetState();
            // https://stackoverflow.com/questions/46685823/how-to-use-the-executionstrategy-properly       

        ((IDbContextPoolable)this).Resurrect(((IDbContextPoolable)this).SnapshotConfiguration());
}

There appears to be no difference in terms of performance that i can measure, both seem to have the same affect.

As an example, clearing out the change tracker after each record import during an import of 220 records from a CSV file resulted in a 25x performance increase. Without clearing out the change tracker after each record, the import ends with 220 entities in the change tracker and takes 25 time longer to complete, so not a huge amount of entity build up but enough to cripple the performance.

In that example, when importing multiple files, i am recreating the context after each file import, but just clearing the change tracker after each record import.

I am sure this is one of the reasons why EF gets such a bad rep in terms of performance, you have to be very careful how it is used, if you are not fully aware of things like this it is easy to just dismiss it as slow and inefficient.

a good way of getting a fresh context is to take advantage of the ServiceScope Factory:

foreach (FileInfo file in files) {
    string filePath = file.FullName;
    using (var scope = ServiceScopeFactory.CreateScope()) {
        var myCsvImporter = scope.ServiceProvider.GetRequiredService<MyCsvImporter>();
        var importResult = await myCsvImporter.ImportFromFileAsync(filePath);
        ........

    }
}

Of course MyCsvImporter could also use the same technique to create a fresh context for each record, i am not sure if this will be faster than clearing out the change tracker, but in this case i suspect it would not be measurable due the database writes taking up the vast majority of the time. The main reason not to create a fresh context for each record is due to code structure and readability and the desire to at least make some use of the framework provided DI convenience rather than always grabbing the required services from a ServiceScope.

I also agree with @roji here: https://github.com/aspnet/EntityFrameworkCore/issues/11852#issuecomment-386031370

It looks like the performance benefit of this option will not just depend on the primary key being generated by EF or by the database, but also on the EF Core Provider used.

For example if using Npgsql and a Guid primary key, but with database generated values via the postgres uuid_generate4() function, it looks like there will still be no select query performed after the Save due to the use of the Postgres feature described in this comment:https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/81#issuecomment-236642569
Npgsql is using the Postgres "RETURNING" statement to avoid a second query to get the newly generated Guid.

@roji I am not sure how this is handled when there are other or multiple database generated values in a table?

@chrisckc you're right that Npgsql has a somewhat lighter mechanism for retrieving database-generated values after insert. However, even for Npgsql, if we somehow know that we don't need to bring back database-generated values, we can omit the RETURNING clause and save some network bandwidth (for the values coming back), so this issue is still relevant.

Having multiple database-generated values doesn't change anything here - they should simply all appear in the RETURNING clause.

my use case is only having INSERT not having SELECT permission (Audit Tables)
In this case I just want to fire an forget

@timker why not use ExecuteSqlRaw then?

that would work, (or create a stored proc)

(and I'm happy to do either) but I would prefer to use ef types.

This enforce you to write sql when you write backward compatible code.
For example If I add a column and host has being updated but db hasnt, if this select was optional, the below code would work

DbContext.Entry(entity).Property(x => x.Firstname).IsModified = false;
DbContext.SaveChanges();

IMO this select should be optional.

If anyone has the same backward compatibility issue, there is a better sln from raw sql. If you have the entity models classes isolated, you could copy these into a new namespace (**.Backward) and have a factory that can decide which dbcontext to choose.

Was this page helpful?
0 / 5 - 0 ratings