Efcore: Option to turn off automatic setting of navigation properties

Created on 5 Apr 2018 · 97 Comments · Source: dotnet/efcore

The EF Core "Loading Related Data" page includes the following tip:

_Entity Framework Core will automatically fix-up navigation properties to any other entities that were previously loaded into the context instance. So even if you don't explicitly include the data for a navigation property, the property may still be populated if some or all of the related entities were previously loaded._

I would like an option to turn this behavior off, so that if I choose not to load a navigation property it is not loaded for me. The cost of transporting data that is not needed is slowing response time. If there are 70 gages and each gage has a status, then that status has 70 gages. There is a reference-looping option mentioned in the loading related data article, which I did try, but the status object still contained the list of gages, which I verified through Postman.

Below is an example showing the loading of data and the clearing of the navigation properties that are automatically "fixed up" by EF Core. I am going to need this code in all of my gets in order to drastically reduce the JSON payload size.

When we use an Include we know the data that is needed for an object. "Fixing up" is something that is not needed. Please remove this feature or please give us a way to disable it.

Code to load data:
```C#
try
{
    using ( var context = new MyContext() )
    {
        // res is assumed to be a List<Gage> declared by the caller.
        // Load every gage together with its Status.
        res =
            ( from c in context.Gages.Include ( "Status" )
              select c ).ToList ();

        // code to clear nav property loaded by EF Core
        res.ForEach (
            g =>
            {
                if ( g != null && g.Status != null && g.Status.Gages != null )
                    g.Status.Gages.Clear ();
            } );
    }
}
catch ( Exception )
{
    // rethrow without resetting the stack trace
    throw;
}
```

```C#
[Table("Gages", Schema = "MYSCHEMA" )]
public class Gage
{
#region Properties

    [Key]
    [System.Runtime.Serialization.DataMember]

    public virtual int Gage_RID
    { get; set; }

    [Required]
    [StringLength(255)]
    [System.Runtime.Serialization.DataMember]

    public virtual string Gage_ID
    { get; set; }


    [System.Runtime.Serialization.DataMember]

    public virtual int Status_RID
    { get; set; }

    #endregion

    #region Navigation Properties

    [System.Runtime.Serialization.DataMember]
    public virtual Status Status
    { get; set; }

    #endregion

}

[Table("Status", Schema = "MYSCHEMA" )]
public class Status
{
    #region Properties

    [Key]
    [DatabaseGenerated ( DatabaseGeneratedOption.Identity )]
    [System.Runtime.Serialization.DataMember]

    public virtual int Status_RID
    { get; set; }


    [Required]
    [StringLength(255)]
    [System.Runtime.Serialization.DataMember]

    public virtual string StatusName
    { get; set; }

    #endregion

    #region Navigation Properties

    [System.Runtime.Serialization.DataMember]

    public virtual ICollection<Gage> Gages
    { get; set; }

    #endregion
}

public partial class MyContext : DbContext
{
    public virtual DbSet<Gage> Gages
    { get; set; }

    public virtual DbSet<Status> Status
    { get; set; }

}
```
type-enhancement

All 97 comments

Or you could just use ViewModels...

We do use view models in both WPF and Angular. What would a view model do to reduce the size of the data that is being set by EF Core?

In the WPF case you have references sitting around, so they don't take up that much space. But sending the data as JSON increases the size drastically.

    var res = ( from c in context.Gages
                select new GageModel
                {
                    .... // Assign scalar properties
                    Status = c.Status
                } ).ToList();

No references will be fixed up.

You referenced a view model rather than the retrieval of data. Status is an object behind a nav property that has not been retrieved by EF, so would this not set Status to null?

we have our scalar props, these would be retrieved by the select, but if status had not been queried by an include then it would be null.

If you are referencing c.Status in your DTO then it will be populated by EF Core without needing any include.

ok but wouldn't ef core still fix up the Gages nav property on Status, getting back to the same point?

Where is the Gage being loaded in the query? Query is creating objects of GageModel & Status only.
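For readers following along, the projection approach can be sketched a little more fully. `GageModel`, its property list, and the `GageQueries.ToModels` helper below are hypothetical (the snippet above elides the scalar properties). Taking an `IQueryable<Gage>` is a choice made here so that the same code accepts `context.Gages` or an in-memory list via `AsQueryable()`. This variant flattens the status down to its name, so no `Status` object, and therefore no `Status.Gages` collection, ends up in the result.

```C#
using System;
using System.Collections.Generic;
using System.Linq;

public class Status
{
    public int Status_RID { get; set; }
    public string StatusName { get; set; }
    public ICollection<Gage> Gages { get; set; } = new List<Gage>();
}

public class Gage
{
    public int Gage_RID { get; set; }
    public string Gage_ID { get; set; }
    public int Status_RID { get; set; }
    public Status Status { get; set; }
}

// Hypothetical view model: only the values the client needs.
public class GageModel
{
    public int Gage_RID { get; set; }
    public string Gage_ID { get; set; }
    public string StatusName { get; set; }
}

public static class GageQueries
{
    // Projects each Gage into a flat GageModel. No Gage or Status
    // entity is handed back, so there is no navigation to serialize.
    public static List<GageModel> ToModels(IQueryable<Gage> gages)
    {
        return gages
            .Select(g => new GageModel
            {
                Gage_RID = g.Gage_RID,
                Gage_ID = g.Gage_ID,
                StatusName = g.Status.StatusName
            })
            .ToList();
    }
}
```

With EF Core this would be called as `GageQueries.ToModels(context.Gages)`; against an in-memory list every `Gage` must already have a non-null `Status`, since LINQ to Objects dereferences it directly.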

I just tried your suggestion and looked at the JSON in Postman, and it works. But I would still want to have the option to turn off this "fixing things up". It's not a desirable feature for us. Yes, this is a legitimate workaround, but in our case Gage and Status are the DTOs. The objects have already been created by the EF query, so why create more objects?

I have worked with EF prior to EF Core and maybe it is me being used to explicitly saying this is what I want using multiple includes and "." notation such as
.Include( "mynav1.sub1" ).

@dgxhubbard Few things to note here:

  • The part of the docs highlighted is not really relevant in this case. Fixup is not happening to entities that have already been queried and are being tracked. It is happening between the entities returned by the query. The same behavior will happen for no-tracking queries.
  • Only the entities requested by the query are being returned. EF is not creating any additional objects, only fixing up the references between the objects already returned by the query.
  • This is the same behavior that every version of EF from the first release in 2008 until now has had. There is nothing different here in the behavior of EF Core.
  • If the navigation property should never be populated, then consider removing it from the model. Unidirectional relationships with only one navigation property are perfectly valid.
  • If the navigation property is sometimes used and sometimes not, then a view model seems appropriate, as @smitpatel suggested.

All that being said, we will discuss this in triage and consider whether a different mode for Include would be useful.

Isn't there a way to ignore a property when serializing to json?

I know in EF 6 that the properties that are not included are not loaded, and EF6 honors what should be and not be loaded. There is no request for loading the Gages property of Status, but the property is being loaded. There are cases when these properties will be loaded so turning it off by ignoring in json is not the answer.

When serializing to return to the front end, we want a list of gages with their status. Suppose there are 100 gages returned to the front end, and each has a status; with the behavior of EF Core, the size of the payload will be increased 100-fold.

If Include is honored, this would be a great solution. I proposed a boolean to turn off the fixup in case Include cannot be made to behave like EF 6.

The doc says:

_Entity Framework Core will automatically fix-up navigation properties to any other entities that were previously loaded into the context instance. So even if you don't explicitly include the data for a navigation property, the property may still be populated if some or all of the related entities were previously loaded._

My interpretation of this is that on the first query for gages, the list returned would be just the gages and their status (no Gages would be set on the status). However, on the second query for gages and every query after, the Gages property will be "fixed up". So, except on the first query, the JSON payload in the 100-gage case will always be 100 times larger than the original request.

@dgxhubbard EF6 and EF Core behave in the same way here. Also, you mention a first and a second query, but your code above only shows a single query.

The code is part of a web API, so there will be multiple queries executed against it. On the first query the connection will store the gages for the next request. Then every request after it will fix up the object graph so that Gages of Status will be filled with items. I will double-check my EF6 code.

I am sorry, you are right: it does have this behavior. I would not have thought so because of serialization over the wire.

I will go back to my original comment on that. In one case I have a WPF app that makes calls against the database using EF6. In this case there are references to the gages set in the Gages property and these are in memory, so the size increases, but not as much as with serialization to JSON. Using JSON, the gages are sent out over the wire, so each is sent not as a reference but as a gage with all its properties, and the JSON payload will be increased in size roughly by the number of gages that are returned.

I added the code mentioned in the article

            .AddJsonOptions (
                options =>
                {
                    options.SerializerSettings.ReferenceLoopHandling = 
                            Newtonsoft.Json.ReferenceLoopHandling.Ignore;
                    options.SerializerSettings.ContractResolver = new DefaultContractResolver ();
                } );

and checked the result in Postman: for each gage in the Gages property of Status, all the properties of the gage are still listed.

Also, the mock code here lists only minimal properties; there are more properties on Gage and Status than shown.

@dgxhubbard In the code above, a new context is created for each query, so the context will not be tracking anything each time the query executes.

Unless you're setting the context as a singleton.

@Tarig0 The code above uses "new".

Ah yes lost the context

Tracking anything? I don't understand. A new context is created, but doesn't the context use results that have already been cached? If that is where you are going

I do know the Gages property on Status is filled in on the first query

@dgxhubbard Nope.

My test with the EF 6 code and EF core both showed the same thing. The Gages property is set.
This test was start the web api project and hitting with one request from postman. There were no other requests. If there is a problem with what I am doing or interpreting the result please correct me. Sorry this uses full code and there is json in the txt file.

Gages.txt

@dgxhubbard context.Gages.Include("Status") means for every Gage, also load the Status _relationship_. The relationship consists of one foreign key property and two navigation properties--Gage.Status and its inverse Status.Gages. When loading the relationship both navigation properties are set.

We are running EF Core 2.1, and according to the doc for lazy loading, if you call UseLazyLoadingProxies, which we do, then any virtual nav property will be lazy loaded. If I am wrong please correct me. My example above is pseudocode and leaves out the important fact that we are calling UseLazyLoadingProxies. I apologize for that.

Here is our factory method to create a context.

    public static MyContext Create ( bool useLogger = false )
    {
        var optionsBuilder = new DbContextOptionsBuilder<GtContext> ();
        if ( optionsBuilder == null )
            throw new NullReferenceException ( "Failed to create db context options builder" );

        optionsBuilder.UseSqlServer ( MsSqlConnectionString );
        optionsBuilder.UseLazyLoadingProxies ();

        if ( useLogger )
            optionsBuilder.UseLoggerFactory ( DbContextLoggingExtensions.LoggerFactory );

        var context = new MyContext ( optionsBuilder.Options );
        if ( context == null )
            throw new NullReferenceException ( "Failed to create context" );

        // disable change tracking
        context.ChangeTracker.AutoDetectChangesEnabled = false;
        context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;

        return context;
    }

@dgxhubbard I don't follow how lazy loading is relevant to this. I would suggest that if the repro code you posted does not demonstrate the issue you are seeing, then it would be very useful to post a full repro that does what your product code does so that we can reproduce and understand the issue.

Why wouldn't lazy loading be relevant here? Gages is a navigation property and it is virtual. If lazy loading is enabled then Gages would not be loaded.

_EF Core will then enable lazy-loading for any navigation property that can be overridden--that is, it must be virtual and on a class that can be inherited from. For example, in the following entities, the Post.Blog and Blog.Posts navigation properties will be lazy-loaded._

Lazy loading is delaying the loading of related data until you specifically request it. So is that not relevant here? Here we don't want the Gages property loaded; it is virtual, hence it should not be loaded.
Am I wrong?

@dgxhubbard Lazy loading doesn't work like that. It never prevents or delays a relationship being loaded when that relationship is specified to be eager loaded using Include.

But the include is only for the Status property, not Status.Gages.
Refer to
Include

Here the remarks indicate that the property must be dot-referenced for it to be included:

_To include a reference and then a reference one level down: query.Include(e => e.Level1Reference.Level2Reference)._

Is my interpretation correct here?

@dgxhubbard Status and Gages are the navigation properties for a _single relationship_. Include tells EF to load the relationship, which involves populating both navigation properties.

Maybe it would be easier to think about it from the relational side. Each Gage row in the database has one FK value that indicates it is related to a Status with that PK value. So, this means that that Status row and that Gage row are related. It follows that when entities are created from these rows, the two entities are related. This is reflected in the object graph by setting the Gage.Status property to reference the related object in one direction, and by adding the Gage to the Status.Gages collection to relate the objects in the other direction. These are both representations of the same relationship.
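To make the "one relationship, two navigation properties" point concrete, here is a minimal sketch in plain C# with no EF Core involved. The `FixupSketch.Materialize` helper and its tuple-shaped rows are hypothetical stand-ins for what a query with Include does conceptually; this is an illustration of the idea, not EF Core's actual implementation.

```C#
using System;
using System.Collections.Generic;

public class Status
{
    public int Status_RID { get; set; }
    public string StatusName { get; set; }
    public List<Gage> Gages { get; } = new List<Gage>();
}

public class Gage
{
    public int Gage_RID { get; set; }
    public int Status_RID { get; set; }   // foreign key
    public Status Status { get; set; }
}

public static class FixupSketch
{
    // Hypothetical helper: turns flat (gage, status) rows into a
    // connected object graph, one Status instance per distinct key.
    public static List<Gage> Materialize(
        IEnumerable<(int GageId, int StatusId, string StatusName)> rows)
    {
        var statuses = new Dictionary<int, Status>();
        var gages = new List<Gage>();

        foreach (var row in rows)
        {
            if (!statuses.TryGetValue(row.StatusId, out var status))
            {
                status = new Status { Status_RID = row.StatusId, StatusName = row.StatusName };
                statuses.Add(row.StatusId, status);
            }

            var gage = new Gage { Gage_RID = row.GageId, Status_RID = row.StatusId };
            gage.Status = status;     // relationship, one direction
            status.Gages.Add(gage);   // same relationship, inverse direction
            gages.Add(gage);
        }

        return gages;
    }
}
```

If three rows share one status key, the sketch produces three `Gage` objects and a single `Status` whose `Gages` collection holds those same three instances. No extra objects are created; the same objects are simply reachable from both directions.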

Thank you very much for the time and effort to explain this. The end result is that I will need to clear the Gages property in the web API code in order not to have it set. We want to send back what we consider to be the primary properties of the Gage. I wish there were some extension method that said: this is the nav property to load, and don't follow the nav properties of the item I am referencing.

Seems to me like changing the json serialize to a breadth first approach would smooth out that json file. But there is no duplication of information there. Another approach is limiting the levels you serialize, but that's just as difficult to maintain as clearing the lists.

I also see that you are also loading the Supplier reference which is what really starts making the json returned a mess.

Yes it does. Supplier is just like Status, with a Gages list. The Gages list is there to tell EF Core there is a relationship, that's it. That's why I would like tighter control of what gets loaded: either an option to say please don't load nav properties in associated items like Status and Supplier, or a new extension method. Something that does not break code for people who depend on the loading of associated properties, and something for those who don't want them loaded.

@dgxhubbard If you don't want the Gages property, then remove it. EF doesn't need it.

How then does EF know about the one-to-many relationship? We use attributes to tell EF about the relationships.

@dgxhubbard Just remove it--that relationship looks like it will be one-to-many by convention. If you want to explicitly configure it, then something like:

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder
            .Entity<Gage>()
            .HasOne(e => e.Status)
            .WithMany();
    }

I looked on this reference. I thought I had to have both ends of the relationship. That is convention 4 in the article. So the convention can be just on Gage (convention 1 in the article). That's great thank you.

Configure one to many relationships

There will be cases when we need the Gages that have a Status. So removing Gages is a good workaround for limiting the payload returned. However, for a case where you need the link on both ends, it is not ideal. So it's a tradeoff in design: smaller payload versus functionality for rare cases.

Discussed in triage and decided to move this to the backlog as a general idea for providing more flexibility in deciding which navigation properties should be populated. We need to be careful that we don't further confuse things when tracking is being used, but for no-tracking queries it is likely relatively safe to only set one side of the relationship and not the other.

Notes for others reading this thread:

  • This is specifically about the behavior of Include; it is not about general fixup in the state manager.
  • This will not impact the amount of data returned; that is dependent on the relationship, not fixup of the navigation properties.
  • This is not needed for JSON serialization--that just requires appropriate attributes to prevent loops in the serialization process
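As an illustration of the last note, the sketch below marks the inverse collection with `[JsonIgnore]` so the serializer never walks back from a status to its gages. It uses `System.Text.Json` purely to stay self-contained (the thread's own setup used Newtonsoft.Json, which has an attribute of the same name); the trimmed-down entity shapes and the `JsonIgnoreSketch` class are assumptions made for the example.

```C#
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Status
{
    public int Status_RID { get; set; }
    public string StatusName { get; set; }

    // Skipped by the serializer even when the collection is populated.
    [JsonIgnore]
    public List<Gage> Gages { get; set; } = new List<Gage>();
}

public class Gage
{
    public int Gage_RID { get; set; }
    public Status Status { get; set; }
}

public static class JsonIgnoreSketch
{
    public static string Serialize()
    {
        var status = new Status { Status_RID = 1, StatusName = "Active" };
        var gage = new Gage { Gage_RID = 7, Status = status };
        status.Gages.Add(gage);   // both directions set, as after Include

        // Without [JsonIgnore] this graph contains a cycle
        // (gage -> status -> gages -> gage).
        return JsonSerializer.Serialize(gage);
    }
}
```

The attribute applies to every response that serializes `Status`, so it suits models where the inverse collection is never wanted on the wire; where it is sometimes needed, a view model is the more flexible route.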

@dgxhubbard

There will be cases when we need the Gages that have a Status.

If the goal is to get the list of gauges with the status X then the following should be fine.

    using ( var context = new MyContext() )
    {
        // Include is only needed if you require info about the status beyond the StatusName
        res = ( from c in context.Gages.Include ( g => g.Status )
                where c.Status.StatusName == "Active"
                select c ).ToList ();
    }

I typically use the LINQ method syntax for this, so forgive any syntax issues, but that should give you a list of Gages based on their status.

This way you don't need the inverse property.

Thank you. That is going to be one of our workarounds. For now we are going to just clear the nav lists we do not need. When we delete a status, all the gages that have that status will need to be pulled, and since we are pulling the status anyway, leaving Gages alone meets our needs better (since it is one pull from the database).

I also think it would be better to have a way to disable the "fix-up" of properties.
Sometimes I need both the parent entity and its associated child entities, and explicitly using the .Include method works fine for that.
But sometimes I only need the parent entity data, and even if I don't use .Include, just something like
_db.parentEntity.FirstOrDefault(...)
it also automatically gets the child entities' data. I have to do extra work to exclude the child entities.

@dgxhubbard thanks a lot for your comment. I had the same issue in my application: my repository included unexpected navigation properties in the result of my request, and as I have circular references in my data model, I retrieved several MB of data instead of a couple of KB.
The solution is either to not track entities in previous requests (AsNoTracking) or to make your DbContext a singleton if you use the repository pattern.
Do you know if the issue is solved in the latest version, EF Core 2.1.3?

I, too, would like a feature like this. Not having to manually go through and remove preloaded relations that I have not requested would save a lot of headaches for serialization.

I am using EF Core 2.1.4 and I am having this problem. The following code will recreate the issue:

var result = db.Child
  .Include(c => c.Master)
  .FirstOrDefault();
return Json(result);

If both child and master has the corresponding navigation property, the result will choke the serialization with exception:

Newtonsoft.Json.JsonSerializationException: Self referencing loop detected with type...

If you put a break point and inspect the result before it is returned, you can see both child and master's navigation property are populated with references to each other. What's worth noticing is that instead of having all of its children, master entity only has the child loaded in this query, which ultimately defies the purpose of auto fixing as it is giving an incomplete view of the data.

We really need a way to disable this feature described in loading the related data

Entity Framework Core will automatically fix-up navigation properties to any other entities that were previously loaded into the context instance. So even if you don't explicitly include the data for a navigation property, the property may still be populated if some or all of the related entities were previously loaded.

I would also love a feature to disable automatic fix-up on both a per-request basis but also globally.

@ajcvickers I am also running into issues with automatic fix-up, a basic query like

_context.GlobalSettings
                .Select(gs => new GlobalSettings
                {
                    Id = gs.Id,
                    DefaultCountryFkNavigation = gs.DefaultCountryFkNavigation
                })
                .OrderBy(gs => gs.Id)
                .First();

can result in massive result because DefaultCountry has tons of stuff linked to it, including one-to-many relationships, so I end up with chains of relations all included in the final object (like country > clients from that country > their invoices > etc)

The problem is :

  • I did not ask EF to include any of it beyond the first level (no .Include() anywhere)
  • I have no way to turn it off.
  • I cannot select specific properties of navigation properties since you can't .Select() on an object.

Please provide a way to do either of those, or maybe there's a workaround I do not know about ?

@fschlaef This doesn't look like the same scenario. Please file a new issue and include a small, runnable project/solution or complete code listing that demonstrates the behavior you are seeing.

Problem : Automatic fix-up does whatever it wants, at random, toss a coin and good luck !
Solution : Rely on external software to circumvent the issue that you cannot control or configure. Aka ReferenceLoopHandling.Ignore

.Net Core 3.0.0 now pushes System.Text.Json instead of Newtonsoft.Json (https://docs.microsoft.com/en-us/aspnet/core/migration/22-to-30)

Problem : Loop reference handling isn't implemented https://github.com/dotnet/corefx/issues/41002
Solution : Revert back to old solution instead of using the default, recommended solution.

Conclusion : enjoy your infinitely looping JSON because EF just decided yes, you absolutely needed _all_ the collections and navigation properties it could possibly find.

  • Why is this default ?
  • Why can't I turn it off ?

Same problem here. Neither do I like nor want automatic NavProperty fix up! The NavProperties I'd like to use are set up myself - and guess anyone does. I'd even say the default behavior should be disabled!

I think the solution of removing the collection navigation property does not always work. What if the entity references itself?
I think we need some configuration setting to load only the navigation properties that we define in the "Include"; the "Include" feature is losing its purpose because of this automatic fix-up.

I created a repro to demonstrate why this "feature" is bad for reliability and consistency :

https://github.com/fschlaef/AutomaticFixUpIsBad

Will probably do one for looping JSON too.

Would it be possible to add something like an Ignore extension method? This way, you could do something like:

db.Items
    .Include(item => item.Category)
    .Ignore(category => category.Items)
// additional query

We have run into trouble with the automatic fixup, too. And have spent quite some time to understand what is going on. The current behaviour of EF is not very intuitive, when it comes to fix-up of navigation properties.

I would like consistent behaviour that does not depend on whether the context already knows an entity or not.

Please add a switch that lets me switch off fix-up of nav-properties that are not covered by Include()

Part of the issue is that when defining relationships, often the navigation property is part of the definition and has to stay in the POCO class. Sometimes I don't want the navigation property in my class; just the foreign key is all I want. And with navigation properties, we often get those dreaded circular references. However, you still need access to the navigation property for the definition in the ModelBuilder. A couple of examples:

Here is an example from https://entityframeworkcore.com/knowledge-base/38381829/withrequired---no-extension-method-defined where it uses Account to help define the model, but what if I don't want Account in my User class at all??

modelBuilder.Entity<User>()
    .HasOne(e => e.Account)
    .WithOne(e => e.User)
    .HasForeignKey<Account>(e => e.AccountId);

Good news though -- I just discovered that you can define these relationships without having to have the navigation property in your model -- instead of .HasOne(e => e.Account), just use the typed version of HasOne, such as .HasOne<Account>(). Now you can remove the Account navigation property completely from the User class.

Just tested on my code and it works. So you can reference the type of your foreign keys (by type), instead of by property, and then just remove the navigation properties. Another example which works in a one to many relationship is changing:

        modelBuilder.Entity<Post>()
            .HasOne(p => p.Blog)
            .WithMany(b => b.Posts)
            .HasForeignKey(p => p.BlogForeignKey);

to

        modelBuilder.Entity<Post>()
            .HasOne<Blog>()
            .WithMany(b => b.Posts)
            .HasForeignKey(p => p.BlogForeignKey);

Now Blog no longer has to be part of the Post class, but the relationship is still defined, with the BlogForeignKey still present, and the Blog navigation property is "bye-bye" :) I hope this helps some people out there.

@ajcvickers & @AndriySvyryd

I'd like to address the question of "Why is this an urgent issue to resolve?" which from my reading across multiple, similar, threads seems to be unanswered to date.

This default behavior is causing very large object graphs to be created. Given the size of the object graph, consequently, these objects are being created not in the ephemeral heap, but instead in the LOH. This forces Gen 2 GCs, which are very slow, for what amounts to be for _absolutely no reason_ other than "it's default behavior".

This issue remaining in EF Core most definitely causes significant performance degradation within production environments when the Gen 2 GC is triggered, which is the most expensive generation to run. Because these objects land in the LOH, they are forcing Gen 2 GCs, so that's unavoidable.

This bug is assuredly causing multi-second pauses within production as a consequence. Except, these issues are not caused immediately when the objects are created. They are delayed until the Gen 2 GC runs based on GC budgeting.

In the case of ASP.NET Core, issues like this will cause ASP.NET itself (when using EF Core) to have a systemic increase in TTFB because the Gen2 GCs will increase the average TTFB.

Please see the LOH screenshot from within PerfView, attached, of my web server creating one such object.

RelationshipFixupCausesLOHAndGen2GCs

I have also included SQL Server client statistics for the same query, to show that the memory allocation is not caused by SQL data size.

RelationshipFixupCausesLOHAndGen2GCs-ClientStatistics

Can we get a timeline for the fix for this urgent production-impacting performance issue?

Kind regards,
Jeff Fischer

@windhandel This feature won't help your scenario. Whether something is allocated on the LOH depends on the size of the object itself, it doesn't take into account what that object is referencing. As your screenshot shows in your case it's the byte arrays, not the entities containing them that are in the LOH.
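The distinction drawn here can be checked with a small experiment. The sketch below is an assumption-laden illustration: the `Document` entity and `LohSketch` class are hypothetical, and it relies on the runtime reporting large-object-heap allocations as generation 2 via `GC.GetGeneration`. It allocates a small object that references a byte array larger than the 85,000-byte threshold mentioned in this thread, then reports the generation of each.

```C#
using System;

// Hypothetical entity: a tiny object holding a reference to a big payload.
public class Document
{
    public int Id { get; set; }
    public byte[] Content { get; set; }
}

public static class LohSketch
{
    public static (int EntityGeneration, int ArrayGeneration) Inspect()
    {
        // The array itself exceeds the large-object threshold;
        // the Document that points at it is only a few dozen bytes.
        var doc = new Document { Id = 1, Content = new byte[100_000] };

        return (GC.GetGeneration(doc), GC.GetGeneration(doc.Content));
    }
}
```

On a typical run the array reports `GC.MaxGeneration` while the `Document` reports a younger generation, which matches the point that placement depends on each object's own size, not on what it references.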

@AndriySvyryd I'm not that familiar with the intricacies of the query execution and at what stage the fixup happens.

I hear you identifying a second issue that should be addressed. Is this a separate performance issue to be addressed?

@AndriySvyryd I think you're implying a related question of "Can you account for the difference in the bytes coming back in the SQL Statement and the size of the byte array being read?"

To answer that question - no. I'm unsure where that byte difference is coming from. I would have assumed the byte array to be 1.1M at the most.

Of course, as prescribed by the book Writing High Performance .NET Code, I still wouldn't suggest that byte array be over 85k and go into the LOH. Anything going into the LOH should live for the duration of the app domain, i.e. be pooled.

That doesn't say what the solution should be, but pushing that byte array into the LOH temporarily definitely isn't it.

@windhandel Can you create a very simple runnable project that shows your scenario and file a separate issue? If possible also try executing the same query using ADO.NET and compare the allocations.

@AndriySvyryd are you having difficulty reproing it? Do you need my help?

@windhandel Yes

That'll take me about 2 hrs to do I figure. I'm ready when you are.

https://paypal.me/EFCoreYay/300

@AndriySvyryd Allow me to quote myself from almost a year ago :

  • Why is this default ?
  • Why can't I turn it off ?

Please simply answer these two questions. 0 lines of code to write, 0 tests, 0 PR.

I am kind of tired of the 80 MB payloads routinely served to me by EF each time I forget to re-instantiate EVERYTHING to only select what I care about, instead of being able to select navigation properties directly.

Anything like .NoAutomaticFixUp(), even on a per-request basis, will do. Not every query pulls huge amounts of data, so it's fine.
Literally anything will do other than "it's default behavior because someone decided so in 2008".

Even an explanation on WHY it's there and WHY we can't turn it off will do actually. Silence and ignoring issues won't do.

Why is this default ?

This is what most users expect to happen, especially when migrating from EF6.

Why can't I turn it off ?

Because we haven't designed, implemented and tested API to do so. See our planning process for more details.

  • Why can't I turn it off ?

These two seem overlooked:

  1. Our team loves Gen 2 GCs
  2. The longer the TTFB, the better

;-)

@AndriySvyryd I wouldn't be approaching this issue with a bit of jest if it hadn't existed for several years now and been repeatedly reported and ignored. Hiding behind "we have a process" doesn't win points with customers; it turns them off.

I would ask: Why isn't your existing performance testing process picking up these obvious GC issues?

Given how hyper-critical EF Core is to many, many products, it would seem that your team needs to revisit your performance testing approach.

I suspect, a few pointed integration test assertions _across your entire test suite_ will light up your test results like a Christmas tree.

As @fschlaef has mentioned, this is not a difficult repro.

This is a rudimentary repro that requires specific performance-centric assertions. Adding those assertions will cause large segments of your test suite to fail.

As an ex-MSFT FTE, if you need help getting this performance-centric test process in place to improve the performance lifecycle of the product, let me know and I would be happy to aid in that process. My business, Software Performance Ensurance, is designed to help teams with just that.

Kind regards,
Jeff

@windhandel I understand your frustration, but the fundamental point that may be being missed here is that implementing this feature will not result in the query returning more or less data. The amount of data returned will be the same. The only thing that would change is whether or not this data is associated through references or collections from one entity instance to another.

@AndriySvyryd Thank you for answering. Regarding people migrating from EF6, do you have a percentage of affected users? NuGet says 3.1.0 was downloaded 4+ million times; do "legacy" users represent a big part of that?

@ajcvickers We're getting somewhere now! I'm not sure I understand the response, though: if the data is not associated, it is not returned, so the amount of data goes down? I might be missing something here, but this doesn't make a lot of sense.

Also:

> whether or not this data is associated through references or collections from one entity instance to another

Seems like this is precisely what we don't want to happen? I just don't want the server to send unwanted data.

@fschlaef See this comment and the additional discussion around it.

@ajcvickers personally, I'm less frustrated than I am amused. It appears that customer debate over the expected behavior is more the goal than customer (developer) expectations and/or satisfaction, based on the duration, depth and repetitiveness of these conversations over the years.

One can't help but wonder whether the cumulative time spent debating developers on the behavior could have resolved this issue 2 or 3 times over at this point.

I've recognized that the industry, as a whole, has a long way to go when it comes to properly asserting against a variety of these types of qualitative measures. We've all a long way to go. Thus, me creating my business to aid in this process.

I see what you were saying when you pointed to references within the object model. You're saying that Gen 2 GCs are not caused by the fixup behavior, that's a separate issue that should be addressed, as @AndriySvyryd mentioned.

[Image: RelationshipFixupSourceOfAllocations]

At the end of last year I worked with the .NET Core team to resolve LOH allocations within String.Split. That issue was quickly addressed once its severity was determined.

Operations such as this causing multiple LOH allocations are high severity. Are you saying that EF Core causing LOH allocations is of no consequence?

Again: Why isn't your existing performance testing process picking up these obvious GC issues?

@windhandel I created an issue for your case at https://github.com/dotnet/efcore/issues/21766, please refrain from further comments on the current issue as it's not related.

> @windhandel I understand your frustration, but the fundamental point that may be being missed here is that implementing this feature will not result in the query returning more or less data. The amount of data returned will be the same. The only thing that would change is whether or not this data is associated through references or collections from one entity instance to another.

This is true until you need to serialize into JSON and end up with a massive object.

@argoff and others, are you basically referring to the lack of circular reference handling in System.Text.Json, which leads to needlessly large/duplicated JSON payloads? If so, circular reference handling is being added for 5.0 - does that address your concern here?

It's worth mentioning that Newtonsoft.Json has also had support for this for a long time.

> @argoff and others, are you basically referring to the lack of circular reference handling in System.Text.Json, which leads to needlessly large/duplicated _JSON_ payloads? If so, circular reference handling is being added for 5.0 - does that address your concern here?
>
> It's worth mentioning that Newtonsoft.Json has also had support for this for a long time.

No, I am currently using Newtonsoft.Json and have it set to ReferenceLoopHandling.Ignore. This does catch a large chunk of them, but not all.

In my case I have a SupportTicket object which has a Contact nav on it; this Contact also has a list of support tickets on it, as we need to be able to navigate both ways in different situations. If the front end asks for SupportTickets including the Contact nav, then Entity Framework also populates SupportTickets.Contact.SupportTickets. Newtonsoft doesn't remove this when serializing, I believe because SupportTickets[1].Contact.SupportTickets is a filtered version of SupportTickets, for example.

As @ajcvickers wrote above, EF navigation fix-up doesn't load any additional data from the database beyond what is normally loaded by the query (or already tracked by the context).

I've written a quick sample program below to approximate your SupportTickets/Contact scenario, and to demonstrate how it works. In the sample, a single Contact has two SupportTickets. The query selects the SupportTicket with ID 1, including its Contact. In the results, the Contact's Tickets navigation is fixed up to contain the SupportTicket with ID 1, but does not contain the other SupportTicket, because it isn't otherwise loaded by the query.

To make sure we understand exactly which issue people are running into, I'd recommend tweaking the code sample below to demonstrate any problems you have - and to leave any interaction with JSON serialization out of the picture, for now.


Sample program

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

class Program
{
    static async Task Main(string[] args)
    {
        // Recreate the database so that each run starts from the seeded data
        await using (var ctx = new BlogContext())
        {
            await ctx.Database.EnsureDeletedAsync();
            await ctx.Database.EnsureCreatedAsync();
        }

        await using (var ctx = new BlogContext())
        {
            Console.WriteLine("First pass");

            var someTicket = ctx.Tickets
                .Include(t => t.Contact)
                .Single(t => t.Id == 1); // EF loads only the ticket with id=1

            // Fix-up occurs only for entities which are *otherwise* loaded by the query, so the following
            // only prints "Ticket: 1"
            foreach (var ticket in someTicket.Contact.Tickets)
                Console.WriteLine($"Ticket: {ticket.Id}");
        }

        await using (var ctx = new BlogContext())
        {
            Console.WriteLine("Second pass");

            var someTicket = ctx.Tickets
                .Include(t => t.Contact)
                .AsEnumerable() // Client evaluation from here, so EF loads *all* tickets
                .Single(t => t.Id == 1);

            // Since the query above loads all tickets (no WHERE in SQL), EF does fix-up for both tickets of the
            // contact, and the following prints for both.
            foreach (var ticket in someTicket.Contact.Tickets)
                Console.WriteLine($"Ticket: {ticket.Id}");
        }
    }
}

public class BlogContext : DbContext
{
    public DbSet<SupportTicket> Tickets { get; set; }
    public DbSet<Contact> Contacts { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlServer("...");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Seed one contact with two tickets
        modelBuilder.Entity<Contact>().HasData(new Contact { Id = 1 });

        modelBuilder.Entity<SupportTicket>().HasData(
            new SupportTicket { Id = 1, ContactId = 1 },
            new SupportTicket { Id = 2, ContactId = 1 });
    }
}

public class SupportTicket
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public Contact Contact { get; set; }
}

public class Contact
{
    public int Id { get; set; }
    public List<SupportTicket> Tickets { get; set; }
}
```

@roji Cheers for this. I believe I have found the issue. I know you said to leave serialization out, but that is part of where the issue is occurring. If we change Main so that it loads all tickets and serializes, all is good and Newtonsoft removes the references.

```c#
static async Task Main(string[] args)
{
    await using (var ctx = new BlogContext())
    {
        await ctx.Database.EnsureDeletedAsync();
        await ctx.Database.EnsureCreatedAsync();
    }

    await using (var ctx = new BlogContext())
    {
        Console.WriteLine("First pass");

        var someTickets = ctx.Tickets
            .Include(t => t.Contact);

        var json = JsonConvert.SerializeObject(someTickets, new JsonSerializerSettings() { ReferenceLoopHandling = ReferenceLoopHandling.Ignore });

        Console.WriteLine(json.ToString());
    }
}
```

But if we add a .ToList() when getting the data, Newtonsoft doesn't remove them:

```c#
static async Task Main(string[] args)
{
    await using (var ctx = new BlogContext())
    {
        await ctx.Database.EnsureDeletedAsync();
        await ctx.Database.EnsureCreatedAsync();
    }

    await using (var ctx = new BlogContext())
    {
        Console.WriteLine("First pass");

        var someTickets = ctx.Tickets
            .Include(t => t.Contact)
            .ToList();

        var json = JsonConvert.SerializeObject(someTickets, new JsonSerializerSettings() { ReferenceLoopHandling = ReferenceLoopHandling.Ignore });

        Console.WriteLine(json.ToString());
    }
}
```

@argoff that seems to be a difference in how Json.NET handles reference loops for IEnumerable vs List - from EF Core's perspective the two are completely identical, and will result in the same object graph. You should be able to reproduce the same difference in Json.NET behavior without EF Core.

As far as I can see looking through this issue, I still can't see any clear indication that EF Core is doing anything to increase or duplicate data, or that there may be a reason to disable fix-up of navigation properties. If anyone still thinks there's an EF issue, can you please tweak the code sample in https://github.com/dotnet/efcore/issues/11564#issuecomment-664213251 or post an alternate minimal sample?
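To look at the Json.NET side in isolation, a sketch along the following lines could be used. It is not code from the thread: it builds by hand the kind of graph a tracking query with fix-up would return (one contact, two tickets, navigations wired both ways) and serializes it with ReferenceLoopHandling.Ignore. The class names mirror the sample program above, and the hand-built graph is an assumption standing in for EF Core.

```c#
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class Contact
{
    public int Id { get; set; }
    public List<SupportTicket> Tickets { get; set; } = new List<SupportTicket>();
}

public class SupportTicket
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public Contact Contact { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Hand-built graph (assumption): both tickets point at the same contact,
        // and the contact's collection points back at both tickets.
        var contact = new Contact { Id = 1 };
        var first = new SupportTicket { Id = 1, ContactId = 1, Contact = contact };
        var second = new SupportTicket { Id = 2, ContactId = 1, Contact = contact };
        contact.Tickets.Add(first);
        contact.Tickets.Add(second);

        var tickets = new List<SupportTicket> { first, second };

        var settings = new JsonSerializerSettings
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
            Formatting = Formatting.Indented
        };

        // Inspect the output to see which nested tickets survive serialization.
        Console.WriteLine(JsonConvert.SerializeObject(tickets, settings));
    }
}
```

As far as I understand Json.NET, Ignore only skips an object that is already an ancestor on the current serialization path, so a sibling ticket reached through Contact.Tickets would still be written out. Running the sketch and inspecting the output is the way to confirm that for a given Json.NET version.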

@roji: The problem lies within this sentence:

> As ajcvickers wrote above, EF navigation fix-up doesn't load any additional data from the database beyond what is normally loaded by the query (_or already tracked by the context_).

We do not want everything tracked by the context to be automatically included if not explicitly requested.
That is the whole point, the entire issue. JSON serialization doesn't matter.

When I write a query I only want what I include. I don't want EF to dump massive collections or sub-sub-sub-navigation properties in my objects unless I .Include() them explicitly.

@fschlaef "Already tracked by the context" refers to entity instances that were loaded by previous queries on the same context instance. This again doesn't bring in any extra data from the database that isn't already present in memory in your context. There is no "dumping" of any kind happening here: EF will merely make the navigation properties reference objects that are already tracked and in memory. What exact issue do you see with that?

I really am trying to understand what is being asked here and why - and to dispel any confusion about what EF actually does. A simple code sample illustrating the behavior you consider problematic would go a long way to help me understand.
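For reference, the "already tracked by the context" case described above can be sketched roughly as follows. This is an illustration rather than code from the thread: it assumes the Microsoft.EntityFrameworkCore.InMemory provider, `DemoContext` and the database name are hypothetical, and the entity classes mirror the sample program above.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class SupportTicket
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public Contact Contact { get; set; }
}

public class Contact
{
    public int Id { get; set; }
    public List<SupportTicket> Tickets { get; set; }
}

// Hypothetical context backed by the in-memory provider (assumption).
public class DemoContext : DbContext
{
    public DbSet<SupportTicket> Tickets { get; set; }
    public DbSet<Contact> Contacts { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase("fixup-demo");
}

public static class Program
{
    public static void Main()
    {
        // Seed one contact with two tickets.
        using (var ctx = new DemoContext())
        {
            ctx.Contacts.Add(new Contact { Id = 1 });
            ctx.Tickets.Add(new SupportTicket { Id = 1, ContactId = 1 });
            ctx.Tickets.Add(new SupportTicket { Id = 2, ContactId = 1 });
            ctx.SaveChanges();
        }

        using (var ctx = new DemoContext())
        {
            // First query: all tickets become tracked by this context instance.
            var tickets = ctx.Tickets.ToList();

            // Second query: no Include, yet the contact's Tickets navigation is wired
            // to the ticket instances that are already tracked. No extra rows are
            // fetched for this; the navigation references objects already in memory.
            var contact = ctx.Contacts.Single(c => c.Id == 1);

            Console.WriteLine($"Tracked tickets: {tickets.Count}");
            Console.WriteLine($"Tickets on contact: {contact.Tickets?.Count ?? 0}");
        }
    }
}
```

The second count depends on what earlier queries on the same context instance happened to load, which is the order-dependence being debated in this thread.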

@roji here is a basic demonstration. Say I have the following two entities:

Item.cs

public class Item
{
    public int Id { get; set; }
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }

    public Category Category { get; set; }
}

Category.cs

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }

    public virtual ICollection<Item> Items { get; set; }
}

If I write out the following query method:

public static async Task<List<Item>> GetItems(this AppDbContext db) =>
    await db.Items
        .Include(x => x.Category)
        .ToListAsync();

and hook it up to an Action Method:

[HttpGet("[action]")]
public async Task<List<Item>> GetItems() => await this.db.GetItems();

I end up with the following JSON:

Only initial item shown for brevity

[Image: screenshot of the JSON response]

In the query method, I only asked for the Category, I did not specify .ThenInclude(x => x.Items) following .Include(x => x.Category). I'm getting data that I did not ask for, but EF has it loaded because QueryTracking is enabled by default.

If I disable query tracking, I end up with my desired behavior:

services.AddDbContext<AppDbContext>(options =>
{
    options.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
});

[Image: screenshot of the JSON response with query tracking disabled]

This has actually turned out to be the best case for me when working with Entity Framework Core and Web API.

See #13577 for my initial interaction with this issue.

> @fschlaef "Already tracked by the context" refers to entity instances that were loaded by previous queries on the same context instance. This again doesn't bring in any extra data from the database that isn't already present in memory in your context. There is no "dumping" of any kind happening here: EF will merely make the navigation properties reference objects that are already tracked and in memory. What exact issue do you see with that?

This:

> EF will merely make the navigation properties reference objects that are already tracked and in memory.

I didn't ask for those navigation properties, yet here they are. I don't want it. Call it consistency or idempotence (https://en.wikipedia.org/wiki/Idempotence#Computer_science_examples), it's the same.

Bottom line is you cannot SELECT navigation properties directly because the result will be dependent on prior actions on those collections / objects.

I understand it is by design as it was explained in an older comment, but I would like the possibility to not do auto fix-up.

@roji I previously did .NET performance analysis for one of Azure's largest clients, a cloud software company. Have you considered the impact this issue will have on memory dump size?

It was a very time consuming part of my job to consistently retrieve .etl files and dump files from Azure production including the movement into blob storage from another team and retrieval from blob storage by myself and others. I worked across many clients simultaneously and that was an everyday aspect of my job.

I suspect this issue will add a great deal of time to that process, aside from the "aesthetics" of the issue of just having a larger object model in-memory.

How much larger are these memory dumps?

@JaimeStill

> In the query method, I only asked for the Category, I did not specify .ThenInclude(x => x.Items) following .Include(x => x.Category). I'm getting data that I did not ask for, but EF has it loaded because QueryTracking is enabled by default.

On the EF side, the reason for this behavior is that your query already loads all items (that's the db.Items part in db.Items.Include(x => x.Category).ToListAsync()). The point I'm trying to make, is that EF is not loading any additional data from your database because of these navigation options - it's only wiring data that is loaded anyway.

Now, I do understand that you don't want to return the items under categories in your JSON document - that makes sense indeed. Assuming you're simply serializing the entire result graph returned by the query above, then setting ReferenceLoopHandling.Ignore should make those internal item entries disappear, since in the graph returned by EF the category items are simply references to the outer items (in other words, they're loops).

It's also perfectly reasonable to do as you've done, and disable tracking (either context-wide as you've done, or for specific queries via AsNoTracking). But note that this may have other, unintended consequences; for example, if you execute multiple queries via the same EF context, disabling tracking would return new entity instances for database rows which have already been loaded, etc.
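The per-query opt-out and the identity caveat mentioned above might look like the following sketch. It is not code from the thread: it assumes the Microsoft.EntityFrameworkCore.InMemory provider, and the `AppDbContext` shown here is a hypothetical minimal stand-in for the one in the example, with trimmed-down Item and Category entities.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Item
{
    public int Id { get; set; }
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public Category Category { get; set; }
}

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ICollection<Item> Items { get; set; }
}

// Hypothetical stand-in context using the in-memory provider (assumption).
public class AppDbContext : DbContext
{
    public DbSet<Item> Items { get; set; }
    public DbSet<Category> Categories { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase("no-tracking-demo");
}

public static class Program
{
    public static void Main()
    {
        // Seed one category with two items.
        using (var db = new AppDbContext())
        {
            db.Categories.Add(new Category { Id = 1, Name = "Music" });
            db.Items.Add(new Item { Id = 1, CategoryId = 1, Name = "First" });
            db.Items.Add(new Item { Id = 2, CategoryId = 1, Name = "Second" });
            db.SaveChanges();
        }

        using (var db = new AppDbContext())
        {
            // Tracking queries resolve the same row to the same tracked instance.
            var tracked1 = db.Categories.Single(c => c.Id == 1);
            var tracked2 = db.Categories.Single(c => c.Id == 1);
            Console.WriteLine($"Tracked, same instance: {ReferenceEquals(tracked1, tracked2)}");

            // Non-tracking queries materialize a fresh instance on every execution.
            var untracked1 = db.Categories.AsNoTracking().Single(c => c.Id == 1);
            var untracked2 = db.Categories.AsNoTracking().Single(c => c.Id == 1);
            Console.WriteLine($"Untracked, same instance: {ReferenceEquals(untracked1, untracked2)}");

            // Opting out of tracking for a single query only.
            var items = db.Items
                .AsNoTracking()
                .Include(i => i.Category)
                .ToList();
            Console.WriteLine($"Items loaded without tracking: {items.Count}");
        }
    }
}
```

AsNoTracking applies only to the query it is called on, so other queries on the same context keep the default tracking behavior unless UseQueryTrackingBehavior has changed it.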

@fschlaef

> I didn't ask for those navigation properties, yet here they are. I don't want it. Call it consistency or idempotence (https://en.wikipedia.org/wiki/Idempotence#Computer_science_examples), it's the same.

The navigation properties are there because they've been defined in the model - it's perfectly fine to remove them, in which case no fix-up will happen. In addition, the fix-up behavior is part of tracking query behavior; as with @JaimeStill above, it's already possible to opt out of this by switching to non-tracking queries...

@windhandel

> How much larger are these memory dumps?

Please carefully read the messages posted above, and once again, the code sample I've posted above. As far as I know, nothing in this conversation has any bearing on memory usage, LOH or otherwise - EF does not load any additional data that isn't loaded anyway (or is already tracked and in memory because of a previous tracking query). In your case, you probably need to find out exactly which query loaded your large data (i.e. byte arrays). As @AndriySvyryd suggested, let's keep the discussion around the Large Object Heap in #21766.

To everyone reading this, please do your best to understand exactly what EF does, and what it does not do. I strongly recommend taking a look at the code sample above, and tweaking it to understand the behavior. You're also welcome to tweak that in order to demonstrate a scenario which you believe is problematic.

@roji

> Now, I do understand that you don't want to return the items under categories in your JSON document - that makes sense indeed. Assuming you're simply serializing the entire result graph returned by the query above, then setting ReferenceLoopHandling.Ignore should make those internal item entries disappear, since in the graph returned by EF the category items are simply references to the outer items (in other words, they're loops).

In the demonstration I showed above, ReferenceLoopHandling.Ignore is already set. The only thing that removes is the root item in the sub collection (In this case, note that Nirvana - Nevermind is not included in the Item.Category.Items array).

I bring up setting QueryTrackingBehavior.NoTracking because it legitimately fixes the issue for me, as well as several other issues that I was having using EF Core in conjunction with Web API. This may be a viable approach for others dealing with a similar issue. I appreciate your time and consideration, and just wanted to share my experience in the matter.

@JaimeStill if your JSON returns only one item (Nirvana - Nevermind), as opposed to all items (as seems to be suggested by the GetItems method), then yeah, that makes sense - the only loop is with that specific item. If your web service really does return only one item, you may want to refactor your code to make EF not load all items - though I'm lacking a lot of context about your application to know what's going on. In any case, non-tracking queries are indeed also a solution - thanks for sharing that.

> @JaimeStill if your JSON returns only one item (Nirvana - Nevermind), as opposed to all items (as seems to be suggested by the GetItems method)

It returns all items; I only included a screenshot of the first item to illustrate what is happening. If you were to look at the entry for Jimi Hendrix - Electric Ladyland, the array of items contained by its Category navigation property would contain every item it's related to except for Jimi Hendrix - Electric Ladyland (which indicates ReferenceLoopHandling.Ignore is working).

Here are the results of the same query, with ReferenceLoopHandling.Ignore set, the only difference being the setting for QueryTrackingBehavior.

@JaimeStill makes sense, now I understand your example more fully - thanks.

@roji, maybe I was unclear in my question. It was on-topic as I previously saw @AndriySvyryd 's request and your sample.

If you don't want me to contribute to the discussion any longer, that's fine, just give me the word.

To clarify: since a memory dump has to dump not only the contents of memory but also the instance relationships of the now-circular object graph, how does the memory dump handle such a graph? Does it have a built-in mechanism for "trimming" the circular references? I suppose it must.

i.e. Will the circular dependency object graph being dumped be large?

@windhandel I definitely don't want you to feel you shouldn't contribute to the discussion - I'm only trying to make sure this (already complex) discussion stays focused, and that we don't talk about multiple, unrelated things at once.

In general, memory structures with circular references (such as what EF tracking queries sometimes produce) are a completely normal thing and shouldn't necessarily be avoided as such. All memory dump tools know how to handle these, and generally dump objects by their memory address; so if you have 10 objects holding pointers to the same memory area, your dump file will only contain one copy of that memory area.

So to answer your question: a circular dependency object graph, dumped by a memory profiler/tool (e.g. perfview) will not be large. The best way to help you understand what's going on in your particular case, is for you to come up with a small console program that creates a situation which you consider abnormal, and post that so that it can be investigated - but that's better done in #21766.
