Efcore: Open DataReader error on a relatively simple scenario, but only under load in production

Created on 19 Feb 2019 · 19Comments · Source: dotnet/efcore

My client is getting this fairly frequently in production, when there are multiple concurrent users, but we can't isolate and reproduce the problem in a dev environment. As far as we can tell the environments are identical (except for volume of requests). The app is an ASP.NET Core REST API using EF Core 2.2, and only two controller methods are showing the error:

...
There is already an open DataReader associated with this Command which must be closed first. at System.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
...

The context is being injected into the controller (Scoped). We have not (yet) tried to turn on MARS in production.

The simpler of the two methods goes something like this:

```C#
try {
var es = ctx.Database.CreateExecutionStrategy();
es.Execute(ctx => ectx =>
{
using (var txn = ectx.Database.BeginTransaction())
{
foreach (Thing t in ectx.Things.Where(criteria).ToList())
{
t.Prop = null;
log ();
}

  var t2 = ectx.Things.Include(t => t.OtherThing)
    .ThenInclude(o => o.AnotherThing)
    .ThenInclude(y => y.YetAnotherThing)
    .Include(t => t.OtherThing)
    .ThenInclude(o => o.SomeThing)
    .ThenInclude(s => s.FinalThing)
    .FirstOrDefault(criteria);
  if (t2 != null)
  {
    t2.Prop = "value";
  }

  log ();

  ectx.SaveChanges();
  txn.Commit();
}

}
}
catch (Exception ex) {
log ();
throw;
}
```

The logging that's happening above is to a table in the same database, but through a different context.

We're logging the generated SQL in our dev environments and it doesn't appear that any of the queries are attempting to lazy load or otherwise open a second DataReader.

I'm less familiar with EF (mostly used NHibernate), but a little bit of research suggests to me that the retry strategy and explicit transaction here is unnecessary (since we're only saving once at the end), but we left it in place in production until we can isolate and reproduce it (and better understand what's going on).

Any advice on what else we can try to reproduce this in isolation, or where else we might look to detect environmental differences that could be causing it? Is MARS basically mandatory for EF Core, and what kind of other issues might we expect to see if we enable it?

Further technical details

EF Core version: 2.2
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows Server
IDE: Visual Studio 2017

closed-could-not-reproduce customer-reported

Source

lesscodetxm

All 19 comments

@lesscodetxm It's hard to tell from the limited code posted, but it looks like two context instances are being used. That might be causing some issue, but without the ability to really see what is going on it's very hard to say. Could you create a small runnable project/solution that reproduces what you are seeing?

ajcvickers on 22 Feb 2019

@ajcvickers We've been unable to reproduce it in isolation. In a local dev env we've been able to stress test the two API calls exhibiting the DataReader error, and we've observed deadlocks, followed by retries, which then subsequently fail with errors resembling:

...
The instance of entity type 'x' cannot be tracked because another instance with the same key value for {'y', 'z'} is already being tracked.
...

but we're not sure if this is related. We've identified changes we can make to reduce or eliminate the contention causing the deadlocks, but it seems like the issue is related to the retry execution strategy when such a transient issue occurs.

lesscodetxm on 22 Feb 2019

EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it.

BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions.

ajcvickers on 1 Mar 2019

I was able to reproduce this (at least once) by taking the latest and adding the following test:

        [Fact]
        public virtual void Include_up_and_down()
        {
            int maxIterations = 100;
            int maxParallelQueries = 100;
            for (int i = 0; i < maxIterations; i++)
            {
                Parallel.For(
                    0, maxParallelQueries, (_, __) =>
                    {
                        using (var context = CreateContext())
                        {
                            var orders = context.Set<Order>().Include(o => o.Customer)
                                .Include(o => o.OrderDetails).ThenInclude(d => d.Product).ToList();

                        }
                    });
            }
        }

I've found that this code will either run successfully to completion (taking several minutes), or fail almost immediately with an "open DataReader" error. The query resembles those that occasionally fail in our system, in that they "Include" referenced entities in both directions (following both "parent" references and "child/detail" references, at multiple levels of detail), but I'm not sure if that's the only determining factor here.

lesscodetxm on 25 Apr 2019

@lesscodetxm Where are you adding that test?

ajcvickers on 25 Apr 2019

I threw it into testEFCore.Specification.TestsQueryIncludeTestBase.cs, and the override in testEFCore.SqlServer.FunctionalTestsQueryIncludeSqlServerTest.cs

lesscodetxm on 25 Apr 2019

👍1

@lesscodetxm Thanks.

ajcvickers on 25 Apr 2019

Here's the overridden test method:

        public override void Include_up_and_down()
        {
            base.Include_up_and_down();

            Xunit.Assert.True(true);
        }

lesscodetxm on 25 Apr 2019

I've found that this code will either run successfully to completion (taking several minutes), or fail almost immediately with an "open DataReader" error.

We randomly pick MultipleActiveResultSets true/false in our tests. Does it fail in SQLite in tests?

smitpatel on 25 Apr 2019

@smitpatel Ah, OK...

So I forced MARS to be false and the test seems to fail consistently.
It seems to succeed consistently with Sqlite.

lesscodetxm on 25 Apr 2019

That is expected. The TestBase uses DbConnection to configure SqlServer which is shared across all DbContext instances. If MARS is off then only one open DataReader can be present. When you run 1 query in 1 context, we will do buffer accordingly. But you are running them in parallel so even though you have created new DbContext for each thread, you are still using same connection which EF cannot know about and throws exception. That test failing is by design.

smitpatel on 25 Apr 2019

👍1

So can you advise on the easiest way to craft a similar unit test that behaves like it would in our production ASP.Net application?

lesscodetxm on 25 Apr 2019

Best option would be to create self-containing repro rather than trying to put something in our testbase. Our tests have a very interesting infrastructure setup to reuse as many components as possible to run tests faster on SqlServer. Which brings limitation like above that you cannot run query in multiple threads. Jumping through those hoops trying to figure out limitations may become time expensive.

smitpatel on 25 Apr 2019

👍1

So, we've still been unable to reproduce this with a standalone console app, either with Northwind or our schema. Meanwhile in production under heavy load we were still seeing many open DataReader errors during a business day. Last week we gave up and enabled MARS. The problem appeared to go away initially, but then under heavy load we're now seeing a different problem (that we've never observed before) with the same queries as the culprits:

SqlException: New transaction is not allowed because there are other threads running in the session.>    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

Other failures of our business logic are also now occasionally being seen with these queries. It appears that in some cases one of the .Include clauses is silently not loading the related entities, causing failures when using that data later for lookup purposes. We are not lazy loading.

We're now looking to rewrite these queries so that they do not do multi-level .Includes nor multiple .Includes on the same child collections in the same query.

lesscodetxm on 2 May 2019

@lesscodetxm All the indications point to multiple concurrent threads accessing the same DbContext instance. A couple of things to try:

Check that every async query is awaited before another query is started.
Instrument the code (in whatever way makes sense for you application) to check that context instances aren't being shared inappropriately.

ajcvickers on 2 May 2019

EF Team Triage: Closing this issue as the requested additional details have not been provided and we have been unable to reproduce it.

ajcvickers on 10 May 2019

Update:
We are not using async for this API.
We added context hash code logging for the problem queries and it did not appear that the context is being reused across ASP.NET requests.

What we _have_ done is changed our complex join queries from this form:

var t2 = ectx.Things.Include(t => t.OtherThing)
        .ThenInclude(o => o.AnotherThing)
        .ThenInclude(y => y.YetAnotherThing)
        .Include(t => t.OtherThing)
        .ThenInclude(o => o.SomeThing)
        .ThenInclude(s => s.FinalThing)
        .FirstOrDefault(criteria);

to this form:

var t2 = ectx.Things.SingleOrDefault(criteria);
ectx.Entry(t2)
        .Collection(t => t.OtherThing)
        .Query()
        .Include(o => o.AnotherThing).ThenInclude(y => y.YetAnotherThing)
        .Include(o => o.SomeThing).ThenInclude(s => s.FinalThing)
        .ToList();

It's been a few days since we made this change and we haven't seen the concurrency errors so far.

lesscodetxm on 10 May 2019

Update: Going on more than a week now and the changes to our join queries have not yielded any more open data reader errors. Even though we're unable to reproduce the problem in isolation, I still think there's something going on with the canonical form of the query under heavy load.

lesscodetxm on 17 May 2019

@lesscodetxm Thanks for the additional information. I don't think there is currently anything actionable here for us, so I'll leave this issue closed. Also, there are extensive changes to query architecture for 3.0, so it may be that the underlying cause becomes irrelevant anyway.
/cc @smitpatel

ajcvickers on 17 May 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

How to Implement Conditionally Required Field Validation

spottedmahn · 3Comments

How to disable tracking and validation?

ghost · 3Comments

Query: FromSql not composing with LINQ when it contains certain whitespace patterns

ra0o0f · 3Comments

In Entity Framework Core 1.0.0-RC2-Final, there seems to be no way to get the underlying DBTransaction after invoking Db.Database.BeingTransaction()

pachecosoftware · 3Comments

Entity Framework Core 2 - Code First not creating tables automatically

mohsin91 · 3Comments