Efcore: Help needed: deadlock occurring in functional tests

Created on 28 Apr 2018  路  10Comments  路  Source: dotnet/efcore

I hope the EF Core team can help out with this issue, it may actually affect non-Npgsql usage as well.

At some point during the 2.1.0 development, Npgsql's functional test suite started hanging when executing on Appveyor (but runs fine locally). Unfortunately I don't know exactly when this started (the state of the build was regardless broken for a while).

Using process explorer, I was able to get a stack trace of the two blocking threads during the test run (Appveyor has two virtual CPUs, so xunit automatically uses that many parallel threads). I've cut out some irrelevant native parts but otherwise here they are:

Thread 1:

mscorlib.dll!System.Threading.ManualResetEventSlim.Wait+0x2e3
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait+0xb6
mscorlib.dll!System.Threading.Tasks.Task.InternalWait+0x1a1
mscorlib.dll!System.Threading.Tasks.Task`1.GetResultCore+0x26
System.Interactive.Async.dll!<ToEnumerable_>d__442`1.MoveNext+0xd7
System.Core.dll!System.Linq.Buffer`1..ctor+0xa2
System.Core.dll!<GetEnumerator>d__1.MoveNext+0xd8
mscorlib.dll!System.Collections.Generic.List`1..ctor+0x1b4
System.Core.dll!System.Linq.Enumerable.ToList+0x46
Microsoft.EntityFrameworkCore.Specification.Tests.dll!<>c__DisplayClass63_0`1.<CollectionAsserter>b__0+0x31b
System.Core.dll!System.Dynamic.UpdateDelegates.UpdateAndExecuteVoid3+0x2c4
[Unmanaged to Managed Transition]
clr.dll+0x1f3c
[Managed to Unmanaged Transition]
Microsoft.EntityFrameworkCore.Specification.Tests.dll!<>c.<Correlated_collections_on_select_many>b__60_1+0x1143
Microsoft.EntityFrameworkCore.Specification.Tests.dll!Microsoft.EntityFrameworkCore.TestUtilities.TestHelpers.AssertResults+0x318
Microsoft.EntityFrameworkCore.Specification.Tests.dll!<AssertQuery>d__20`2.MoveNext+0x4f6
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!MoveNextRunner.Run+0x6c
xunit.execution.desktop.dll!<>c__DisplayClass7_0.<Post>b__1+0x29
xunit.execution.desktop.dll!Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext+0x2b
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!System.Threading.ExecutionContext.Run+0x52
xunit.execution.desktop.dll!Xunit.Sdk.ExecutionContextHelper.Run+0x7f
xunit.execution.desktop.dll!Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc+0xe7
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!System.Threading.ExecutionContext.Run+0x52
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart+0x5c

Thread 2:

mscorlib.dll!System.Threading.ManualResetEventSlim.Wait+0x2e3
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait+0xb6
mscorlib.dll!System.Threading.Tasks.Task.InternalWait+0x1a1
mscorlib.dll!System.Threading.Tasks.Task`1.GetResultCore+0x26
System.Interactive.Async.dll!<ToEnumerable_>d__442`1.MoveNext+0xd7
System.Core.dll!System.Linq.Enumerable.Count+0xd5
Microsoft.EntityFrameworkCore.Specification.Tests.dll!<>c__61`1.<CollectionSorter>b__61_0+0x199
System.Core.dll!System.Linq.EnumerableSorter`2.ComputeKeys+0x85
System.Core.dll!System.Linq.EnumerableSorter`1.Sort+0x1d
System.Core.dll!<GetEnumerator>d__1.MoveNext+0x118
mscorlib.dll!System.Collections.Generic.List`1..ctor+0x1b4
System.Core.dll!System.Linq.Enumerable.ToList+0x46
Microsoft.EntityFrameworkCore.Specification.Tests.dll!Microsoft.EntityFrameworkCore.TestUtilities.TestHelpers.AssertResults+0x2ae
Microsoft.EntityFrameworkCore.Specification.Tests.dll!<AssertQuery>d__20`2.MoveNext+0x4f6
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!MoveNextRunner.Run+0x6c
xunit.execution.desktop.dll!<>c__DisplayClass7_0.<Post>b__1+0x29
xunit.execution.desktop.dll!Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext+0x2b
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!System.Threading.ExecutionContext.Run+0x52
xunit.execution.desktop.dll!Xunit.Sdk.ExecutionContextHelper.Run+0x7f
xunit.execution.desktop.dll!Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc+0xe7
mscorlib.dll!System.Threading.ExecutionContext.RunInternal+0x160
mscorlib.dll!System.Threading.ExecutionContext.Run+0x14
mscorlib.dll!System.Threading.ExecutionContext.Run+0x52
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart+0x5c

As you can see, both threads are synchronously blocking a task, somewhere within System.Interactive.Async.

Disabling parallelization entirely makes the deadlock go away, and manually setting MaxParallelThreads to 4 instead of 2 seems to do the same. At this point it feels to me like there's some sort of sync-over-async deadlock, where the 2 threads are synchronously blocked, possibly waiting for asynchronous operations to complete, but no thread is available to execute those async callbacks. When parallelization is active, xunit sets up a SynchronizationContext which enforces this (I've already complained once in https://github.com/xunit/xunit/issues/864). I don't really have any proof that this is what's happening, but since I'm not very familiar with System.Interactive.Async I thought I'd ask for help on this.

If something like this is indeed happening, it could be affecting EF Core in general - probably only the test suites - although I'm not sure why you guys haven't bumped into this yourselves.

Any help would be greatly appreciated, in the meantime I'll just crank up the number of threads or disable parallelization altogether.

closed-not-needed punted-for-2.2 punted-for-3.0 type-bug

All 10 comments

More more piece of information... Running the xunit console runner with -verbose shows a list of tests as they are run, here are the tests leading up to the deadlock:

    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_one_to_one_reference_to_dependent_untyped(state: Unchanged, async: False)
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_one_to_one_reference_to_dependent_untyped(state: Modified, async: True)
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGroupByQueryNpgsqlTest.GroupBy_join_anonymous
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncIncludeNpgsqlTest.Include_collection_on_join_clause_with_filter
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_one_to_one_reference_to_dependent_untyped(state: Modified, async: False)
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_one_to_one_reference_to_dependent_untyped(state: Deleted, async: True)
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGearsOfWarQuerySqlServerTest.Correlated_collections_basic_projecting_single_property
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.IncludeAsyncNpgsqlTest.Include_collection_on_additional_from_clause2
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncIncludeNpgsqlTest.Include_reference_alias_generation
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGroupByQueryNpgsqlTest.SelectMany_GroupBy_Aggregate
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_one_to_one_reference_to_dependent_untyped(state: Deleted, async: False)
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_collection_using_Query_alternate_key(state: Unchanged, async: True)
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGearsOfWarQuerySqlServerTest.Correlated_collections_basic_projection_ordered
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGroupByQueryNpgsqlTest.GroupBy_Constant_Select_Sum_Min_Key_Max_Avg
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.AsyncGearsOfWarQuerySqlServerTest.Correlated_collections_on_select_many
    Npgsql.EntityFrameworkCore.PostgreSQL.Query.IncludeAsyncNpgsqlTest.Include_collection_as_no_tracking
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_collection_using_Query_alternate_key(state: Unchanged, async: False)
    Npgsql.EntityFrameworkCore.PostgreSQL.LoadNpgsqlTest.Load_collection_using_Query_alternate_key(state: Modified, async: True)

I don't know if the console runner prints tests before or after they are executed. The stack trace from process explorer does show Correlated_collections_on_select_many, unfortunately without showing the class name (i.e. is it the async version or not). But as the third line from the bottom shows AsyncGearsOfWarQuerySqlServerTest.Correlated_collections_on_select_many, we can probably assume that's the one. This also may give a clue as to the 2nd test participating in the deadlock (the name isn't given in the stack trace).

Hope this helps...

Related #8384

Possibly Correlated_collections_on_select_many, collection Asserter & Sorter itself are causing deadlock without using any other test.

@maumar to investigate if this is a test or product issue.

At first glance, the query seems correct. Here is the query plan:

(QueryContext queryContext) => IAsyncEnumerable<<>f__AnonymousType38<string, string, IEnumerable<Weapon>, IEnumerable<Gear>>> _InterceptExceptions(
    source: IAsyncEnumerable<<>f__AnonymousType38<string, string, IEnumerable<Weapon>, IEnumerable<Gear>>> _ShapedQuery(
        queryContext: queryContext, 
        shaperCommandContext: SelectExpression: 
            SELECT [g].[Nickname] AS [GearNickname], [s].[Name] AS [SquadName], [g].[FullName], [s].[Id]
            FROM [Gears] AS [g]
            CROSS JOIN [Squads] AS [s]
            WHERE [g].[Discriminator] IN (N'Officer', N'Gear') AND ([g].[HasSoulPatch] = 1)
            ORDER BY [GearNickname], [s].[Id] DESC, 
        shaper: TypedProjectionShaper<TypedCompositeShaper<ValueBufferShaper, ValueBuffer, ValueBufferShaper, ValueBuffer, TransparentIdentifier<ValueBuffer, ValueBuffer>>, TransparentIdentifier<ValueBuffer, ValueBuffer>, <>f__AnonymousType38<string, string, IEnumerable<Weapon>, IEnumerable<Gear>>>), 
    contextType: TestModels.GearsOfWarModel.GearsOfWarContext, 
    logger: DiagnosticsLogger<Query>, 
    queryContext: queryContext)

and contents of the Projection shaper:

(QueryContext queryContext | TransparentIdentifier<ValueBuffer, ValueBuffer> t0) => new <>f__AnonymousType38<string, string, IEnumerable<Weapon>, IEnumerable<Gear>>(
    string TryReadValue(t0.Outer, 0, Gear.Nickname), 
    string TryReadValue(t0.Inner, 1, Squad.Name), 
    (IEnumerable<Weapon>)EnumerableAdapter<Weapon> _ToEnumerable(IAsyncEnumerable<Weapon> _InjectParameters(
            queryContext: queryContext, 
            source: IAsyncEnumerable<Weapon> _ShapedQuery(
                queryContext: queryContext, 
                shaperCommandContext: SelectExpression: 
                    SELECT [w].[Id], [w].[AmmunitionType], [w].[IsAutomatic], [w].[Name], [w].[OwnerFullName], [w].[SynergyWithId]
                    FROM [Weapons] AS [w]
                    WHERE (([w].[IsAutomatic] = 1) OR (([w].[Name] <> N'foo') OR [w].[Name] IS NULL)) AND (@_outer_FullName = [w].[OwnerFullName]), 
                shaper: BufferedEntityShaper<Weapon>), 
            parameterNames: new string[]{ "_outer_FullName" }, 
            parameterValues: new object[]{ string TryReadValue(t0.Outer, 2, Gear.FullName) })), 
    (IEnumerable<Gear>)EnumerableAdapter<Gear> _ToEnumerable(IAsyncEnumerable<Gear> _InjectParameters(
            queryContext: queryContext, 
            source: IAsyncEnumerable<Gear> _ShapedQuery(
                queryContext: queryContext, 
                shaperCommandContext: SelectExpression: 
                    SELECT [m].[Nickname], [m].[SquadId], [m].[AssignedCityName], [m].[CityOrBirthName], [m].[Discriminator], [m].[FullName], [m].[HasSoulPatch], [m].[LeaderNickname], [m].[LeaderSquadId], [m].[Rank]
                    FROM [Gears] AS [m]
                    WHERE ([m].[Discriminator] IN (N'Officer', N'Gear') AND ([m].[HasSoulPatch] = 0)) AND (@_outer_Id = [m].[SquadId]), 
                shaper: BufferedEntityShaper<Gear>), 
            parameterNames: new string[]{ "_outer_Id" }, 
            parameterValues: new object[]{ (object)Nullable<int> TryReadValue(t0.Inner, 3, Squad.Id) }))

async calls are wrapped in EnumerableAdapter and correlated collection optimization is not applied (because it's disabled for cross join), so it's unlikely that we are hitting some new pattern related to that feature.

@anpete any ideas about this?

This in the plan == deadlock: _ToEnumerable

Looks like this is not a new issue, we just didn't have the coverage (test in question has been added in 2.1 timeframe). We can ad-hoc detecting non-async enumerations in our async queries - could maybe use ExpressionPrinter for that

Relieved to know that it's not a problem somewhere in Npgsql, I was really scared at some point that the lock-free connection pool was somehow responsible.

For sure, if you guys are calling ToEnumerable() (which blocks synchronously) over any sort of async I/O, that'll cause a deadlock in a constrained-thread SynchronizationContext (like xunit's).

Closing as no longer relevant due to query changes.

Was this page helpful?
0 / 5 - 0 ratings