Roslyn: Unit tests using CSharpGeneratorDriver cause test hangs in CI

Created on 3 Nov 2020 · 11Comments · Source: dotnet/roslyn

Version Used:

.NET 5.0 RC2
Roslyn packages version 3.8.0-4.final
Xunit 2.4.1

Steps to Reproduce:

Compare the two repro branches: https://github.com/kzu/avatar/compare/feature/hang...feature/nohang
See CI build for feature/nohang at https://github.com/kzu/avatar/runs/1347616845
See CI build for feature/hang at https://github.com/kzu/avatar/actions/runs/343636842

Interestingly, tests never hang on macOS, but they consistently hang on ubuntu and windows.

The code basically uses the following code to run source generators:

            ISourceGenerator generator = new AvatarSourceGenerator();
            var driver = CSharpGeneratorDriver.Create(generator);
            // Make sure we don't hang forever during compilation.
            var cts = new CancellationTokenSource(10000);

            driver.RunGeneratorsAndUpdateCompilation(compilation, out var output, out diagnostics, cts.Token);

Interestingly, the cancellation token with 10s auto-cancel doesn't kick on.
I found the workaround through an asp.net core issue. The key issue is that locally, things run just fine. I never get the hangs on windows while I always get it on CI. There must be something different in the local run vs CI run (even though I use the exact same command line argument,

Additionally, hangs started happening after I moved some tests to use a theory instead. You can see that after that commit all subsequent builds failed: https://github.com/kzu/avatar/commits/feature/hang

Expected Behavior:

No hangs

Actual Behavior:

Hangs, consistently.

Suboptimal Workaround:

Add this assembly attribute to the test project:

[assembly: Xunit.CollectionBehavior(MaxParallelThreads = -1)]

Note that the value -1 for max paralell threads isn't even a documented valid value in xunit (that I could find), but setting it to 1 (supposedly the right way to force single-threaded execution) instead brings back the hangs 😕.

Area-Compilers untriaged

Source

kzu

Most helpful comment

https://xunit.net/docs/configuration-files#maxParallelThreads:

Set this to override the maximum number of threads to be used when parallelizing tests within this assembly. Use a value of 0 to indicate that you would like the default behavior; use a value of -1 to indicate that you do not wish to limit the number of threads used for parallelization.

_JSON schema type: integer_
_Default value: the number of logical processors in your PC_

By setting it to -1, you are saying "run as many Tasks as you want without limit".

The fact that 1 exacerbates your problem tells me you almost certainly have an async deadlock happening because of the use of unsafe Task calls like .Result, .GetAwaiter().GetResult(), Task.WaitAny, Task.WaitAll, etc.

bradwilson on 4 Nov 2020

❤2

All 11 comments

https://xunit.net/docs/configuration-files#maxParallelThreads:

Set this to override the maximum number of threads to be used when parallelizing tests within this assembly. Use a value of 0 to indicate that you would like the default behavior; use a value of -1 to indicate that you do not wish to limit the number of threads used for parallelization.

_JSON schema type: integer_
_Default value: the number of logical processors in your PC_

By setting it to -1, you are saying "run as many Tasks as you want without limit".

bradwilson on 4 Nov 2020

❤2

Yep, the fact that the source generator driver generates code and updates the compilation, apparently in a synchronous fashion looked fishy from the beginning. As a consumer of everything internal it hides behind that class, there isn't much I can do :(. Hopefully this can serve as a repro for the team to look into it more deeply.

Do you have any gut feeling as to why this might happen consistently on CI but consistently not happen on local runs? That was quite puzzling...

kzu on 4 Nov 2020

Do you have any gut feeling as to why this might happen consistently on CI but consistently not happen on local runs?

Fewer and/or slower cores in CI?

bradwilson on 4 Nov 2020

Makes sense... probably the hosted pool is just a bunch of VMs/docker containers sharing scarce resources with other concurrent builds :(

kzu on 4 Nov 2020

@bradwilson analysis matches my intuition here. I've hit this behavior several time inside the Roslyn test base when effectively spawned a number of threads greater than the max allowed threads and ended up calling Wait, Result.

@chsienki do u think an async version of this API is appropriate here? Most of the compilation driving APIs are synchronous but this is more in a hosting layer.

jaredpar on 6 Nov 2020

I haven't had a chance to dive into this fully yet, but I would point out that we have some known issues with the cancellation tokens, and we don't fully utilize it during generation: https://github.com/dotnet/roslyn/issues/46528.

@jaredpar My immediate gut is that it shouldn't be async. We're not doing any disk bound I/O; it should all be compute bound, so it wouldn't really gain us anything. Interestingly we're not parallelized yet inside the driver either, so I'm not sure how anything we're doing could be causing a deadlock.

chsienki on 6 Nov 2020

Interestingly we're not parallelized yet inside the driver either, so I'm not sure how anything we're doing could be causing a deadlock

xUnit manages the amount of threads available for the unit tests. Consider a situation where it's constrained to a single thread. If the driver, or any code it calls, spawns a thread and then does .Wait then we will deadlock in xUnit.

I'm fairly aware of this problem cause ... I'm a bad dev, and I've done this a lot in xUnit :smile:

If the generator driver isn't spawning threads then I don't think we should be doing async here. The repo though strongly suggests that something in this chain is spawning a thread and doing some form of synchronous wait.

One possible cause for the use of Parallel.For in the compiler. That is effectively doing a sync for / join operation with threads. If so and the unit test wants to consume us in an environment hard limited to one thread then parallelization should be disabled in the compilation options. Enabling parallelization in the compiler and keeping us to one thread is going to have issues.

jaredpar on 6 Nov 2020

To clarify a bit: xUnit.net does not limit the number of threads available in the system; our limitation is around the size of pool of threads available to run Tasks for tests. A test is free to spawn as many threads as it likes (and/or spawn new tasks outside the current sync context) and xUnit shouldn't care.

bradwilson on 6 Nov 2020

👍1

@bradwilson yeah my wording was a bit stronger than the actual model. Thanks for the clarification.

jaredpar on 6 Nov 2020

@bradwilson that thread pool size limitation includes the thread running the test correct?

jaredpar on 9 Nov 2020

Closing as this does not appear to be an issue with our code but more the environment that it is being hosted in.

jaredpar on 17 Nov 2020

Was this page helpful?

0 / 5 - 0 ratings