Vstest: Show the test which crashed the test run (test explorer)

Created on 17 Mar 2017  路  34Comments  路  Source: microsoft/vstest

(Created from #623)

Description

Certain crashes make the test runner implode. I'm working on a refactoring and when I run tests sometimes this happens:

 The active test run was aborted. Reason: Unable to communicate with test host process.

Which probably means the process crashed but the test runner doesn't help me diagnose things here. I don't know which test was running when it crashed, so I have to manually binary search until I find the one that is the culprit.

blocking-partner enhancement ide question

Most helpful comment

This is fixed in latest dev builds of dotnet-cli 2.0.0-preview1-005963. Download it from https://dotnetcli.blob.core.windows.net/dotnet/Sdk/release/2.0.0/dotnet-dev-win-x64.latest.zip.

We have also fixed it for VS. Will update when a build with fix is available.

Note: issue will show for netcoreapp2.0 due to https://github.com/dotnet/cli/issues/6535.

All 34 comments

The test case start/end events are already available to data collector infrastructure. Could an inproc datacollector help with this scenario?

@codito No, we need to see the actual output of the error.
Consider the case that several tests together get into a state that crash the process.
We want to see the actual stack trace /error that caused it.
Right now the error is going into the void.

Is there any progress on this? This is one of the most annoying things about working with the new test runner.

Also not being able to see tests as they run

+1. Here's a simple test that is hard to debug because vstest swallows the error:
```c#
[Fact]
public void Fails()
{
Action fail = () => Debug.Assert(false, "Oops");
var thread = new Thread(new ThreadStart(fail));
thread.Start();
thread.Join();
}


$ dotnet test

Starting test execution, please wait...
[xUnit.net 00:00:00.4543872] Discovering: repro
[xUnit.net 00:00:00.5895954] Discovered: repro
[xUnit.net 00:00:00.6358530] Starting: repro
The active test run was aborted. Reason: Unable to communicate with test host process.

In contrast, using a console app or dotnet-xunit will correctly print to console the Debug.Assert failure.

$ dotnet xunit
Running .NET Core tests for framework netcoreapp2.0...
xUnit.net Console Runner (64-bit .NET Core 4.0.0.0)
Discovering: repro
Discovered: repro
Starting: repro

Unhandled Exception: System.Diagnostics.Debug+DebugAssertException: Oops

at System.Diagnostics.Debug.Assert(Boolean condition, String message, String detailMessage)
at repro.Test.<>c.b__0_0() in C:tmp2.0failreproTest.cs:line 13
at System.Threading.Thread.ThreadMain_ThreadStart()
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Diagnostics.Debug.ShowAssertDialog(String stackTrace, String message, String detailMessage)
at repro.Test.<>c.b__0_0() in C:tmp2.0failreproTest.cs:line 13
at System.Threading.Thread.ThreadMain_ThreadStart()
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
```

I am having this very issue. I have tests that run locally on my machine, but then crash on appveyor. It's in a test that has a large chain of code it runs, so now I am left with a human binary search to find the location of the bug.

https://ci.appveyor.com/project/Drawaes/leto/build/1.0.47

@Drawaes enabling diagnostics for test run may help: dotnet test -d:log.txt generates log.txt file, upload generated file as an artifact in CI, it may contain crash information. cc: @smadala

@Drawaes If you are using xunit you could use

  <ItemGroup>
    <DotNetCliToolReference Include="dotnet-xunit" Version="2.3.0-*" />
  </ItemGroup>

which is a new (beta) xunit runner which works as a dotnet cli tool. It doesn't accept a path (yet) though so you have to cd into your test project and run it from there.

Still no progress on this issue, it's been months now...

This is fixed in latest dev builds of dotnet-cli 2.0.0-preview1-005963. Download it from https://dotnetcli.blob.core.windows.net/dotnet/Sdk/release/2.0.0/dotnet-dev-win-x64.latest.zip.

We have also fixed it for VS. Will update when a build with fix is available.

Note: issue will show for netcoreapp2.0 due to https://github.com/dotnet/cli/issues/6535.

affected by this as well, please fix also for
dotnet --version
1.0.4
netcoreapp1.0

@codito , thanks for the update on this. Do you have an ETA for when the VS fix will be available?
The issue appeared out of nowhere for us over the weekend. running dotnet test works fine, but anything from within VS fails exactly as described here. Hence, got some folks waiting to get back in the saddle in VS rather than cmdline :)

@petarrepac sorry, there isn't a plan to back port the fix to dotnet-cli 1.0 at this time.

@Zimmergren don't have an exact date that I can share, a build should be available in next couple of weeks :)

Verified in VS15.3.0 preview1.0 [26526.3005.d15prerel].

Since #527 is closed as dupe of this,
Referenced issue https://github.com/aspnet/EntityFramework/issues/8642

@smadala - We are facing this issue in Ubuntu environment. So there is no VS. We are using dotnet SDK dotnet-sdk-2.0.0-preview2-006461-linux-x64.tar.gz to run tests.
The output is

[14:12:19]E:         [VSTest] Runner error: The active test run was aborted. Reason: 
null

What am I missing here?
(Side note: we are getting reason null instead of what is mentioned in very first post.)

Can you enable --diag:log.txt switch to get more logs and analyze log.txt and log.*.host.txt for more details?

Debugger.Break() in non debug run of tests does this.

@ElanHasson - I don't have Debugger.Break() in entire solution. Also the error is very flaky. It happens on CI servers but I haven't seen it happening locally.

I am testing locally with switch @smadala suggested. If I am lucky I will hit the error locally. Else I will try to hack it to run with switch on travis.

Here are the files from a failed test run that @smitpatel was talking about.
log.host.17-06-26_16-23-30_86155_4.txt
log.txt

@BrennanConroy Logs too don't have reason why testhost exited, Can you create and analyze(stacktrace) ubuntu dump?

@smadala is it possible that dotnet is not printing stacktrace to stderr? (we have seen one such case before: https://github.com/dotnet/cli/issues/6535)

Can we try a simple app that crashes with the dotnet-cli build used by @BrennanConroy? Will it help to try the inproc data collector to find failing test?

How do you propose I get a process dump if we only know about the error after the test process has completed.

@BrennanConroy Can you create stackoverflow in a test and check the stackoverflow messages appears on console?
Follow this create inproc data collector to figure out which test causing test abort.

I am expecting the stacktrace from dump, It may help us to find the code which is causing crash.

Creating a stackoverflow test logs the expected error. Addind the inproc data collector doesn't really help since tests are run in parallel so multiple ones are running when the tests fail.

All the processes are writing to same file. Change file name to inproctest_

This problem started for me when I inadvertently introduced an infinite recursion into my code being tested (in this case I had referenced == in my class's operator == method).

I found the problem by determining which unit tests were causing the problem (some of them would pass fine) and then debugging a faulty test and stepping through to the failure point.

Once I fixed the code, this problem went away.

Some of the test don't run and all a get is:

[26/10/2017 14:58:23 Informational] ------ Run test started ------
[26/10/2017 14:58:24 Informational] [xUnit.net 00:00:00.4103305]   Discovering: My.UnitTests
[26/10/2017 14:58:24 Informational] [xUnit.net 00:00:00.4930650]   Discovered:  My.UnitTests
[26/10/2017 14:58:24 Informational] [xUnit.net 00:00:00.4984163]   Starting:    My.UnitTests
[26/10/2017 14:59:40 Error] The active test run was aborted. Reason: 
[26/10/2017 14:59:40 Informational] ========== Run test finished: 29 run (0:01:16,8917963) ==========

Using VS 15.4.1 with:

<PackageReference Include="Microsoft.NET.Test.Sdk" Version="15.3.0" />
<PackageReference Include="xunit" Version="2.3.0" />
<PackageReference Include="xunit.runner.visualstudio" Version="2.3.0" />

EDIT:
I was able to find the culprit from the list of executed tests given by dotnet xunit -verbose

This is still an issue:

Microsoft (R) Test Execution Command Line Tool Version 15.5.0-preview-20171031-01
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
[xUnit.net 00:00:00.5355413]   Discovering: Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.FunctionalTests
[xUnit.net 00:00:01.1008471]   Discovered:  Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.FunctionalTests
[xUnit.net 00:00:01.1111929]   Starting:    Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.FunctionalTests
The active test run was aborted. Reason:

Test Run Aborted.

@pakrym Use --blame option to get culprit test . See https://github.com/Microsoft/vstest/issues/1069#issuecomment-350048544 for example.

Is there a reason that this isn't the default?

Yeah, It uses additional extensions(Logger and datacollector) to figure out order of tests execution and currently running test. Those will add overhead if it is added by default.
Edit: By default actual stack trace/error gets printed if available, But name of test & sequence of tests available only when --blame enabled.

I run up into this again but this time the issue was inside a finalizer, which means it was inside the GC. I don't know if it's even possible but it would be great if this kind of issue was reported by the test runner. Thanks.

I just spent a day figuring out why the test runner crashes. /Blame argument helped narrow this down to a specific test case. I wanted to investigate @smadala's claim that the extra information comes at expense of performance, so I ran tests with and without this parameter. They all took around 11 minutes, without any pattern that indicates that one way is faster than another.

Please set this flag as default, and possible add /Lightweight for these whose performance is allegedly affected

@AmadeusW Please create new issue for suggestions. Performance depends on how many tests were you running in 11 min and what is running time for each test.

Was this page helpful?
0 / 5 - 0 ratings