I'm trying to troubleshoot hanging builds on a CI server. I found this which seems very promising:
https://github.com/microsoft/vstest-docs/blob/master/RFCs/0028-BlameCollector-Hang-Detection.md
However, when I use the hang detector, I don't get a dump file.
The test hangs are intermittent, so they are hard to reproduce.
dotnet vstest is invoked with:
<lots of DLLs> --Parallel --logger:"trx;LogFileName=NUnitTestsCore.trx" --logger:"console;verbosity=minimal" --ResultsDirectory:.../build/test-reports --Settings:...\tmpCF7A.tmp
The settings file is auto generated and contains something like this:
<RunSettings>
<RunConfiguration>
<MaxCpuCount>4</MaxCpuCount>
</RunConfiguration>
<DataCollectionRunSettings>
<DataCollectors>
<DataCollector friendlyName="blame" enabled="True">
<Configuration>
<ResultsDirectory>...\build</ResultsDirectory>
<CollectDumpOnTestSessionHang TestTimeout="120000" DumpType="full"/>
</Configuration>
</DataCollector>
</DataCollectors>
</DataCollectionRunSettings>
</RunSettings>
I expect the hang detector to detect a hang and produce a crash dump file.
The hang detector did detect a hang after ~2 minutes:
The active test run was aborted. Reason: Test host process crashed
...
Test Run Aborted.
Attachments:
...\build\test-reports\4a680b77-23cd-471a-9b82-ead6630865fa\Sequence_af08f6cfd55f4dd5989add68f10ea91f.xml
However, it only produces a sequence file, not a crash dump.
Note that the sequence file ends up in the result directory used on the command line, rather than the results directory in the settings file.
None produced by the above command.
Windows Server 2012
.NET Core version 3.0.100
I should add that the sequence file was very helpful, but I was still surprised not to see a crash dump.
After a quick analysis I found quite a few problems, most of which would be mitigated by logging errors to the client correctly:
StartHangBasedProcessDump in ProcessDumpUtility:
System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\jajares\source\repos\UnitTestProject60\blame\Sequence_94c02868a34143d9866f6cd1392e90cf.xml'.
TpTrace Verbose: 0 : 22140, 4, 2020/03/27, 09:32:50.999, 2575613428195, vstest.console.dll, DataCollectionRequestSender.SendAfterTestRunStartAndGetResult : Received message: (DataCollection.AfterTestRunEndResult) -> [
{
"Uri": "datacollector://Microsoft/TestPlatform/Extensions/Blame/v1",
"DisplayName": "Blame",
"Attachments": [
{
"Description": "",
"Uri": "file://C:/Users/jajares/source/repos/UnitTestProject60/TestResults/ddf1714c-a2b3-499c-90c2-a5c57993e28c/Sequence_d40ebf81ba11467e95dd761bd9e0b5a7.xml"
}
]
}
]
TpTrace Error: 0 : 34124, 10, 2020/03/27, 09:22:48.201, 2569585445775, datacollector.dll, BlameCollector.CollectDumpAndAbortTesthost: Failed with error Microsoft.VisualStudio.TestPlatform.ObjectModel.TestPlatformException: Required environment variable PROCDUMP_PATH was null or empty. Set PROCDUMP_PATH to path of folder containing appropriate procdump executable.
at Microsoft.TestPlatform.Extensions.BlameDataCollector.ProcessDumpUtility.GetProcDumpExecutable(Int32 processId)
at Microsoft.TestPlatform.Extensions.BlameDataCollector.ProcessDumpUtility.StartHangBasedProcessDump(Int32 processId, String dumpFileGuid, String testResultsDirectory, Boolean isFullDump)
at Microsoft.TestPlatform.Extensions.BlameDataCollector.BlameCollector.CollectDumpAndAbortTesthost()
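Since the error above is only visible in the diagnostic log, a CI script could check the variable before starting the test run. The following is a minimal sketch, not part of vstest: the function name check_procdump_path is hypothetical, and the two executable names are an assumption taken from the dotnet test help text quoted later in this thread.

```python
import os
from pathlib import Path

# Assumption: these are the executables the blame collector looks for,
# as named in the `--blame-crash` help text quoted in this thread.
REQUIRED_EXECUTABLES = ("procdump.exe", "procdump64.exe")


def check_procdump_path(env=None):
    """Return a list of problems; an empty list means the check passed."""
    env = os.environ if env is None else env
    value = env.get("PROCDUMP_PATH", "")
    if not value.strip():
        return ["PROCDUMP_PATH is not set or is empty"]
    folder = Path(value)
    if not folder.is_dir():
        return [f"PROCDUMP_PATH does not point to an existing folder: {folder}"]
    # Report every missing executable, not just the first one.
    return [
        f"{name} not found in {folder}"
        for name in REQUIRED_EXECUTABLES
        if not (folder / name).is_file()
    ]


if __name__ == "__main__":
    for problem in check_procdump_path():
        print("warning:", problem)
```

Running this as a step before dotnet vstest would surface a missing PROCDUMP_PATH in the build log instead of only in the --diag output.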
Is the ... in your config just you shortening the path? I did not know that you can create a ... folder in Windows and that it will be recursive on its parent, thanks for breaking my filesystem 😁 I am afraid we won't be able to get to this anytime soon, but it should be reasonably simple to fix, please consider PRing this 🙂
<RunSettings>
<RunConfiguration>
<MaxCpuCount>4</MaxCpuCount>
</RunConfiguration>
<DataCollectionRunSettings>
<DataCollectors>
<DataCollector friendlyName="blame" enabled="True">
<Configuration>
<ResultsDirectory>blame</ResultsDirectory>
<CollectDumpOnTestSessionHang TestTimeout="5000" DumpType="full"/>
</Configuration>
</DataCollector>
</DataCollectors>
</DataCollectionRunSettings>
</RunSettings>
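The DirectoryNotFoundException above mentions a blame folder that matches the ResultsDirectory in this settings file. As a rough sketch, a script could read the blame collector settings and create that folder before the run. Whether pre-creating the folder actually avoids the exception is an assumption and is not confirmed in this thread; the function names read_blame_settings and ensure_results_directory are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_blame_settings(runsettings_xml):
    """Extract the blame collector settings from runsettings XML text."""
    root = ET.fromstring(runsettings_xml)
    for collector in root.iter("DataCollector"):
        if collector.get("friendlyName", "").lower() != "blame":
            continue
        config = collector.find("Configuration")
        if config is None:
            return {}
        hang = config.find("CollectDumpOnTestSessionHang")
        return {
            "results_directory": config.findtext("ResultsDirectory"),
            "test_timeout": hang.get("TestTimeout") if hang is not None else None,
            "dump_type": hang.get("DumpType") if hang is not None else None,
        }
    return {}


def ensure_results_directory(runsettings_xml, base_dir):
    """Create the configured ResultsDirectory under base_dir if it is missing."""
    name = read_blame_settings(runsettings_xml).get("results_directory")
    if not name:
        return None
    target = Path(base_dir) / name  # an absolute name replaces base_dir
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For the settings file above, read_blame_settings would report the directory blame, a timeout of 5000 and the dump type full.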
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace UnitTestProject60
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestMethod1()
{
System.Threading.Thread.Sleep(100_000);
}
}
}
I am using dotnet test --settings .\settings.runsettings --diag:c:\temp\log.txt and DebugView++ to easily see the logs from vstest.console, datacollector and testhost. The path to the log does not matter, it's just the simplest way to enable verbose logging for all components.
Sorry, ... is just where I replaced irrelevant portions of the paths. :-)
I didn't have PROCDUMP_PATH set, so that's likely the issue. I can't believe I missed that in the documentation.
I'll see if I can create a PR for the problems you found.
@provegard please do. You can tag me to help out with the review.
It was a pet project of mine but I never got round to polishing it, will help in any way I can.
@ShreyasRmsft I have trouble building the repo code using build.cmd. I get:
Failed to find VS installation with requirements: Microsoft.Component.MSBuild Microsoft.Net.Component.4.6.TargetingPack Microsoft.VisualStudio.Component.VSSDK
I follow these contribution guidelines
I have .NET 4.6.1 and 4.7. I cannot install 4.6.2 because it says there's a newer one installed already (4.7, obviously). The doc says "4.6.2 or higher", so that should be fine.
I have installed the two MSIs for the 4.6 targeting pack.
Any ideas?
@provegard do you have VS2017 or VS2019? I remember someone else facing issues with 2019, the build scripts are slightly outdated.
If you are on VS2019, try getting VS2017.
But before that also try setting up 4.6.2 developer pack from https://dotnet.microsoft.com/download/dotnet-framework/net462.
Also one other thing to try is to install the 4.6.2 sdk from visual studio installer itself and maybe remove the MSIs.
Problem solved, the part about 4.6 targeting pack was a red herring. Installing VS extension development support in VS2017 did it.
The unit tests pass. Smoke and platform tests fail (the platform tests actually hang).
Also, when I open the solution in VS2017, I get:
The current .NET SDK does not support targeting .NET Standard 2.0. Either target .NET Standard 1.6 or lower, or use a version of the .NET SDK that supports .NET Standard 2.0.
I have .NET 4.6.1, 4.6.2, 4.7, 4.7.2 as well as a long range of .NET Core SDKs:
2.1.401 [C:\Program Files\dotnet\sdk]
2.1.402 [C:\Program Files\dotnet\sdk]
2.1.500 [C:\Program Files\dotnet\sdk]
2.1.506 [C:\Program Files\dotnet\sdk]
2.1.508 [C:\Program Files\dotnet\sdk]
2.1.509 [C:\Program Files\dotnet\sdk]
2.1.512 [C:\Program Files\dotnet\sdk]
2.1.801 [C:\Program Files\dotnet\sdk]
2.1.802 [C:\Program Files\dotnet\sdk]
2.2.106 [C:\Program Files\dotnet\sdk]
2.2.108 [C:\Program Files\dotnet\sdk]
2.2.110 [C:\Program Files\dotnet\sdk]
2.2.401 [C:\Program Files\dotnet\sdk]
2.2.402 [C:\Program Files\dotnet\sdk]
3.0.100 [C:\Program Files\dotnet\sdk]
3.1.100 [C:\Program Files\dotnet\sdk]
3.1.201 [C:\Program Files\dotnet\sdk]
Any ideas?
Example of smoke test failure:
X RunAllTestExecution [1s 531ms]
Error Message:
Assert.IsTrue failed. Test SampleUnitTestProject.UnitTest1.PassingTest does not appear in passed tests list.
Stack Trace:
at Microsoft.TestPlatform.TestUtilities.IntegrationTestBase.ValidatePassedTests(String[] passedTests) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.TestUtilities\IntegrationTestBase.cs:line 265
at Microsoft.TestPlatform.SmokeTests.ExecutionTests.RunAllTestExecution() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.SmokeTests\ExecutionTests.cs:line 18
Another:
... .. . Failed tests:
... .. .. . 1. Microsoft.TestPlatform.SmokeTests.DiscoveryTests.DiscoverAllTests
... .. .. .. .ErrorMessage:
Assert.IsTrue failed. Test SampleUnitTestProject.UnitTest1.PassingTest does not appear in discovered tests list.
Std Output:
Std Error: Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.VisualStudio.CodeCoverage.Shim, Version=15.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified. at Microsoft.VisualStudio.TestPlatform.CommandLine.Program.Main(String[] args)
... .. .. .. .StackTrace:
at Microsoft.TestPlatform.TestUtilities.IntegrationTestBase.ValidateDiscoveredTests(String[] discoveredTestsList) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.TestUtilities\IntegrationTestBase.cs:line 315
at Microsoft.TestPlatform.SmokeTests.DiscoveryTests.DiscoverAllTests() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.SmokeTests\DiscoveryTests.cs:line 17
Are they environment-specific? I'm on Windows as noted before.
Platform test failures:
X InitializeShouldSubscribeToDataCollectionEvents [79ms]
Error Message:
Test method Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.EventLogDataCollectorTests.InitializeShouldSubscribeToDataCollectionEvents threw exception:
System.NullReferenceException: Object reference not set to an instance of an object.
Stack Trace:
at Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.TestableDataCollectionEvents.GetTestHostLaunchedInvocationList() in C:\kod\projects\vstest\test\DataCollectors\Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests\EventLogDataCollectorTests.cs:line 456
at Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests.EventLogDataCollectorTests.InitializeShouldSubscribeToDataCollectionEvents() in C:\kod\projects\vstest\test\DataCollectors\Microsoft.TestPlatform.Extensions.EventLogCollector.UnitTests\EventLogDataCollectorTests.cs:line 221
And this one is fun:
X TestCaseSerialize [472ms]
Error Message:
Test method Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.TestCaseSerialize threw exception:
System.IO.DirectoryNotFoundException: Could not find a part of the path 'E:\ProtocolPerf.txt'.
Stack Trace:
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.StreamWriter.CreateFile(String path, Boolean append, Boolean checkHost)
at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize, Boolean checkHost)
at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding)
at System.IO.File.InternalAppendAllText(String path, String contents, Encoding encoding)
at System.IO.File.AppendAllText(String path, String contents)
at Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.VerifyPerformanceResult(String scenario, Int64 expectedElapsedTime, Int64 elapsedTime) in C:\kod\projects\vstest\test\Microsoft.TestPlatform.PerformanceTests\ProtocolV1Tests.cs:line 121
at Microsoft.TestPlatform.PerformanceTests.ProtocolV1Tests.TestCaseSerialize() in C:\kod\projects\vstest\test\Microsoft.TestPlatform.PerformanceTests\ProtocolV1Tests.cs:line 58
I need to mount another drive letter it seems. :-)
I'm not trying to be annoying here, just wondering if you (maintainers/collaborators) are seeing these test errors as well?
I think you need VS enterprise, some of these dlls are only shipped on the enterprise version like "Microsoft.VisualStudio.CodeCoverage.Shim".
Hehe these are pretty outdated, go ahead with only the UTs locally. The acceptance and smoke tests will get validated on the CI. Plus I don't think blame data collector has any E2E tests.
Still having bad luck with VS. Lots of errors. It seems I need to have .NET 4.5.1 installed as well to get things to compile. I feel that my disk is slowly filling up with all possible .NET versions. :-D
@nohwnd did you end up facing any of these initial setup issues when you first cloned the repo? I never encountered these because my VS was already pretty much loaded with all the possible skus and extensions.
@provegard sorry, been busy with release. Hope you did not give up. I am installing VS Community to see if I can build. I am pretty sure I was able to do it before.
We are all actually using 2019, sorry. I am sure you need at least these workloads. The Visual Studio Extension development should be optional if you skip the vsix generating step in the script, see below.

And then from the individual components you'd need the Portable Pack and .NET 4.5.1.

I almost never run all acceptance tests locally. You should be good to go with just unit tests or at best smoke tests.
I did see the same issues (and more) when joining this project, and never got to go back and update the installation guide. Sorry about that. I will be changing our release pipeline a lot, and imho you don't need to build the vsix locally in most cases. You can comment out these steps in the build.ps1 and it should still build. If you need more help ping me on twitter or here, I can spend 15 minutes showing you stuff. :)

Thanks, I'll give it a try!
Related #2380
@provegard , any update on this issue?
Actually there is a lot. In the latest net5.0 release (I think since preview6), we are leveraging the Diagnostics NetCore client to create hang dumps. This works on Windows (with any target framework) and Linux (with netcoreapp3.1 and newer). There is no need for procdump.exe when creating hang dumps, or for the temporary folder.
To trigger a hang dump you can now simply do: dotnet test --blame-hang-timeout 2min or vstest.console /Blame:"CollectHangDump;TestTimeout=2min".
For crash dumps the situation is similar to before, but it errors out a bit better. There you still need procdump, because that flow needs to attach to a running process and detect failure, which is no easy task. But luckily crash dumps are usually way less interesting than hang dumps, because when the process crashes it often has an easy-to-see reason.
From dotnet test help:
--blame Runs the tests in blame mode. This option is helpful in isolating problematic tests that cause the test host to crash or hang.
When a crash is detected, it creates an sequence file in TestResults/guid/guid_Sequence.xml that captures the order of tests that were run before the crash.
Based on the additional settings, hang dump or crash dump can also be collected.
Example:
Timeout the test run when test takes more than the default timeout of 1 hour, and collect crash dump when the test host exits unexpectedly.
(Crash dumps require additional setup, see below.)
dotnet test --blame-hang --blame-crash
Example:
Timeout the test run when a test takes more than 20 minutes and collect hang dump.
dotnet test --blame-hang-timeout 20min
--blame-crash Runs the tests in blame mode and enables collecting crash dump when testhost exits unexpectedly.
This option is currently only supported on Windows, and requires procdump.exe and procdump64.exe to be available in PATH.
Or PROCDUMP_PATH environment variable to be set, and point to a directory that contains procdump.exe and procdump64.exe.
The tools can be downloaded here: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
Implies --blame.
--blame-crash-dump-type <DUMP_TYPE> The type of crash dump to be collected. Implies --blame-crash.
--blame-crash-collect-always Enables collecting crash dump on expected as well as unexpected testhost exit.
--blame-hang Run the tests in blame mode and enables collecting hang dump when test exceeds the given timeout. Implies --blame-hang.
--blame-hang-dump-type <DUMP_TYPE> The type of crash dump to be collected. When None, is used then test host is terminated on timeout, but no dump is collected. Implies --blame-hang.
--blame-hang-timeout <TIMESPAN> Per-test timeout, after which hang dump is triggered and the testhost process is terminated.
The timeout value is specified in the following format: 1.5h / 90m / 5400s / 5400000ms. When no unit is used (e.g. 5400000), the value is assumed to be in milliseconds.
When used together with data driven tests, the timeout behavior depends on the test adapter used. For xUnit and NUnit the timeout is renewed after every test case,
For MSTest, the timeout is used for all testcases.
This option is currently supported only on Windows together with netcoreapp2.1 and newer. And on Linux with netcoreapp3.1 and newer. OSX and UWP are not supported.
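The timeout format described in the help text can be illustrated with a small converter. This is only a sketch of the documented format, not the parser vstest uses: the name timeout_to_ms is hypothetical, and the unit table is limited to the suffixes that appear in this thread (h, m, min, s, ms); the real option may accept other spellings.

```python
import re

# Assumption: only the unit suffixes seen in this thread are handled.
_UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000, "h": 3_600_000}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$", re.IGNORECASE)


def timeout_to_ms(text):
    """Convert a value such as '1.5h' or '5400000' to whole milliseconds."""
    match = _PATTERN.match(text)
    if not match:
        raise ValueError(f"not a valid timeout: {text!r}")
    number, unit = match.groups()
    unit = unit.lower() or "ms"  # no unit means milliseconds, per the help text
    if unit not in _UNIT_TO_MS:
        raise ValueError(f"unknown time unit in {text!r}")
    return round(float(number) * _UNIT_TO_MS[unit])


if __name__ == "__main__":
    for example in ("1.5h", "90m", "5400s", "5400000ms", "5400000"):
        print(example, "->", timeout_to_ms(example), "ms")
```

All five examples from the help text describe the same duration, so they convert to the same number of milliseconds.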
@hvinett sorry, this hasn't been a priority for me and given all the problems I had with the setup, I put it aside.
But judging from @nohwnd's answer, anything I could do would be pointless anyway. :)