Roslyn: Roslyn should never report a non-fatal watson on the UI thread

Created on 6 Aug 2018  路  36Comments  路  Source: dotnet/roslyn

_From @davkean on August 3, 2018 7:9_

I raised this issue here: https://github.com/dotnet/roslyn/issues/20101, but the reporting part got lost. I'm running into UI delays due to Roslyn reporting watsons on the UI thread. If I look around the tree, other components, such as CPS (see ProjectFaultHandlerService.HandleErrorAsync), push the work onto a background thread.


Frame | IncCt | IncAvg
-- | -- | --
ROOT | 1 | 49,444
ntdll.dll!_RtlUserThreadStart | 1 | 49,444
ntdll.dll!__RtlUserThreadStart | 1 | 49,444
kernel32.dll!BaseThreadInitThunk | 1 | 49,444
devenv.exe!__scrt_common_main_seh | 1 | 49,444
devenv.exe!WinMain | 1 | 49,444
devenv.exe!CDevEnvAppId::Run | 1 | 49,444
devenv.exe!util_CallVsMain | 1 | 49,444
msenv.dll!VStudioMain | 1 | 49,444
msenv.dll!VStudioMainLogged | 1 | 49,444
msenv.dll!CMsoComponent::PushMsgLoop | 1 | 49,444
msenv.dll!SCM_MsoCompMgr::FPushMessageLoop | 1 | 49,444
msenv.dll!SCM::FPushMessageLoop | 1 | 49,444
msenv.dll!CMsoCMHandler::FPushMessageLoop | 1 | 49,444
msenv.dll!CMsoCMHandler::EnvironmentMsgLoop | 1 | 49,444
msenv.dll!MainMessageLoop::ProcessMessage | 1 | 49,444
user32.dll!DispatchMessageW | 1 | 49,444
user32.dll!DispatchMessageWorker | 1 | 49,444
user32.dll!UserCallWinProcCheckWow | 1 | 49,444
user32.dll!_InternalCallWinProc | 1 | 49,444
windowsbase.dll!MS.Win32.HwndSubclass.SubclassWndProc | 1 | 49,444
windowsbase.dll!System.Windows.Threading.Dispatcher.LegacyInvokeImpl | 1 | 49,444
windowsbase.dll!System.Windows.Threading.ExceptionWrapper.TryCatchWhen | 1 | 49,444
windowsbase.dll!System.Windows.Threading.ExceptionWrapper.InternalRealCall | 1 | 49,444
windowsbase.dll!MS.Win32.HwndSubclass.DispatcherCallbackOperation | 1 | 49,444
windowsbase.dll!MS.Win32.HwndWrapper.WndProc | 1 | 49,444
windowsbase.dll!System.Windows.Threading.Dispatcher.WndProcHook | 1 | 49,444
windowsbase.dll!System.Windows.Threading.Dispatcher.ProcessQueue | 1 | 49,444
windowsbase.dll!System.Windows.Threading.DispatcherOperation.Invoke | 1 | 49,444
windowsbase.dll!MS.Internal.CulturePreservingExecutionContext.Run | 1 | 49,444
mscorlib.dll!System.Threading.ExecutionContext.Run | 1 | 49,444
mscorlib.dll!System.Threading.ExecutionContext.RunInternal | 1 | 49,444
windowsbase.dll!System.Windows.Threading.DispatcherOperation.InvokeInSecurityContext | 1 | 49,444
windowsbase.dll!System.Windows.Threading.DispatcherOperation.InvokeImpl | 1 | 49,444
windowsbase.dll!System.Windows.Threading.ExceptionWrapper.TryCatchWhen | 1 | 49,444
microsoft.codeanalysis.editorfeatures.dll!Microsoft.CodeAnalysis.Editor.Shared.Utilities.SynchronizationContextTaskScheduler.PostCallback | 1 | 49,444
mscorlib.dll!System.Threading.Tasks.TaskScheduler.TryExecuteTask | 1 | 49,444
mscorlib.dll!System.Threading.Tasks.Task.ExecuteEntry | 1 | 49,444
mscorlib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal | 1 | 49,444
mscorlib.dll!System.Threading.ExecutionContext.Run | 1 | 49,444
mscorlib.dll!System.Threading.Tasks.Task.ExecutionContextCallback | 1 | 49,444
mscorlib.dll!System.Threading.Tasks.Task.Execute | 1 | 49,444
microsoft.visualstudio.languageservices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService+<>c__DisplayClass_0.<OnWorkspaceChanged>b__ | 1 | 49,444
microsoft.visualstudio.languageservices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessBatchedChangesOnForeground | 1 | 49,444
microsoft.visualstudio.languageservices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessProjectChange | 1 | 49,444
microsoft.visualstudio.languageservices.dll!Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.Report | 1 | 49,444
microsoft.visualstudio.telemetry.dll!Microsoft.VisualStudio.Telemetry.TelemetrySession.PostEvent | 1 | 49,444
microsoft.visualstudio.telemetry.dll!Microsoft.VisualStudio.Telemetry.SessionChannel.WatsonSessionChannel.PostEvent | 1 | 49,444
microsoft.visualstudio.telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WatsonReport.PostWatsonReport | 1 | 49,444
mscorlib.dll!System.Threading.WaitHandle.WaitOne | 1 | 49,444
mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne | 1 | 49,444
clr.dll!WaitHandleNative::CorWaitOneNative | 1 | 49,444
clr.dll!Thread::DoAppropriateWait | 1 | 49,444
clr.dll!Thread::DoAppropriateWaitWorker | 1 | 49,444
clr.dll!Thread::DoSyncContextWait | 1 | 49,444
clr.dll!MethodDescCallSite::CallTargetWorker | 1 | 49,444
clr.dll!CallDescrWorkerWithHandler | 1 | 49,444
clr.dll!CallDescrWorkerInternal | 1 | 49,444
microsoft.visualstudio.telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.NoPumpSyncContext.Wait | 1 | 49,444
kernelbase.dll!WaitForMultipleObjects | 1 | 49,444

_Copied from original issue: dotnet/project-system#3825_

Area-IDE Area-Performance Bug

All 36 comments

Should this be filed in Roslyn, not project-system? Also, I wonder if the change helps? Doesn't watson freeze the process while it collects the memory snapshot anyway?

@Pilchie do we have any info on what the actual watson is? This is in the code that is trying to figure out your nuget project config.

This may also be why @davkean is not getting "Add nuget reference" suggestions. So it would be great to know what's going on here.

Thanks!

@CyrusNajmabadi I think #20101 tracks the specific NFW.

if NFW decide to send a report, it will freeze the process, so which thread it is running shouldn't matter. if it decides not to send a report, it takes about no time, so it again, shouldn't matter whether it runs on UI thread or not.

I think it would be useful if we showed a dialog indicating what's going on, while the process is frozen.

tagging @ivanbasov for @tmat suggestion. @ivanbasov can we ask telemetry/NFW team to add such dialog if possible?

@heejaechang Before we rule out pushing this off the UI thread, I want to make sure that this isn't doing anymore work than just taking a snapshot. On my PC, I can create a dump in under a couple of seconds - why is it taking 50 seconds to take a dump? Are you sure this isn't blocking on other things, such as sending the dump?

it could be. let's move the conversation to email so that we can include people who actually own this API.

Can you please report the results of any conversations back here. This is important information to understand that affects development in the Roslyn codebase.

sure. will do.

Just hit this once again, VS hanging after creating a project, attach to see what's going on and we're reporting another NFW:

    ntdll.dll!778a031c()    Unknown
    [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll] Unknown
    KernelBase.dll!75d52523()   Unknown
    KernelBase.dll!75d523d8()   Unknown
    [Managed to Native Transition]  
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.NoPumpSyncContext.Wait(System.IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout)  Unknown
    mscorlib.dll!System.Threading.SynchronizationContext.InvokeWaitMethodHelper(System.Threading.SynchronizationContext syncContext, System.IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout)    Unknown
    [Native to Managed Transition]  
    [Managed to Native Transition]  
    mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext)  Unknown
    mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) Unknown
    mscorlib.dll!System.Threading.WaitHandle.WaitOne()  Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WatsonReport.PostWatsonReport(int maxReportsPerSession) Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.SessionChannel.WatsonSessionChannel.PostEvent(Microsoft.VisualStudio.Telemetry.TelemetryEvent telemetryEvent) Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.TelemetrySession.PostEvent(Microsoft.VisualStudio.Telemetry.TelemetryEvent telemetryEvent)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.Report(string description, System.Exception exception, System.Func<Microsoft.VisualStudio.Telemetry.IFaultUtility, int> callback)  Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.Report(System.Exception exception) Unknown
    Microsoft.CodeAnalysis.Workspaces.dll!Microsoft.CodeAnalysis.ErrorReporting.FatalError.Report(System.Exception exception, System.Action<System.Exception> handler)  Unknown
>   Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessProjectChange(Microsoft.CodeAnalysis.Solution solution, Microsoft.CodeAnalysis.ProjectId projectId)    Unknown
    [Native to Managed Transition]  
    mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task task)   Unknown
    mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task)  Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTask.CompleteOnCurrentThread()    Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTask<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>.CompleteOnCurrentThread()    Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.Run<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>(System.Func<System.Threading.Tasks.Task<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>> asyncMethod, Microsoft.VisualStudio.Threading.JoinableTaskCreationOptions creationOptions)   Unknown
    NuGet.VisualStudio.Implementation.dll!NuGet.VisualStudio.VsPackageInstallerServices.GetInstalledPackages(EnvDTE.Project project)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.PackageServicesProxy.GetInstalledPackages(EnvDTE.Project project) Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessProjectChange(Microsoft.CodeAnalysis.Solution solution, Microsoft.CodeAnalysis.ProjectId projectId)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessBatchedChangesOnForeground(System.Threading.CancellationToken cancellationToken)   Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.OnWorkspaceChanged.AnonymousMethod__0(System.Threading.Tasks.Task _)  Unknown
    mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(object stateMachine)  Unknown
    mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)   Unknown
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)   Unknown
    mscorlib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()    Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.SingleExecuteProtector.TryExecute()   Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.SingleExecuteProtector..cctor.AnonymousMethod__20_0(object state) Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.SingleExecuteProtector.TryExecute()   Unknown
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.SingleExecuteProtector..cctor.AnonymousMethod__20_0(object state) Unknown

Can you please report the results of any conversations back here. This is important information to understand that affects development in the Roslyn codebase.

This was never done. Can the results of the previous conversation be brought here? Thanks!

I don't see NFW on a background thread as any better than a NFW on the UI thread. We should resolve these cases by eliminating the conditions that led to the report in the first place.

I don't see NFW on a background thread

It is. We appear to be waiting for it to send, the process was literally hung for over a minute. Other components are putting them on background threads, so should we or figure out how not to block the UI thread or put them on a background thread.

I ran into this again, and once again I thought VS had hung. I failed to get great symbols so I couldn't see what exception was being reported, but VS.Telemetry and LanguageServices were on the stack.

It would be useful if VS displayed a dialog showing that it's reporting an error.
@heejaechang have you asked the platform team about this?

Hit again today - once again thought I ran into a hang.

UI thread is blocked on:

    [Managed to Native Transition]  
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.NoPumpSyncContext.Wait(System.IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout)  Unknown
>   mscorlib.dll!System.Threading.SynchronizationContext.InvokeWaitMethodHelper(System.Threading.SynchronizationContext syncContext, System.IntPtr[] waitHandles, bool waitAll, int millisecondsTimeout) Line 349   C#
    [Native to Managed Transition]  
    [Managed to Native Transition]  
    mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext) Line 243 C#
    mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) Line 194    C#
    mscorlib.dll!System.Threading.WaitHandle.WaitOne() Line 215 C#
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WatsonReport.PostWatsonReport(int maxReportsPerSession) Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.SessionChannel.WatsonSessionChannel.PostEvent(Microsoft.VisualStudio.Telemetry.TelemetryEvent telemetryEvent) Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.TelemetrySession.PostEvent(Microsoft.VisualStudio.Telemetry.TelemetryEvent telemetryEvent)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.Report(string description, System.Exception exception, System.Func<Microsoft.VisualStudio.Telemetry.IFaultUtility, int> callback)  Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.CodeAnalysis.ErrorReporting.WatsonReporter.Report(System.Exception exception) Unknown
    Microsoft.CodeAnalysis.Workspaces.dll!Microsoft.CodeAnalysis.ErrorReporting.FatalError.Report(System.Exception exception, System.Action<System.Exception> handler)  Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessProjectChange(Microsoft.CodeAnalysis.Solution solution, Microsoft.CodeAnalysis.ProjectId projectId)    Unknown
    [Native to Managed Transition]  
    mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task task) Line 184  C#
    mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task) Line 157 C#
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTask.CompleteOnCurrentThread() Line 913   C#
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTask<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>.CompleteOnCurrentThread() Line 84    C#
    Microsoft.VisualStudio.Threading.dll!Microsoft.VisualStudio.Threading.JoinableTaskFactory.Run<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>(System.Func<System.Threading.Tasks.Task<System.Collections.Generic.List<NuGet.VisualStudio.IVsPackageMetadata>>> asyncMethod, Microsoft.VisualStudio.Threading.JoinableTaskCreationOptions creationOptions) Line 507  C#
    NuGet.VisualStudio.Implementation.dll!NuGet.VisualStudio.VsPackageInstallerServices.GetInstalledPackages(EnvDTE.Project project)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.PackageServicesProxy.GetInstalledPackages(EnvDTE.Project project) Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessProjectChange(Microsoft.CodeAnalysis.Solution solution, Microsoft.CodeAnalysis.ProjectId projectId)    Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.ProcessBatchedChangesOnForeground(System.Threading.CancellationToken cancellationToken)   Unknown
    Microsoft.VisualStudio.LanguageServices.dll!Microsoft.VisualStudio.LanguageServices.Packaging.PackageInstallerService.OnWorkspaceChanged.AnonymousMethod__0(System.Threading.Tasks.Task _)  Unknown

One other thread pool thread is on:

    [Managed to Native Transition]  
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.DefaultWerReportImplementation.WerReportSubmitEx(System.IntPtr hReportHandle, Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WER_CONSENT consent, int dwFlags)  Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WatsonReport.SendWatsonReport(bool fullCab, bool sendParams, out System.DateTime startTime, out System.DateTime endTime)    Unknown
    Microsoft.VisualStudio.Telemetry.dll!Microsoft.VisualStudio.Telemetry.WindowsErrorReporting.WatsonReport.PostWatsonReport.AnonymousMethod__0(object state)  Unknown
>   mscorlib.dll!System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(object state) Line 1274    C#
    mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 954  C#
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 902  C#
    mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() Line 1252    C#
    mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 820   C#
    mscorlib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() Line 1161   C#

This is a case of the UI blocked on an event that is set by this background thread:

C# ThreadPool.QueueUserWorkItem( (object state) => { try { submitResult = this.SendWatsonReport(fullCab, sendParams, out startTime, out endTime); } finally { mre.Set(); } }, null); mre.WaitOne();

If you look at SendWatsonReport it does a bunch of work, creating a watson report, reading files, adding dumps, etc.

This method absolutely should not be called on the UI thread. Other folks have pushed this work onto a background thread, so should Roslyn.

I run a query to figure out real world hits of this.

Over the past 9 days, it has had ~10,000 hits with 50th, 90th and 95th percentile times of 2.8, 17.1 and 41.2 seconds respectively. 3% of the delays are over 60 seconds, and 2% of the users that hit the delay kill devenv, presumably because they think VS has hung.

Top failures ordered by severity:

Blamed symbol|Hit count
---|---:
microsoft.visualstudio.languageservices.dll!microsoft.visualstudio.languageservices.packaging.packageinstallerservice+<>c__displayclass_0.b__|1470
vsdebug.dll!cvsbreakpointmanager::bindbreakpoints|2,301
vsdebug.dll!cvsdebugpackage::exec|896
microsoft.codeanalysis.typescript.editorfeatures.dll!microsoft.codeanalysis.editor.typescript.features.gotodefinition.typescriptgotosymbolservice+d__.movenext|746
vsdebug.dll!cdebugger::encmodechange|451
vsdebug.dll!cdebugger::stepover|384
msenv.dll!_maincanclose|823

If I look at data over the history of VS, this delay has 600,000 lifetime hits.

This has been mitigated with #32933 for now. We have plans to fix the underlying issue brought up here, as noted in the PR. #33187 tracks the dependency we need and changes required.

@ryzngard This issue ~is~ isn't tracking the particular instance of a non-fatal watson, but tracking the issue around reporting all-NFW on the UI thread in general.

I'm not advocating we close this issue; there's a lot of discussion around NFW on/off UI threads. Instead, we plan to move this to 16.1 with the context on why it's okay to do so (we've mitigated a large offender). #32751 has some discussion around this topic as well.

To sum up the best I can: There's debate over whether moving NFW to another thread is worthwhile since it will halt the full process to capture a dump anyways.

I think we should continue to investigate this and make meaningful improvements.

There's debate over whether moving NFW to another thread is worthwhile since it will halt the full process to capture a dump anyways.

That would be based on pure speculation, and not data. There's more to a capturing a dump then halting a process to take a memory dump - there's writing the file to disk (if required), network requests, etc. Even platform has acknowledged this: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/273636.

In CPS, we fault off the UI-thread and it does not show up in delays.

That would be based on pure speculation, and not data

Exactly, and we didn't have the time or resources for the 16.0 release.

Even platform has acknowledged this:

This is great information to have, thanks for the link. For CPS, how did you widespread move all NFW off the UI thread? Just wrap the API?

We have our own Watson usage (which is doing exactly what PostFault is doing under the covers). We force this to always occur on a thread-pool thread.

Note, it case it was missed, I pulled data above that showed that Roslyn's reporter is causing UI delays at a rate of 10,000 over a 9 day period, so even though you reduced one source of the delays (1470 hits of that), there's still a significant amount of NFW being reported on the UI thread.

PostFault is also the top UI delay for 16.0: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/766931.

This contains another version of this: https://developercommunity.visualstudio.com/content/problem/586146/vs-2017-15912-crashes-when-creating-or-loading-a-p.html. Instead this time the hang is trying to Exception.ToString which is attempting to download PDBs from a symbol server.

Bubbling up @heejaechang's comment from earlier (https://github.com/dotnet/roslyn/issues/29100#issuecomment-410803208):

if NFW decide to send a report, it will freeze the process, so which thread it is running shouldn't matter. if it decides not to send a report, it takes about no time, so it again, shouldn't matter whether it runs on UI thread or not.

What is the response to this? I agree that moving it off the UI thread won't help prevent UI delays.

@davkean said the following in a comment

There's more to a capturing a dump then halting a process to take a memory dump - there's writing the file to disk (if required), network requests, etc. Even platform has acknowledged this: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/273636.
In CPS, we fault off the UI-thread and it does not show up in delays.

Process freeze can't be helped, but other actions could halt UI thread if we're waiting on them.

My assumption is that's based on the call to WeReportSubmit which happens after WerReportAddDump based on WER documentation.

We're using WER_SUBMIT_OUTOFPROCESS which is documented as

Spawn another process to submit the report. The calling thread is blocked until the function returns.

We don't need to move off the UI thread if we consider using WER_SUBMIT_OUTOFPROCESS_ASYNC instead

Spawn another process to submit the report and return from this function call immediately. Note that the contents of the pSubmitResult parameter are undefined and there is no way to query when the reporting completes or the completion status.

AFAIK we don't care about the result of the submission. Moving off the UI thread isn't great because it definitely can and will lose context of the process. Halting the current thread to capture the dump is important. Halting the current thread to do the submission process is not.

I need to look back at the code to verify my assumptions though. I haven't looked in a few months.

@ryzngard Thanks for looking into this - I like your line of thinking.

I'm not sure what can be done about this on the Roslyn side without some VS help. Basically all we end up doing on our end is:

```c#
var faultEvent = new FaultEvent(
eventName: FunctionId.NonFatalWatson.GetEventName(),
description: "Roslyn NonFatal Watson",
FaultSeverity.Diagnostic,
exceptionObject: exception,
gatherEventDetails: faultUtility =>
{
// add current process dump
faultUtility.AddProcessDump(currentProcess.Id);

                // add ServiceHub log files:
                foreach (var path in CollectServiceHubLogFilePaths())
                {
                    faultUtility.AddFile(path);
                }

                // Returning "0" signals that we should send data to Watson; any other value will cancel the Watson report.
                return 0;
            });

        // add extra bucket parameters to bucket better in NFW
        // we do it here so that it gets bucketted better in both
        // watson and telemetry. 
        faultEvent.SetExtraParameters(exception, emptyCallstack);

        session.PostEvent(faultEvent);

We need to report this on the faulting thread synchronously so that we can get the dump captured for our process at the fault point.  The problem appears to be:

                foreach (var path in CollectServiceHubLogFilePaths())
                {
                    faultUtility.AddFile(path);
                }

```

however, the FaultEvent type has no way to say: here's the callback for information that must be collected synchronously, and here's the callback for information that should not be blocking.

Does anyone know who to contact on platform to discuss this further?

Steve Steiner, probably.

I've reached out to vs perf and reliability as well.

We changed our logic here to no longer capture a dump by default. We only capture if we've seen a lot of hits here and we explicitly ask watson for some dumps to investigate. So user impact should be greatly diminished. Closing out until we see poor behaviors in the future that need specific different resolutions.

Was this page helpful?
0 / 5 - 0 ratings