OSX.1013.Amd64.Open-x64:Debug
https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F38953~2Fmerge/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190626.70/workItem/System.IO.FileSystem.Watcher.Tests/wilogs
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at Interop.RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(Object args) in /_/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 247
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs:line 201
Fatal error. Internal CLR error. (0x80131506)
at Interop+RunLoop.CFRunLoopRun()
at Interop+RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher+RunningInstance+StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
cc: @JeremyKuhne, @carlossanlop
cc: @MaximLipnin
Maybe related to https://github.com/dotnet/corefx/pull/33865
@stephentoub do you know what could cause an NRE in a native call into macOS? Is the interop layer translating some kind of errno returned from the OS here?
Then I see we get a COR_E_EXECUTIONENGINE while trying to deal with the exception.
Not sure; @jkoritzinsky, @AaronRobinsonMSFT, ideas?
@danmosemsft and @stephentoub The COR_E_EXECUTIONENGINE can be caused by an invalid P/Invoke stub or invalid translation as well. I did recently fix a few issues in macOS regarding race conditions in shutdown - https://github.com/dotnet/coreclr/pull/24847. How often is this happening? https://github.com/dotnet/corefx/pull/33865 does look promising too. Can someone point to the P/Invoke that is causing this issue?
@AaronRobinsonMSFT, I only saw it that once. This is the P/Invoke:
https://github.com/dotnet/corefx/blob/2aa2518a47c541148e2dc3f9d040942e44395acd/src/Common/src/Interop/OSX/Interop.RunLoop.cs#L33-L35
where CoreFoundationLibrary is:
https://github.com/dotnet/corefx/blob/2aa2518a47c541148e2dc3f9d040942e44395acd/src/Common/src/Interop/OSX/Interop.Libraries.cs#L9
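For readers without the links handy, the shape of that interop code is roughly the following (paraphrased, so treat it as an approximation and follow the links above for the exact declarations):

```csharp
using System.Runtime.InteropServices;

internal static partial class Interop
{
    internal static partial class Libraries
    {
        // Full path to the CoreFoundation framework binary on macOS.
        internal const string CoreFoundationLibrary = "/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation";
    }

    internal static partial class RunLoop
    {
        // CFRunLoopRun() takes no arguments and returns void; it runs the current
        // thread's run loop until it is stopped or has no sources left.
        [DllImport(Libraries.CoreFoundationLibrary)]
        internal static extern void CFRunLoopRun();
    }
}
```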
For future reference, it seems console output cannot be queried directly in Kusto. The best we can easily do is look for exit code 134:

```
WorkItems
| where Started > now(-10d)
| where FriendlyName == "System.IO.FileSystem.Watcher.Tests"
| where ExitCode == 134
| join kind=inner (
    Jobs | where Started > now(-10d) | project QueueName, JobId, Build, Type, Source
) on JobId
| summarize count() by ExitCode, QueueName
| order by count_ desc
```

| ExitCode | QueueName | count_ |
| -- | -- | -- |
| 134 | Ubuntu.1604.Arm32.Open | 28 |
| 134 | Ubuntu.1604.Arm64.Docker.Open | 25 |
| 134 | Ubuntu.1804.Amd64.Open | 6 |
| 134 | OSX.1013.Amd64.Open | 1 |

compared to ~5000 work items with exit code 0.
@danmosemsft Do we know what version of coreclr is being used to run the tests themselves? Is it possible the coreclr being used in this instance is older than the date at which https://github.com/dotnet/coreclr/pull/24847 was checked in?
We'd have to dig to see which exact version, but this was 5 days ago, and corefx ingested a new coreclr 5 days ago, 6 days ago, and 7 days ago, so it definitely has that change from end of May.
@mjanecke can you tell from the output at https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F38953~2Fmerge/test~2Ffunctional~2Fcli~2Finnerloop~2F/20190626.70/workItem/System.IO.FileSystem.Watcher.Tests/wilogs whether a dump got uploaded? I don't see the usual dumpling song and dance.
@danmosemsft, for that test, it looks like the failure happened on a Mac, which I don't think @epananth has finished implementing for dump collection yet.
@mjanecke good point, and last I heard we did not have a way to create a dump on Mac (that could be debugged with managed debuggers). @mikem8361 any update on that? Is our only option to catch this under an attached debugger?
(It's quite possible this isn't Mac specific actually but I'm interested anyway)
The best way would be to catch this under a managed debugger. There is no “createdump” facility on macOS. You can set up your macOS machine to generate system core dumps (ulimit -c unlimited, etc.), but there is a limitation in lldb that makes using SOS on core dumps on macOS difficult. See https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md#launch-lldb-under-macos.
Since this is clearly Mac-specific, we can confidently say it has only happened once in 10 days.
No more hits in the last week. Query:
```
WorkItems
| where Started > now(-10d)
| where FriendlyName == "System.IO.FileSystem.Watcher.Tests"
| where ExitCode == 134
| join kind=inner (
    Jobs | where Started > now(-10d) | project QueueName, JobId, Build, Type, Source
) on JobId
| summarize count() by ExitCode, QueueName
| order by count_ desc
```

| ExitCode | QueueName | count_ |
| -- | -- | -- |
| 134 | Ubuntu.1604.Arm32.Open | 3 |
Seems to be infrastructure related. Moving to 5.0.
Failed in dotnet/corefx#42512:
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/System.Private.CoreLib/shared/System/Runtime/InteropServices/SafeHandle.cs:line 156
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1744
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 215
at System.IO.FileSystemWatcher.RunningInstance.CancellationCallback() in /_/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 276
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 370
at Interop.RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(Object args) in /_/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 249
at System.Threading.ThreadHelper.ThreadStart_Context(Object state) in /_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 50
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs:line 201
Fatal error. Internal CLR error. (0x80131506)
at Interop+RunLoop.CFRunLoopRun()
at Interop+RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher+RunningInstance+StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(System.Object)
at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart(System.Object)
@wfurt
I will take a look. We still have a few flaky watcher tests on OSX, and my plan was to run the whole set in a loop and chase them down.
Failed again in https://github.com/dotnet/runtime/pull/136. Same stack. Here's a dump: https://helix.dot.net/api/2019-06-17/jobs/b40ed2a0-5a9f-4d81-9bb5-62900426af75/workitems/System.IO.FileSystem.Watcher.Tests/files/core.18074 (it's 3.4 GB! Get it while it's hot / still available.)
The core file is incomplete or corrupted:
(lldb) target create --core "/Users/furt/Downloads/core.18074"
warning: (x86_64) /Users/furt/Downloads/core.18074 load command 3055 LC_SEGMENT_64 has a fileoff + filesize (0xda14f000) that extends beyond the end of the file (0xda14e000), the segment will be truncated to match
warning: (x86_64) /Users/furt/Downloads/core.18074 load command 3056 LC_SEGMENT_64 has a fileoff (0xda14f000) that extends beyond the end of the file (0xda14e000), ignoring this section
Core file '/Users/furt/Downloads/core.18074' (x86_64) was loaded.
(lldb) plugin load /Users/furt/github/diagnostics/artifacts/bin/OSX.x64.Debug/libsosplugin.dylib
(lldb) clrstack
Failed to load data access module, 0x80004005
Can not load or initialize libmscordaccore.dylib. The target runtime may not be initialized.
ClrStack failed
We should never limit core size, as that makes them unusable. Also, it is full of zeros and the compressed size is _only_ ~400 MB. We should not dump them raw to storage if we care about size. Should we open a new issue to make core dumps on OSX usable, @ericstj? And if so, I'm not sure what the correct repo and team to work on this would be.
cc: @epananth for core file part.
I'm guessing https://github.com/dotnet/arcade; tagging @mjanecke @alexperovich @ChadNedzlek
The clrstack error message means that SOS can’t find the right version of the DAC. Try enabling download support with “setsymbolserver -ms” and SOS will attempt to download the correct version. There may still be problems with the dump, and there is still the Swift lldb bug with core dump thread IDs (see https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md#launch-lldb-under-macos for more information).
@jkotas, hi, I hit the same issue as posted here, on macOS (on Windows it works fine). Is it being fixed? Thanks~
No. If you have a repro, please share it, @xieofxie.
@wfurt It was generated by a validation run in Azure DevOps for the PR above. I don't know how to reproduce it reliably.
Hit by https://github.com/dotnet/runtime/pull/2107. From https://helix.dot.net/api/2019-06-17/jobs/003fe36f-508e-40d9-9a7e-f35e3a3aea65/workitems/System.IO.FileSystem.Watcher.Tests/files/console.ce2942ec.log :
/tmp/helix/working/AC3109A7/p/dotnet exec --runtimeconfig System.IO.FileSystem.Watcher.Tests.runtimeconfig.json --depsfile System.IO.FileSystem.Watcher.Tests.deps.json xunit.console.dll System.IO.FileSystem.Watcher.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -notrait category=nonnetcoreapptests -notrait category=nonosxtests
popd
===========================================================================================================
/private/tmp/helix/working/AC3109A7/w/A3D808CF/e /private/tmp/helix/working/AC3109A7/w/A3D808CF/e
Discovering: System.IO.FileSystem.Watcher.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.FileSystem.Watcher.Tests (found 151 of 184 test cases)
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs:line 149
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/coreclr/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1751
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 220
at System.IO.FileSystemWatcher.RunningInstance.CancellationCallback() in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 278
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 370
at Interop.RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(ManualResetEventSlim runLoopStarted, SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 251
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.<>c.<ScheduleEventStream>b__3_0(Object args) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 204
at System.Threading.ThreadHelper.ThreadStart_Context(Object state) in /_/src/coreclr/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 47
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 201
Fatal error. Internal CLR error. (0x80131506 (COR_E_EXECUTIONENGINE))
@jkotas just curious as an aside, is it expected that the CLR should exit with Internal CLR error. (0x80131506 (COR_E_EXECUTIONENGINE)) when user code has simply thrown an exception on a background thread?
That is not expected either. It may be either a side effect of the original problem or a different bug.
It seems like we did not collect a core file despite the 'Core dumped' message.
cc: @epananth @MattGal in case I'm mistaken.
> It seems like we did not collect a core file despite the 'Core dumped' message.
> cc: @epananth @MattGal in case I'm mistaken.
From the log for the above work item, there wasn't a file found in the cores folder when the work item finished.
2020-02-24T18:57:27.719Z INFO io(27) copy_tree_to b"Copying '/tmp/helix/working/AC3109A7/w/A3D808CF/u/zh-Hant/Microsoft.VisualStudio.TestPlatform.ObjectModel.resources.dll' to '/tmp/helix/working/AC3109A7/w/A3D808CF/e/zh-Hant/Microsoft.VisualStudio.TestPlatform.ObjectModel.resources.dll'"
2020-02-24T18:57:27.720Z INFO executor(109) run Successfully downloaded work item payload
2020-02-24T18:57:27.720Z INFO executor(631) _execute_command running $HELIX_CORRELATION_PAYLOAD/scripts/27f91e9998934da69568877a9a85d01a/execute.sh in /tmp/helix/working/AC3109A7/w/A3D808CF/e max 900 seconds
2020-02-24T18:57:38.713Z INFO executor(608) _dump_file_upload No dump files to upload
2020-02-24T18:57:38.715Z INFO executor(721) _upload_files File ready to be upload : console.ce2942ec.log
2020-02-24T18:57:38.715Z INFO executor(690) _upload_single_file Uploading the file console.ce2942ec.log
We can investigate the specific machine in question (dci-mac-build-012) but if dumps work on it for unhandled exceptions we have to consider the other possibilities.
I spent some time checking this out with @epananth and it seems the helix-runner user just didn't have permissions to the /cores folder on that machine. We're checking it out now.
@wfurt After updating the permissions, we are able to create core dumps in the cores folder. 10.13 did not come with a /cores directory, so we had to manually add it and give it the right permissions. Unfortunately this machine did not have the right permissions; I think that is the reason this failed.
Thanks for the update @MattGal & @epananth. I'm positive there will be crashes in the future :)
Also spoke to @ilyas1974; he said that this particular machine, dci-mac-build-012, was recently updated, so chances are some setup scripts were missed. But we do make that change for all 10.13 machines.
I think you are confusing me with someone else, sorry, I don't host mac-build machines.
Should be @ilyas1974 sorry @ilya-bin
Haven't seen this in the last 100 builds. Removing "blocking-clean-ci". If this comes up again we should consider disabling the test until we get to the bottom of it, to help get PRs / CI off the floor.
Just happened again: https://dev.azure.com/dnceng/public/_build/results?buildId=562826&view=logs
/private/tmp/helix/working/B7480941/w/A8CA0943/e /private/tmp/helix/working/B7480941/w/A8CA0943/e
Discovering: System.IO.FileSystem.Watcher.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.FileSystem.Watcher.Tests (found 151 of 184 test cases)
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs:line 149
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/coreclr/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1747
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 220
at System.IO.FileSystemWatcher.RunningInstance.CancellationCallback() in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 278
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 370
at Interop.RunLoop.CFRunLoopRun()
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(ManualResetEventSlim runLoopStarted, SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 251
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.<>c.<ScheduleEventStream>b__3_0(Object args) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 204
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 201
Fatal error. Internal CLR error. (0x80131506 (COR_E_EXECUTIONENGINE))
Repeat 2 times:
--------------------------------
at Interop+RunLoop.CFRunLoopRun()
--------------------------------
at System.IO.FileSystemWatcher+RunningInstance+StaticWatcherRunLoopManager.WatchForFileSystemEventsThreadStart(System.Threading.ManualResetEventSlim, Microsoft.Win32.SafeHandles.SafeEventStreamHandle)
at System.IO.FileSystemWatcher+RunningInstance+StaticWatcherRunLoopManager+<>c.<ScheduleEventStream>b__3_0(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
This time there is a core dump: https://helix.dot.net/api/2019-06-17/jobs/b62f6bde-4acb-47a1-8375-a2905024d639/workitems/System.IO.FileSystem.Watcher.Tests/files/core.34665
@wfurt seems you were looking for a dump here?
I did. I can see the handle was closed, but that does not tell me why. We could put in some band-aid, but it would be nice to figure out what is really going on. I was not able to reproduce it, but I only put in a small effort.
Happened again in https://dev.azure.com/dnceng/public/_build/results?buildId=589913&view=ms.vss-test-web.build-test-results-tab&runId=18593304&resultId=172079&paneView=attachments
Configuration: netcoreapp5.0-OSX-Debug-x64-CoreCLR_checked-OSX.1013.Amd64.Open
Discovering: System.IO.FileSystem.Watcher.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.FileSystem.Watcher.Tests (found 151 of 184 test cases)
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs:line 149
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/coreclr/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1758
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 220
at System.IO.FileSystemWatcher.RunningInstance.CancellationCallback() in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 278
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 370
console.352e39f2.log:
https://helix.dot.net/api/2019-06-17/jobs/55d612f0-35cc-408f-b401-7a7e0a381703/workitems/System.IO.FileSystem.Watcher.Tests/files/console.352e39f2.log
FWIW @ViktorHofer's PR is the only place I see this failure in the last 100 builds. So it does look fairly infrequent, but it also seems like we have a fix in the works here.
I got this same error on my PR today - re-opening the issue to track the CI failure.
/private/tmp/helix/working/A517094C/w/B80409AD/e /private/tmp/helix/working/A517094C/w/B80409AD/e
Discovering: System.IO.FileSystem.Watcher.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.FileSystem.Watcher.Tests (found 150 of 183 test cases)
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs:line 149
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/coreclr/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1758
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 226
at System.IO.FileSystemWatcher.RunningInstance.CleanupEventStream() in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 282
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 358
./RunTests.sh: line 161: 69523 Abort trap: 6 (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.FileSystem.Watcher.Tests.runtimeconfig.json --depsfile System.IO.FileSystem.Watcher.Tests.deps.json xunit.console.dll System.IO.FileSystem.Watcher.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/private/tmp/helix/working/A517094C/w/B80409AD/e
----- end Tue May 19 13:26:13 PDT 2020 ----- exit code 134 ----------------------------------------------------------
@jaredpar, do you know if we have seen this at all in recent history other than in @eerhardt's example?
@alnikola may have the answer as well. We can probably just count all unfinished test runs, since I'm not aware of any other issue causing crashes.
From looking at the code, I have a hypothesis for how this is happening. Different reason from the issue I previously fixed, but same symptom.
When the FileSystemWatcher is started on OSX, it creates a new instance of RunningInstance and Starts it:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L52-L53
The Start method creates this event stream SafeHandle and stores it into a field on the instance:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L311
But... it appears nothing is rooting this RunningInstance instance. That _eventStream is passed to ScheduleEventStream:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L324
but if the event loop is already running, servicing another FileSystemWatcher, then nothing will root _eventStream, either:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L196-L202
So, we could end up in a situation where the FileSystemWatcher gets dropped, and with nothing rooting the SafeHandle, it gets finalized. But then the FileSystemEventCallback associated with this RunningInstance is invoked (which is dubious if nothing is keeping alive the delegate that was handed out to the native function); it can't find the watcher in its weak reference, and it proceeds to call CleanupEventStream:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L356-L360
which will in turn try to use the _eventStream:
https://github.com/dotnet/runtime/blob/8f929a123fcd9b9c8d7841a26ff87e4843c8ffa5/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs#L269-L286
It doesn't matter that my previous fix successfully ensures that Close can be called on the SafeHandle only by the thread that successfully swapped it out of _eventStream, because the race is actually with finalization. Hooray for resurrection :(
That's my working theory, at least. The fix would be to root things appropriately with a GCHandle, and ensure as well that something with a finalizer is able to free that GCHandle (i.e. the GCHandle doesn't end up indirectly rooting itself). I'm slightly skeptical of my theory only because I'd have expected things to go bad in multiple ways, and not just with regard to this SafeHandle being used after disposal, but maybe it just takes so many things lining up that we don't see it that often.
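To make that proposed direction concrete, here is a minimal, hypothetical sketch of the rooting pattern described above (the type and member names are illustrative stand-ins, not the actual FileSystemWatcher.OSX.cs code): keep a GCHandle alive for as long as the native run loop may call back, and free it only once the stream has been torn down.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only: a stand-in for RunningInstance showing the GCHandle-based rooting idea.
internal sealed class RunningInstanceSketch
{
    private SafeHandle? _eventStream;   // stand-in for SafeEventStreamHandle
    private GCHandle _selfRoot;         // keeps this instance (and _eventStream) reachable

    public void Start(SafeHandle eventStream)
    {
        _eventStream = eventStream;

        // Root the instance so neither it nor its SafeHandle can be finalized
        // while the native FSEventStream may still invoke our callback.
        _selfRoot = GCHandle.Alloc(this);
        // GCHandle.ToIntPtr(_selfRoot) would be handed to native code as the callback context.
    }

    public void Stop()
    {
        // Tear down the stream first; only then is it safe to drop the root.
        _eventStream?.Dispose();
        _eventStream = null;

        if (_selfRoot.IsAllocated)
        {
            _selfRoot.Free();
        }
    }
}
```

The subtle part the comment calls out is ensuring something with a finalizer can still free that GCHandle if Stop is never called, so the root does not end up keeping the instance alive forever.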
> @jaredpar, do you know if we have seen this at all in recent history other than in @eerhardt's example?
I took a peek:
```
WorkItems
| where Started > ago(30d)
| where QueueName =~ "osx.1013.amd64.open"
| where FriendlyName == "System.IO.FileSystem.Watcher.Tests"
| where ExitCode == 134
```
If we assume all SIGABRTs are this failure (which is the worst case), then checking Kusto, this has only happened 15 times out of ~5600 runs of this specific test in the past 30 days, across pretty much every OSX 10.13 machine, and there's never been more than 2 in a single day. Whatever it is, it's able to repro on nearly every machine and is always around.
Would it make sense to add some temporary state variable so we can track operations and verify the theory, @stephentoub? With two crashes per day we should be able to get a core file and verification within a reasonable time.
@MattGal @wfurt @stephentoub I ran the above query, checked console logs for a few result rows, and found at least two types of failures:

1. The System.ObjectDisposedException: Safe handle has been closed mentioned above
2. A Debug.Assert failure in System.Text.RegularExpressions.RegexCharClass

That means there is more than 1 cause for the given ExitCode on osx.1013.amd64.open.
> Some Debug.Assert failure in System.Text.RegularExpressions.RegexCharClass
What is the assert? Was it recent? Was it on a PR related to regex or was it somewhere else?
> That means there is more than 1 cause for the given ExitCode on osx.1013.amd64.open
Exit code 134 just means the process crashed, often because of an unhandled exception or a fail-fast from an assert. It maps to SIGABRT (128 + signal 6 = 134).
System.Diagnostics.DebugProvider+DebugAssertException: RegexCharClass s_propTable is out of order at (IsAlphabeticPresentationForms, IsArabic)
Was that on a PR, and if so, which one?
Yes, it failed on refs/pull/402/merge on 2020-05-10
I used this query:
```
WorkItems
| where FriendlyName == "System.IO.FileSystem.Watcher.Tests"
    and ExitCode == 134
    and Started > ago(30d)
    and QueueName =~ "osx.1013.amd64.open"
| join kind=inner Jobs on JobId
| order by Started desc
```
Thanks. That's a draft PR that's changing the implementation of string comparisons:
https://github.com/dotnet/runtime/pull/402
It's not a real issue to be investigating. Tons of tests are broken in that PR.
We've started hitting this ObjectDisposedException in AspNetCore
do you have core files @BrennanConroy ?
This seems like it could also be the cause of https://github.com/dotnet/runtime/issues/37629. @stephentoub what do you think?
@ericstj Can we move this back to 5.0 please
> This seems like it could also be the cause of #37629. @stephentoub what do you think?
It could be. It's hard to say for sure, but this could be evidence of the portion of my comment at https://github.com/dotnet/runtime/issues/30056#issuecomment-631099004: "But then the FileSystemEventCallback associated with this RunningInstance is invoked (which is dubious if nothing is keeping alive the delegate that was handed out to the native function)".
@stephentoub I agree with your assessment, I think this should be fixed for reliability. @carlossanlop we should try to fit this in 5.0 if possible.
One trick we can use to repro this is to force the scenario @stephentoub describes and manually invoke GC.Collect() + GC.WaitForPendingFinalizers(). This can force the case where we are executing an instance method of a collected object.
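A rough sketch of that trick, assuming the theory above holds (this is a hypothetical stress harness, not a confirmed repro):

```csharp
using System;
using System.IO;

// Hypothetical stress harness: drop FileSystemWatcher instances without disposing them,
// then force collection/finalization while file system events are still being generated,
// to try to hit the "callback on a finalized SafeHandle" window described above.
class WatcherFinalizationStress
{
    static void Main()
    {
        string dir = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;

        for (int i = 0; i < 10_000; i++)
        {
            CreateAndDropWatcher(dir);

            // Generate activity so the native FSEvents callback has work to do.
            File.WriteAllText(Path.Combine(dir, $"file{i}.txt"), "x");

            // Force the dropped watcher (and, per the theory, its un-rooted SafeHandle)
            // to be collected and finalized while events may still be in flight.
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }

    static void CreateAndDropWatcher(string dir)
    {
        // Intentionally not disposed; no reference survives past this method.
        var watcher = new FileSystemWatcher(dir) { EnableRaisingEvents = true };
    }
}
```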
Failed again in https://github.com/dotnet/runtime/pull/44548
/private/tmp/helix/working/9DDF08A4/w/BF320A67/e /private/tmp/helix/working/9DDF08A4/w/BF320A67/e
Discovering: System.IO.FileSystem.Watcher.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.FileSystem.Watcher.Tests (found 150 of 183 test cases)
Starting: System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Unhandled exception. System.ObjectDisposedException: Safe handle has been closed.
Object name: 'SafeHandle'.
at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs:line 148
at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success) in /_/src/coreclr/src/System.Private.CoreLib/src/System/StubHelpers.cs:line 1255
at Interop.EventStream.FSEventStreamUnscheduleFromRunLoop(SafeEventStreamHandle streamRef, IntPtr runLoop, SafeCreateHandle runLoopMode)
at System.IO.FileSystemWatcher.RunningInstance.StaticWatcherRunLoopManager.UnscheduleFromRunLoop(SafeEventStreamHandle eventStream) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 221
at System.IO.FileSystemWatcher.RunningInstance.CleanupEventStream() in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 277
at System.IO.FileSystemWatcher.RunningInstance.FileSystemEventCallback(IntPtr streamRef, IntPtr clientCallBackInfo, IntPtr numEvents, Byte** eventPaths, FSEventStreamEventFlags* eventFlags, UInt64* eventIds) in /_/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.OSX.cs:line 353
./RunTests.sh: line 161: 22239 Abort trap: 6 (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.FileSystem.Watcher.Tests.runtimeconfig.json --depsfile System.IO.FileSystem.Watcher.Tests.deps.json xunit.console.dll
Plus instructions on how to debug it: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-44548-merge-776b26492e0f484d90/System.IO.FileSystem.Watcher.Tests/how-to-debug-dump.md?sv=2019-07-07&se=2020-12-01T20%3A53%3A20Z&sr=c&sp=rl&sig=%2F9zqfSrsTS2iGkGZYGFeCnbLFQn6fIF3Phbu0sxP3rg%3D
Is someone following up on the SafeHandle ObjectDisposedException? Happened again in https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-44835-merge-c7c5501b02fc4bc9b9/System.IO.FileSystem.Watcher.Tests/console.16d3d036.log?sv=2019-07-07&se=2020-12-07T23%3A20%3A58Z&sr=c&sp=rl&sig=MI6jI0MFjRG2yXR5cnPvbQflr6ldFxG5gB2o425JXYo%3D.
I'm planning to take a look, but it is not at the top of my priority list right now. I downloaded the core dump @safern posted and briefly looked at it, to verify that I can.
Happened again with dump and all:
Looking at runfo, and only at rolling builds, this failed in 3 builds in the last 14 days: https://runfo.azurewebsites.net/search/tests/?bq=definition%3Aruntime+started%3A%7E14+kind%3Arolling&tq=name%3A%22System.IO.FileSystem.Watcher.Tests%22
And a dump was produced: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-44548-merge-776b26492e0f484d90/System.IO.FileSystem.Watcher.Tests/core.22239?sv=2019-07-07&se=2020-12-01T20%3A53%3A20Z&sr=c&sp=rl&sig=%2F9zqfSrsTS2iGkGZYGFeCnbLFQn6fIF3Phbu0sxP3rg%3D