Reported by @BrennanConroy
We’re getting this exception in a test occasionally.
Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.Reflection.Internal.MemoryBlock.PeekCompressedInteger(Int32 offset, Int32& numberOfBytesRead)
at System.Reflection.Metadata.Ecma335.BlobHeap.GetMemoryBlock(BlobHandle handle)
at System.Reflection.Metadata.MethodDebugInformation.GetSequencePoints()
at System.Diagnostics.StackTraceSymbols.GetSourceLineInfoWithoutCasAssert(String assemblyPath, IntPtr loadedPeAddress, Int32 loadedPeSize, IntPtr inMemoryPdbAddress, Int32 inMemoryPdbSize, Int32 methodToken, Int32 ilOffset, String& sourceFile, Int32& sourceLine, Int32& sourceColumn)
at System.Diagnostics.StackFrameHelper.InitializeSourceInfo(Int32 iSkip, Boolean fNeedFileInfo, Exception exception)
at System.Diagnostics.StackTrace.CaptureStackTrace(Int32 iSkip, Boolean fNeedFileInfo, Thread targetThread, Exception e)
at System.Diagnostics.StackTrace..ctor(Exception e, Boolean fNeedFileInfo)
at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
at System.Exception.GetStackTrace(Boolean needFileInfo)
at System.Exception.ToString(Boolean needFileLineInfo, Boolean needMessage)
at System.Exception.ToString()
From @jkotas: If you lose the race here:
// This may fail as another thread might have beaten us to it, but it doesn't matter
_metadataCache.TryAdd(cacheKey, provider);
if (provider == null)
{
return null;
}
// The reader has already been open, so this doesn't throw:
return provider.GetMetadataReader();
}
There is nothing going to keep the provider alive.
From @tmat: I think you are right. We should return the provider that’s already in the cache, not the new provider.
Fixed in master (PR dotnet/corefx#31991) and 2.2 branch (PR dotnet/corefx#32016)
@mikem8361 Do you plan to fix in .NET Framework 4.8 as well?
We will see if ship room thinks this is important enough to get into desktop. Personally I hope to avoid any desktop work; maybe we can get someone else more familiar with the process.
/cc: @tommcdon
Jan opened https://devdiv.visualstudio.com/DevDiv/_workitems/edit/672340 for the desktop.
Well we only actually ever saw this issue on desktop... so it would be nice to get the fix there.
Reactivating for 2.2 consideration.
cc @tommcdon ?
There is a race condition in the portable PDB metadata provider cache that leaked providers and caused crashes in the diagnostic StackTrace API.
To fix the race, detect the cause where the provider wasn't being disposed and dispose it.
Without this fix, the StackTrace API can crash.
No.
Low.
@danmosemsft The Shiproom template LGTM. Yes we agree that this race condition should be fixed in 2.2.
Shiproom Okayed this for 2.2, but deferred for 2.1.
@mikem8361 @danmosemsft Fyi I believe this to be the underlying bug for a longtime problem with intermittent CI hangs in our NUnit tests. The AccessViolationException takes out an NUnit worker thread which causes NUnit to hang while waiting for it.
I spent roughly 17 hours digging at this so far. Here's where it gets interesting: https://github.com/nunit/nunit/issues/3295#issuecomment-506976720
The best repro I have had time to get so far: https://github.com/jnm2/NUnitHangRepro#how-to-use
Ah, should have asked explicitly: would you please port this fix to .NET Framework 4.7.2 and 4.8? I'm going to do my best to harden NUnit to the bug, but even so it's not going to be a good user experience to sporadically not have stack traces in the test results.
@mikem8361 @tommcdon for the question about backporting.
@jnm2 Thanks for taking the time to report this issue and work towards NUnit to this issue. After careful consideration, we do not plan to action this particular item in the .NET framework 4.7.2 and 4.8 servicing releases. If this issue is blocking you from production, there are alternative assisted support options available from Microsoft.
@tommcdon That's disappointing and I have to admit it leaves me curious, but thank you for considering. There are a number of other folks using NUnit and coming at this same .NET use-after-free bug from various directions. Is the rationale that .NET Framework 4.8 is intentionally less supported to nudge folks to .NET Core/.NET 5? That would be helpful to know as NUnit plans around this bug and as something we can pass on to folks who are affected by this.
@jnm2 We continue to both service and support .NET Framework, which includes bug–, reliability– and security fixes. https://devblogs.microsoft.com/dotnet/net-core-is-the-future-of-net/
This issue is both reliability and security use-after-free bug as you have pointed out. I am asking folks to reconsider.
@jkotas That's reassuring! I'm eager to hear what happens. Ideally we can document which security rollup this gets fixed in.
@jkotas Since this issue is closed and I'm afraid of it falling off the radar, would it be a good idea for me to open a new issue?
.NET Framework bugs are not tracked on github. They are tracked in internal bug database. This specific bug is tracked as https://devdiv.visualstudio.com/DevDiv/_workitems/edit/672340
https://github.com/dotnet/core#getting-help has instruction for opening .NET Framework issues. If you would like to have something you can see and monitor progress from outside, you can open a .NET Framework issue there.
@jkotas Would it be possible to have the status of the .NET Framework bug? This is a major issue for us, randomly and abruptly killing our production services.
@mikem8361 is working on it. I am sorry it has been delayed due to summer vacations.
I was just debating whether I should pester you again over the weekend. Thanks @ocoanet for doing it for me.
The .NET Framework fix is scheduled for November update.
@jkotas Just to confirm, was it actually shipped in the February update?
November's update:
https://devblogs.microsoft.com/dotnet/net-framework-november-13-2019-update-for-net-framework-4-8/
February's update:
https://devblogs.microsoft.com/dotnet/net-framework-february-2020-security-and-quality-rollup/
CLR
- There is a race condition in the portable PDB metadata provider cache that leaked providers and caused crashes in the diagnostic StackTrace API. To fix the race, detect the cause where the provider wasn’t being disposed and dispose it.
Yes. Our release team had to adjust the schedule for this.
No worries, just wanted to have a number in mind to spread! Thanks again!
Most helpful comment
@jnm2 We continue to both service and support .NET Framework, which includes bug–, reliability– and security fixes. https://devblogs.microsoft.com/dotnet/net-core-is-the-future-of-net/
This issue is both reliability and security use-after-free bug as you have pointed out. I am asking folks to reconsider.