Runtime: Parallel creation of `Mutex` with `initiallyOwned: true` can cause `SIGSEGV` on Ubuntu 19.04

Created on 30 Mar 2020 · 7 comments · Source: dotnet/runtime

Parallel creation of named System.Threading.Mutex instances can fail when initiallyOwned is true. This causes a segmentation fault in libpthread.so that appears to be handled in libcoreclr.so, but no managed exception is thrown and the process fails; SIGABRT is listed as the stop reason for thread #1.

Steps to reproduce

  1. Create a netcoreapp3.1 console application with the following `Program.cs` code:

```c#
using System;
using System.Threading;
using System.Threading.Tasks;

namespace repro
{
    class Program
    {
        static void Main(string[] args)
        {
            try {
                var t = 10000;

                Parallel.For(1, t, (i) => {
                    CreateMutex("t" + i.ToString());
                    CreateMutex("t" + (i-1).ToString());
                });
            } catch (Exception exception) {
                Console.WriteLine(exception);
            }
            finally {
                Console.WriteLine("here");
            }
        }

        private static void CreateMutex(string name)
        {
            using (var mutex = new Mutex(true, name))
            {
                Console.WriteLine($"mutex: {name}");
            }
        }
    }
}
```
  2. `dotnet build -c Release`
  3. `export COMPlus_DbgEnableMiniDump=1`
  4. Run `./bin/Release/netcoreapp3.1/repro`
  5. Observe an exit code of 139.
  6. Change the `Mutex` constructor to use `initiallyOwned: false` and observe that the code works OK (a workaround sketch follows this list).
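The following is a minimal sketch of how the `CreateMutex` helper from the repro can be restructured along the lines of step 6, creating the mutex without initial ownership and acquiring it explicitly instead. The helper name, the `WaitOne`/`ReleaseMutex` calls, and the timeout are my additions, not part of the original report:

```c#
// Workaround sketch (not from the original report): construct the named mutex
// without initial ownership, then acquire and release it explicitly so it is
// never disposed while still owned.
private static void CreateMutexWithoutInitialOwnership(string name)
{
    using (var mutex = new Mutex(initiallyOwned: false, name))
    {
        // The 5-second timeout is an arbitrary illustrative choice.
        if (mutex.WaitOne(TimeSpan.FromSeconds(5)))
        {
            try
            {
                Console.WriteLine($"mutex: {name}");
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```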

Expected behaviour

A managed exception is thrown when invalid Mutex access is attempted.

lldb output

When analyzing the resulting core dump in lldb, thread #1 shows something like this:

```
* thread #1, name = 'repro', stop reason = signal SIGABRT
  * frame #0: 0x00007f59550a191a libpthread.so.0`__waitpid(pid=17428, stat_loc=0x00007f59523c044c, options=0) at waitpid.c:30:10
    frame #1: 0x00007f59543900fd libcoreclr.so`PROCCreateCrashDump(argv=0x00007f595468e5a0) at process.cpp:3346:22
    frame #2: 0x00007f595435d95d libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(code=<unavailable>, siginfo=<unavailable>, context=<unavailable>, signalRestarts=true) at signal.cpp:304:5
    frame #3: 0x00007f595435d90f libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007f59523c0af0, context=0x00007f59523c09c0) at signal.cpp:501
    frame #4: 0x00007f59550a1f40 libpthread.so.0`___lldb_unnamed_symbol1$$libpthread.so.0 + 1
    frame #5: 0x00007f595509ae24 libpthread.so.0`__pthread_mutex_unlock_full(mutex=0x00007f595012a008, decr=1) at pthread_mutex_unlock.c:149:7
    frame #6: 0x00007f5954384ebe libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] MutexHelpers::ReleaseLock(mutex=<unavailable>) at mutex.cpp:932:24
    frame #7: 0x00007f5954384eb6 libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::ActuallyReleaseLock() at mutex.cpp:1619
    frame #8: 0x00007f5954384e8b libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::Abandon() at mutex.cpp:1606
    frame #9: 0x00007f5954384e6e libcoreclr.so`NamedMutexProcessData::Close(this=0x00000000009d75f0, isAbruptShutdown=<unavailable>, releaseSharedData=true) at mutex.cpp:1294
    frame #10: 0x00007f59543821ff libcoreclr.so`SharedMemoryProcessDataHeader::Close(this=<unavailable>) at sharedmemory.cpp:927:17
    frame #11: 0x00007f5954381f8b libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::~SharedMemoryProcessDataHeader(this=0x00000000009d7550) at sharedmemory.cpp:868:5
    frame #12: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] void CorUnix::InternalDelete<SharedMemoryProcessDataHeader>(p=0x00000000009d7550) at malloc.hpp:148
    frame #13: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::DecRefCount(this=0x00000000009d7550) at sharedmemory.cpp:1028
    frame #14: 0x00007f5954381f7e libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(thread=<unavailable>, object=<unavailable>, isShuttingDown=<unavailable>, cleanUpPalSharedState=<unavailable>) at sharedmemory.cpp:813
    frame #15: 0x00007f5954375c86 libcoreclr.so`CorUnix::CPalObjectBase::ReleaseReference(this=0x00000000008e60a0, pthr=0x00000000008578a0) at palobjbase.cpp:309:13
    frame #16: 0x00007f5954368787 libcoreclr.so`CorUnix::CSimpleHandleManager::FreeHandle(this=<unavailable>, pThread=0x00000000008578a0, h=<unavailable>) at handlemgr.cpp:253:15
    frame #17: 0x00007f59543681ce libcoreclr.so`::CloseHandle(HANDLE) [inlined] CorUnix::InternalCloseHandle(pThread=0x00000000008578a0, hObject=0x0000000000000498) at handleapi.cpp:312:38
    frame #18: 0x00007f5954368188 libcoreclr.so`::CloseHandle(hObject=0x0000000000000498) at handleapi.cpp:287
```

It seems like the coreclr SIGSEGV handler is being called, so I'm not sure why no managed exception is thrown.

OS Details

```
NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco
```

.NET details

```
.NET Core SDK (reflecting any global.json):
 Version:   3.1.201
 Commit:    b1768b4ae7

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  19.04
 OS Platform: Linux
 RID:         ubuntu.19.04-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.201/

Host (useful for support):
  Version: 3.1.3
  Commit:  4a9f85e9f8

.NET Core SDKs installed:
  2.2.207 [/usr/share/dotnet/sdk]
  2.2.402 [/usr/share/dotnet/sdk]
  3.0.103 [/usr/share/dotnet/sdk]
  3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download
```

Please let me know if there is any more information I can provide.

area-PAL-coreclr

All 7 comments

> so I'm not sure why there is no managed exception.

SIGSEGV is converted into a managed exception only if it happens in managed code or in a couple of thin helpers that are executed on behalf of managed code. Everywhere else, a SIGSEGV represents a bug in the native runtime or in external shared libraries, so we fail fast instead. Converting it to a managed exception would be dangerous because we don't know what state the current thread was in. For example, if the SIGSEGV happened in code running under a lock, handling it would soon result in a deadlock.
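As an illustration of that distinction (my own example, not from the comment above): an invalid memory access that originates in managed code, such as a null dereference, is translated by the runtime into a catchable managed exception, whereas the same fault inside native runtime code takes the fail-fast path described above.

```c#
using System;

class SigsegvTranslationExample
{
    static void Main()
    {
        try
        {
            // This null dereference faults in JIT-compiled managed code, so the
            // runtime's signal handler surfaces it as a NullReferenceException
            // instead of tearing the process down.
            string s = null;
            Console.WriteLine(s.Length);
        }
        catch (NullReferenceException ex)
        {
            Console.WriteLine($"Caught: {ex.GetType().Name}");
        }
    }
}
```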

I can confirm it repros on my Ubuntu 16.04 machine too.

Thanks @jburger for the excellent, clear bug report.

@jburger and @pawelpabich, could you please share some more information about how this was showing up originally and how the mutex was used? I see from the linked issue https://github.com/OctopusDeploy/Issues/issues/6287 that it was showing up while writing to logs, and that issue mentions a bug was fixed. I'm also curious whether anything was changed to work around the problem and whether you are still seeing the issue from time to time.

@kouvel we were using a named mutex to lock concurrent writes to files, keyed by filename.
We moved away from mutexes and replaced them with ReaderWriterLockSlim:

```c#
// In-process replacement for the named mutex: one reference-counted
// ReaderWriterLockSlim per name, removed when the last holder disposes it.
public class NamedLocks
{
    readonly Dictionary<string, RefCountedLock> refCountedLocks = new Dictionary<string, RefCountedLock>();

    public IDisposable LockFor(string name)
    {
        RefCountedLock refCountedLock;

        lock (refCountedLocks)
        {
            if (!refCountedLocks.TryGetValue(name, out refCountedLock))
            {
                refCountedLock = new RefCountedLock(name, refCountedLocks);
                refCountedLocks[name] = refCountedLock;
            }

            refCountedLock.Acquire();
        }

        refCountedLock.Enter();

        return refCountedLock;
    }

    public int Count()
    {
        lock (refCountedLocks)
        {
            return refCountedLocks.Count;
        }
    }

    class RefCountedLock : IDisposable
    {
        readonly string name;
        readonly Dictionary<string, RefCountedLock> refCountedLocks;
        readonly ReaderWriterLockSlim @lock;

        int numberOfRefs;

        public RefCountedLock(string name, Dictionary<string, RefCountedLock> refCountedLocks)
        {
            this.name = name;
            this.refCountedLocks = refCountedLocks;

            @lock = new ReaderWriterLockSlim();

            numberOfRefs = 0;
        }

        public void Acquire()
        {
            numberOfRefs++;
        }

        public void Enter()
        {
            @lock.EnterWriteLock();
        }

        public void Dispose()
        {
            lock (refCountedLocks)
            {
                numberOfRefs--;
                if (numberOfRefs == 0)
                {
                    refCountedLocks.Remove(name);
                }
            }

            @lock.ExitWriteLock();

            if (numberOfRefs == 0)
            {
                @lock.Dispose();
            }
        }
    }
}

```
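For clarity, a minimal usage sketch for the class above (my own example; the method, field, and file name are hypothetical):

```c#
// Hypothetical caller (my example): serialize writes to a given file within a
// single process by taking the per-name write lock before appending.
static readonly NamedLocks Locks = new NamedLocks();

static void AppendLine(string path, string line)
{
    using (Locks.LockFor(path))
    {
        System.IO.File.AppendAllText(path, line + System.Environment.NewLine);
    }
}
```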

I see, thanks @johnsimons. Do you still need the ability to share the same lock across other processes, or is it just for synchronization within one process?

No, we only need it for the same process. It was just a convenience thing 😄
