CoreCLR uses `mlock` during startup and fails if `mlock` fails with `EPERM`. Generally, that's not a problem. However, many Linux distributions are starting to use systemd-nspawn for building code. This creates a chroot where programs have restricted capabilities. Specifically, they do not have `CAP_IPC_LOCK`, which means they can't use `mlock`.
When `mlock` doesn't work, CoreCLR fails to start. This shows up in an `strace` as something like:
```
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd542bb000
mlock(0x7fbd542bb000, 4096) = -1 EPERM (Operation not permitted)
write(2, "Failed to initialize CoreCLR, HR"..., 49) = 49
```
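For anyone who wants to check a build environment directly, here is a minimal standalone sketch (plain C++, not CoreCLR code) that reproduces the same mmap + mlock sequence and reports whether `mlock` is permitted:

```cpp
// Minimal repro sketch (not CoreCLR code): performs the same mmap + mlock
// sequence and reports whether mlock is permitted in this environment.
#include <sys/mman.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
    long pageSize = sysconf(_SC_PAGESIZE);

    void* page = mmap(nullptr, pageSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    // Without CAP_IPC_LOCK (and with RLIMIT_MEMLOCK effectively at 0) this
    // fails with EPERM, which is the failure the strace above shows.
    if (mlock(page, pageSize) != 0)
    {
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
        munmap(page, pageSize);
        return 1;
    }

    puts("mlock succeeded");
    munlock(page, pageSize);
    munmap(page, pageSize);
    return 0;
}
```

Running this inside the systemd-nspawn chroot should show the same `EPERM` as the strace output above.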
As a result, it is basically impossible to build CoreCLR in some Linux distributions' build systems.
cc @tmds @alucryd
See https://github.com/dotnet/source-build/issues/285#issuecomment-399949984 and https://github.com/rpm-software-management/mock/issues/186 for examples of builds this is affecting.
The `mlock` is necessary for proper behavior of the FlushProcessWriteBuffers PAL function, which is crucial for ensuring reliable runtime suspension for GC. See https://github.com/dotnet/coreclr/blob/e6ebea25bea93eb4ec07cbd5003545c4805886a8/src/pal/src/thread/process.cpp#L3095-L3098 for a description of the reason.
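As I understand the linked comment, the locked page is what lets `mprotect` force an inter-processor interrupt on every core running one of the process's threads, and that IPI is what provides the process-wide memory barrier. A simplified sketch of that idea (not the actual PAL code; names and error handling are illustrative only):

```cpp
// Simplified sketch of the locked-page technique (not the actual PAL code).
// The helper page is mlock'ed so it stays resident; toggling its protection
// with mprotect then forces the kernel to interrupt every core running a
// thread of this process, which acts as a process-wide memory barrier.
#include <sys/mman.h>
#include <unistd.h>

static void* s_helperPage = nullptr;
static long  s_pageSize = 0;

// Returns false when mlock is not permitted (e.g. EPERM without CAP_IPC_LOCK),
// which is exactly the startup failure described above.
bool InitFlushProcessWriteBuffers()
{
    s_pageSize = sysconf(_SC_PAGESIZE);
    s_helperPage = mmap(nullptr, s_pageSize, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (s_helperPage == MAP_FAILED)
        return false;

    return mlock(s_helperPage, s_pageSize) == 0;
}

void FlushProcessWriteBuffers()
{
    // Make the page writable, dirty it, then revoke access again. Each
    // protection change invalidates the TLB entry on the relevant cores.
    mprotect(s_helperPage, s_pageSize, PROT_READ | PROT_WRITE);
    *static_cast<volatile int*>(s_helperPage) = 1;
    mprotect(s_helperPage, s_pageSize, PROT_NONE);
}
```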
On Linux 4.3 and higher, there is a `sys_membarrier` syscall that we could use as an alternative mechanism to implement FlushProcessWriteBuffers. Issue dotnet/runtime#4501 is tracking that. @sdmaclea tried to implement it and tested it on ARM64. He found that the performance was really bad: the running time of our ~11000 coreclr tests was about 50% longer. However, no testing was done on other hardware, so it was not clear whether the performance issue is ARM64-specific or an overall problem.
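A minimal sketch of what that alternative could look like, assuming a kernel and headers that expose the syscall (there is no libc wrapper, so it has to go through `syscall(2)`); the command names are the ones from `linux/membarrier.h`:

```cpp
// Sketch of FlushProcessWriteBuffers built on sys_membarrier (Linux 4.3+).
// MEMBARRIER_CMD_SHARED (renamed MEMBARRIER_CMD_GLOBAL in newer headers) is
// the slow variant: it waits for every running thread in the system.
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static long membarrier_syscall(int cmd, int flags)
{
    return syscall(SYS_membarrier, cmd, flags);
}

// Returns false if the kernel does not support the command at all.
bool CanUseMembarrier()
{
    long supported = membarrier_syscall(MEMBARRIER_CMD_QUERY, 0);
    return supported >= 0 && (supported & MEMBARRIER_CMD_SHARED) != 0;
}

void FlushProcessWriteBuffersViaMembarrier()
{
    // Full memory barrier observed by all running threads before this returns.
    membarrier_syscall(MEMBARRIER_CMD_SHARED, 0);
}
```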
Interestingly enough, I've just discovered the following article describing performance issues with sys_membarrier: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/. The reason is that the syscall internally waits until all running threads on the system have gone through a context switch, which can take tens of milliseconds. The good news mentioned in the article is that starting with Linux 4.14, there is a new flag that can be passed to the sys_membarrier syscall that makes it use an IPI to implement the memory barrier semantics, which is much faster. So we should give it a try.
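If I read the article and the membarrier(2) man page correctly, the 4.14 path requires the process to register once, and afterwards `MEMBARRIER_CMD_PRIVATE_EXPEDITED` sends IPIs only to the CPUs currently running this process's threads. A hedged sketch (assuming Linux 4.14+ headers) of how that could slot in next to the previous one:

```cpp
// Sketch of the faster IPI-based variant available since Linux 4.14.
// The process registers once; subsequent expedited calls only interrupt the
// CPUs currently running this process's threads, so they are much cheaper
// than MEMBARRIER_CMD_SHARED.
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static long membarrier_syscall(int cmd, int flags)
{
    return syscall(SYS_membarrier, cmd, flags);
}

// Call once at startup; returns false on kernels older than 4.14.
bool TryRegisterExpeditedMembarrier()
{
    long supported = membarrier_syscall(MEMBARRIER_CMD_QUERY, 0);
    if (supported < 0 ||
        (supported & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) == 0)
    {
        return false;
    }
    return membarrier_syscall(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

void FlushProcessWriteBuffersExpedited()
{
    // Full barrier on every thread of this process, implemented with IPIs.
    membarrier_syscall(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}
```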