Runtime: CoreCLR runtime doesn't work on Linux kernel 4.6.x

Created on 27 Jun 2016 · 8 Comments · Source: dotnet/runtime

We get SIGSEGV in the GC when trying to allocate managed objects.

bug

janvorli

All 8 comments

It looks like the problem is caused by the thread suspend injection on Unix. Disabling it by setting the COMPlus_INTERNAL_ThreadSuspendInjection environment variable to 0 makes the issue go away.

janvorli on 28 Jun 2016

I have also tried installing older versions of the kernel to see exactly when the issue started to happen. 4.6.0 was the first version with this problem.

janvorli on 28 Jun 2016

Is this SIGSEGV directly related to any specific system call or kernel-userspace ABI?

If we have a short and simple code example that reproduces the bug, we might be able to get it fixed in the Linux kernel, provided this is a regression in the kernel and not a bug in coreclr.

myungjoo on 28 Jun 2016

I have just found the culprit: it is a bug in coreclr that didn't show up on older kernels. When we translate the Unix context to the Windows-style context in CONTEXTFromNativeContext / CONTEXTToNativeContext on AMD64, we also translate the "CS" (code segment) register. But the macro that accesses this register in the Unix context for AMD64 is defined as follows:

  #define MCREG_SegCs(mc) ((mc).gregs[REG_CSGSFS])

As you can see, this gregs entry contains the combined CS, GS and FS segment registers, each taking 16 bits. But the CS field in the Windows context is just 16 bits wide. So when we translate the Unix context to the Windows context, we set just the CS and that is fine. But when we translate it back from the Windows context to the Unix context, we set the CS correctly but clear the GS and FS values!

I discovered this by comparing the contents of the Unix context after entering inject_activation_handler with its contents after restoring it from the Windows context before returning from the handler:

At the beginning:

  uc_mcontext = {
    gregs = {
      ...
      [18] = 0x0003000000000033
      ...
    }

After the restore:

  uc_mcontext = {
    gregs = {
      ...
      [18] = 0x0000000000000033
      ...
    }

I have also found why it didn't show up on the older kernels. When running on, for example, kernel 3.13, which is the default kernel for Ubuntu 14.04, the FS and GS values are both zero.

janvorli on 28 Jun 2016
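
The round trip described in the comment above can be reproduced in isolation. The following is a minimal sketch, not the coreclr code or its actual fix: `packed_segs_t`, `restore_cs_lossy`, `restore_cs_preserving` and `roundtrip_demo` are hypothetical names introduced for illustration. It assumes only what the thread states, namely that CS occupies the low 16 bits of the 64-bit `gregs[REG_CSGSFS]` slot, and it uses the value from the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gregs[REG_CSGSFS]: one 64-bit slot that packs
   several 16-bit fields, with CS in bits 0-15. */
typedef uint64_t packed_segs_t;

/* Lossy restore: assigning the 16-bit CS to the whole slot zeroes
   bits 16-63, which is the behaviour seen in the dump. */
static packed_segs_t restore_cs_lossy(packed_segs_t slot, uint16_t cs) {
    (void)slot;                 /* previous contents are discarded */
    return (packed_segs_t)cs;
}

/* Preserving restore (one possible approach, assumed for illustration):
   replace only the low 16 bits and keep everything else. */
static packed_segs_t restore_cs_preserving(packed_segs_t slot, uint16_t cs) {
    return (slot & ~(packed_segs_t)0xFFFF) | cs;
}

/* Walks through the round trip with the value from the dump. */
static void roundtrip_demo(void) {
    packed_segs_t before = 0x0003000000000033ULL;
    uint16_t cs = (uint16_t)(before & 0xFFFF);  /* Unix -> Windows: keep CS only */

    /* Windows -> Unix, lossy: the upper 48 bits are gone. */
    assert(restore_cs_lossy(before, cs) == 0x0000000000000033ULL);

    /* Windows -> Unix, preserving: the slot is unchanged. */
    assert(restore_cs_preserving(before, cs) == before);
}
```

Calling `roundtrip_demo()` from a test harness exercises both variants; the lossy one yields the "after the restore" value shown in the dump, while the masked write leaves the slot as it was.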

FYI: @adityamandaleeka

janvorli on 28 Jun 2016

A small correction: FS and GS are still zero on the newer kernel, but the topmost 16 bits, which are marked as padding in a Linux header, are not. This can be seen from the values I dumped above; I was mistaken when I said the issue was in clearing FS and GS. Evidently the new kernel uses those 16 bits for something.

janvorli on 28 Jun 2016

I was looking through the changes in the 4.6 kernel and noticed this commit which replaced the padding field in the sigcontext with a union of padding and SS.
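
To make the layout concrete, here is a small sketch that splits the 64-bit slot into its four 16-bit parts. It is illustrative only: `seg_fields_t`, `unpack_csgsfs` and `pack_csgsfs` are hypothetical names, the CS, GS, FS ordering is assumed from the `REG_CSGSFS` name, and the top field is named `ss_or_pad` to reflect the union of padding and SS mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 64-bit slot, lowest bits first: CS, GS, FS, then
   the 16 bits that were pure padding before 4.6 and are shared with SS
   afterwards. The type and field names are hypothetical. */
typedef struct {
    uint16_t cs;
    uint16_t gs;
    uint16_t fs;
    uint16_t ss_or_pad;
} seg_fields_t;

/* Split the slot with shifts so the result does not depend on
   host endianness or struct padding. */
static seg_fields_t unpack_csgsfs(uint64_t slot) {
    seg_fields_t f;
    f.cs        = (uint16_t)(slot & 0xFFFF);
    f.gs        = (uint16_t)((slot >> 16) & 0xFFFF);
    f.fs        = (uint16_t)((slot >> 32) & 0xFFFF);
    f.ss_or_pad = (uint16_t)((slot >> 48) & 0xFFFF);
    return f;
}

/* Reassemble the slot from its four parts. */
static uint64_t pack_csgsfs(seg_fields_t f) {
    return (uint64_t)f.cs
         | ((uint64_t)f.gs << 16)
         | ((uint64_t)f.fs << 32)
         | ((uint64_t)f.ss_or_pad << 48);
}
```

Applied to the value from the earlier dump, `unpack_csgsfs(0x0003000000000033)` gives a CS of 0x33, zero GS and FS, and 0x0003 in the top field, which matches the correction that only the former padding bits were being lost.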