Runtime: RyuJIT: Incorrect 4-byte immediate emitted for shift causes access violation

Created on 22 Aug 2018  路  17Comments  路  Source: dotnet/runtime

In debug, RyuJIT seems to try to emit a 4-byte immediate on a shl instruction for the following case:

public class Program
{
    static ushort s_2;
    static short[] s_5 = new short[]{1};
    static ulong s_8;
    public static void Main()
    {
        var vr2 = s_5[0];
        M9();
    }

    static void M9()
    {
        s_8 <<= (0 & s_2) + 186;
    }
}

According to the JIT, this is the disassembly:

G_M23195_IG03:
       90                   nop
       48C1255C29EEFFBA000000 shl      qword ptr [reloc classVar[0x391252c0]], 186
       90                   nop

Of course shifts do not have a 4-byte immediate form, and VS knows better:

00007FFE51C915DE 90                   nop  
00007FFE51C915DF 48 C1 25 06 36 EE FF BA shl         qword ptr [7FFE51B74BEDh],0BAh  
00007FFE51C915E7 00 00                add         byte ptr [rax],al  
00007FFE51C915E9 00 90 48 8D 65 30    add         byte ptr [rax+30658D48h],dl  

This access violates depending on the value of rax.

area-CodeGen-coreclr bug

Most helpful comment

Scary bug.

This bug is dedicated to everyone who has ever said that C# doesn't support inline assembly...

All 17 comments

I can take a look at this, I assume it shouldn't be too hard to fix.

Hmm, I see

       48C1257329EEFF3A     shl      qword ptr [reloc classVar[0x1d874770]], 58

here.

Ah, this is debug/minopts. Scary bug.

Scary bug.

This bug is dedicated to everyone who has ever said that C# doesn't support inline assembly...

The problem is that lowering marks the immediate as contained because it is lower than 256:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/lowerxarch.cpp#L1694-L1712
However, since x86 instructions sign-extend immediates, the emitter believes it needs 4 bytes to represent values >= 128:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/emitxarch.cpp#L2143

Two potential solutions. Either lowering should not mark constants >= 128 as contained, or we can sign extend the value manually so the emitter only uses 1 byte. Probably the former is best since such shifts should be super rare anyway and it avoids introducing new code/changing trees in code-gen. Thoughts?

@jakobbotsch, for the HWIntrinsics (which have many instructions that only take an imm8), we are taking the IconValue and converting it to int8_t (after asserting that it is >= 0 && <= 255): https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsiccodegenxarch.cpp#L190-L192

I would think this same "fix" is also applicable here (or for any other instructions which only take imm8).

That is essentially what I meant by sign extending the value manually. The problem is that code-gen uses emitInsRMW for this shift, which takes a node and not the integer constant. So it seems to me that it would have to be something like replacing:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/codegenxarch.cpp#L4161
by:

shiftBy->AsIntConCommon()->SetIconValue((char)shiftByValue);
getEmitter()->emitInsRMW(ins, attr, storeInd, shiftBy);

Unless there is some overload I am missing that can be used directly?

CC. @CarolEidt as well, since she was involved in several of the previous conversations around imm8 handling

The problem is that lowering marks the immediate as contained because it is lower than 256:

Sort of. That happens in other cases and things work correctly:
```c#
static int M9(int x)
{
x <<= (0 & s_2) + 186;
return x;
}

generates
```asm
       8B4510               mov      eax, dword ptr [rbp+10H]
       C1E03A               shl      eax, 58
       894510               mov      dword ptr [rbp+10H], eax
       8B4510               mov      eax, dword ptr [rbp+10H]

Either lowering should not mark constants >= 128 as contained, or we can sign extend the value manually so the emitter only uses 1 byte.

Both are valid solutions but they don't address the actual problem.

The problem is that code-gen uses emitInsRMW for this shift

Indeed, there's a problem with emitInsRMW itself. That's why it only fails in some cases.

Indeed, there's a problem with emitInsRMW itself. That's why it only fails in some cases.

I suppose emitInsRMW could check whether we are emitting a shift and mask it, similarly to how emitIns_R_I handles it.

I suppose emitInsRMW could check whether we are emitting a shift and mask it, similarly to how emitIns_R_I handles it.

Yup. It's a bit strange, I would have expected the actual encoding functions (emitOutputX) to take care of this but alas that's how it works now.

Sure, you can make lowering drop the unneeded bits (it really should mask with 31 for 32 bit shifts and 64 for 64 bit shifts instead of containing only values <= 255) but then it would still be possible for someone to somehow by pass lowering and generate invalid instructions (e.g. produce nodes on the fly in codegen, it's rare but it can happen, at least in theory).

it really should mask with 31 for 32 bit shifts and 64 for 64 bit shifts instead of containing only values <= 255

Well, assuming all the places that emit these instructions apply the masking, any integer constant should be able to be contained. And the places that don't already mask have this problem for constants between 128 and 256. So it should be safe to remove those bounds checks completely and fix the masking, I think.

Well, assuming all the places that emit these instructions apply the masking, any integer constant should be able to be contained.

Indeed, if the emitter does its job correctly then that condition in lowering is useless.

Indeed, if the emitter does its job correctly then that condition in lowering is useless.

Right, it would seem to me that doing the "right thing" in the emitter would be the best solution.

It seems a little strange to do it in the emitter because at that point we have already decided what format of instruction to emit. It means we end up producing a longer form than necessary once the masked value turn out to be 1. For example with the current state of things we have:

x <<= 1;
x <<= (0 & s_2) + 129;

generates:

D1E0                 shl      eax, 1
894510               mov      dword ptr [rbp+10H], eax
8B4510               mov      eax, dword ptr [rbp+10H]
C1E001               shl      eax, 1

Anyway, I tried cleaning up the lowering and it resulted in some assert and test failures. I will probably submit a PR that fixes this issue just by masking in emitInsRMW, and then take a look at how to clean up the other stuff separately.

C1E001 shl eax, 1

Eh, it's probably the job of the emitter to pick up the right encoding. It probably doesn't do it because the 2 shl variants have different base encodings, that doesn't quite fit in the current instruction tables so we ended up with pseudo-instructions such as shl_N and shl_1.

Of course, there's no harm in masking the shift count in lowering. But the number of real world cases where this has any impact is very likely 0. Keep in mind that shift counts greater than the bit count are unspecified in IL. At the same time, C# guarantees that the shift count is appropriately masked.

Eh, it's probably the job of the emitter to pick up the right encoding.

Fair enough, it was mainly that the masking is done in emitIns_R_I, so it seems strange not to emit a register and immediate. For example, it would make more sense to me to do the masking in CodeGen::inst_RV_SH than the emitter. But that could probably be considered part of the emitter...

Was this page helpful?
0 / 5 - 0 ratings