In debug, RyuJIT seems to try to emit a 4-byte immediate on a shl instruction for the following case:
public class Program
{
static ushort s_2;
static short[] s_5 = new short[]{1};
static ulong s_8;
public static void Main()
{
var vr2 = s_5[0];
M9();
}
static void M9()
{
s_8 <<= (0 & s_2) + 186;
}
}
According to the JIT, this is the disassembly:
G_M23195_IG03:
90 nop
48C1255C29EEFFBA000000 shl qword ptr [reloc classVar[0x391252c0]], 186
90 nop
Of course shifts do not have a 4-byte immediate form, and VS knows better:
00007FFE51C915DE 90 nop
00007FFE51C915DF 48 C1 25 06 36 EE FF BA shl qword ptr [7FFE51B74BEDh],0BAh
00007FFE51C915E7 00 00 add byte ptr [rax],al
00007FFE51C915E9 00 90 48 8D 65 30 add byte ptr [rax+30658D48h],dl
This access violates depending on the value of rax.
I can take a look at this, I assume it shouldn't be too hard to fix.
Hmm, I see
48C1257329EEFF3A shl qword ptr [reloc classVar[0x1d874770]], 58
here.
Ah, this is debug/minopts. Scary bug.
Scary bug.
This bug is dedicated to everyone who has ever said that C# doesn't support inline assembly...
The problem is that lowering marks the immediate as contained because it is lower than 256:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/lowerxarch.cpp#L1694-L1712
However, since x86 instructions sign-extend immediates, the emitter believes it needs 4 bytes to represent values >= 128:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/emitxarch.cpp#L2143
Two potential solutions. Either lowering should not mark constants >= 128 as contained, or we can sign extend the value manually so the emitter only uses 1 byte. Probably the former is best since such shifts should be super rare anyway and it avoids introducing new code/changing trees in code-gen. Thoughts?
@jakobbotsch, for the HWIntrinsics (which have many instructions that only take an imm8), we are taking the IconValue and converting it to int8_t (after asserting that it is >= 0 && <= 255): https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsiccodegenxarch.cpp#L190-L192
I would think this same "fix" is also applicable here (or for any other instructions which only take imm8).
That is essentially what I meant by sign extending the value manually. The problem is that code-gen uses emitInsRMW for this shift, which takes a node and not the integer constant. So it seems to me that it would have to be something like replacing:
https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/jit/codegenxarch.cpp#L4161
by:
shiftBy->AsIntConCommon()->SetIconValue((char)shiftByValue);
getEmitter()->emitInsRMW(ins, attr, storeInd, shiftBy);
Unless there is some overload I am missing that can be used directly?
CC. @CarolEidt as well, since she was involved in several of the previous conversations around imm8 handling
The problem is that lowering marks the immediate as contained because it is lower than 256:
Sort of. That happens in other cases and things work correctly:
```c#
static int M9(int x)
{
x <<= (0 & s_2) + 186;
return x;
}
generates
```asm
8B4510 mov eax, dword ptr [rbp+10H]
C1E03A shl eax, 58
894510 mov dword ptr [rbp+10H], eax
8B4510 mov eax, dword ptr [rbp+10H]
Either lowering should not mark constants >= 128 as contained, or we can sign extend the value manually so the emitter only uses 1 byte.
Both are valid solutions but they don't address the actual problem.
The problem is that code-gen uses emitInsRMW for this shift
Indeed, there's a problem with emitInsRMW itself. That's why it only fails in some cases.
Indeed, there's a problem with emitInsRMW itself. That's why it only fails in some cases.
I suppose emitInsRMW could check whether we are emitting a shift and mask it, similarly to how emitIns_R_I handles it.
I suppose emitInsRMW could check whether we are emitting a shift and mask it, similarly to how emitIns_R_I handles it.
Yup. It's a bit strange, I would have expected the actual encoding functions (emitOutputX) to take care of this but alas that's how it works now.
Sure, you can make lowering drop the unneeded bits (it really should mask with 31 for 32 bit shifts and 64 for 64 bit shifts instead of containing only values <= 255) but then it would still be possible for someone to somehow by pass lowering and generate invalid instructions (e.g. produce nodes on the fly in codegen, it's rare but it can happen, at least in theory).
it really should mask with 31 for 32 bit shifts and 64 for 64 bit shifts instead of containing only values <= 255
Well, assuming all the places that emit these instructions apply the masking, any integer constant should be able to be contained. And the places that don't already mask have this problem for constants between 128 and 256. So it should be safe to remove those bounds checks completely and fix the masking, I think.
Well, assuming all the places that emit these instructions apply the masking, any integer constant should be able to be contained.
Indeed, if the emitter does its job correctly then that condition in lowering is useless.
Indeed, if the emitter does its job correctly then that condition in lowering is useless.
Right, it would seem to me that doing the "right thing" in the emitter would be the best solution.
It seems a little strange to do it in the emitter because at that point we have already decided what format of instruction to emit. It means we end up producing a longer form than necessary once the masked value turn out to be 1. For example with the current state of things we have:
x <<= 1;
x <<= (0 & s_2) + 129;
generates:
D1E0 shl eax, 1
894510 mov dword ptr [rbp+10H], eax
8B4510 mov eax, dword ptr [rbp+10H]
C1E001 shl eax, 1
Anyway, I tried cleaning up the lowering and it resulted in some assert and test failures. I will probably submit a PR that fixes this issue just by masking in emitInsRMW, and then take a look at how to clean up the other stuff separately.
C1E001 shl eax, 1
Eh, it's probably the job of the emitter to pick up the right encoding. It probably doesn't do it because the 2 shl variants have different base encodings, that doesn't quite fit in the current instruction tables so we ended up with pseudo-instructions such as shl_N and shl_1.
Of course, there's no harm in masking the shift count in lowering. But the number of real world cases where this has any impact is very likely 0. Keep in mind that shift counts greater than the bit count are unspecified in IL. At the same time, C# guarantees that the shift count is appropriately masked.
Eh, it's probably the job of the emitter to pick up the right encoding.
Fair enough, it was mainly that the masking is done in emitIns_R_I, so it seems strange not to emit a register and immediate. For example, it would make more sense to me to do the masking in CodeGen::inst_RV_SH than the emitter. But that could probably be considered part of the emitter...
Most helpful comment
This bug is dedicated to everyone who has ever said that C# doesn't support inline assembly...