Component: comptest (fvtest/compilertriltest/LinkageTest.cpp)
The TYPED_TEST(LinkageTest, SystemLinkageParameterPassingFourArg) test fails (execution just stops) when OMR is compiled with MSVC and passes when OMR is compiled with Clang. The single-parameter test passes as expected.
The last line in the test log:
[----------] 3 tests from LinkageTest/0, where TypeParam = <type>
Invoking the fourthArg function from a function JITed by OMR
MSVC
Unit test:
00007ff7`fbfaa653 41b900040000 mov r9d,400h
00007ff7`fbfaa659 4533c0 xor r8d,r8d
00007ff7`fbfaa65c 33d2 xor edx,edx
00007ff7`fbfaa65e 33c9 xor ecx,ecx
00007ff7`fbfaa660 ff942468030000 call qword ptr [rsp+368h]
JITed function:
0000019c`d6d30034 4153 push r11
0000019c`d6d30036 49bb37d8d4fbf77f0000 mov r11,offset comptest!ILT+51250(??$fourthArgHYAHHHHHZ) (00007ff7`fbd4d837)
0000019c`d6d30040 41ffd3 call r11
0000019c`d6d30043 415b pop r11
0000019c`d6d30045 c3 ret
(nothing here saves non-volatile registers or sets up a stack frame)
Template handler (MSVC specific):
comptest!ILT+51250(??$fourthArgHYAHHHHHZ):
00007ff7`fbd4d837 e904332500 jmp comptest!fourthArg<int> (00007ff7`fbfa0b40)
comptest!fourthArg
comptest!fourthArg<int>:
00007ff7`fbfa0b40 44894c2420 mov dword ptr [rsp+20h],r9d ss:00000004`0473f1d8=00000000
00007ff7`fbfa0b45 4489442418 mov dword ptr [rsp+18h],r8d
00007ff7`fbfa0b4a 89542410 mov dword ptr [rsp+10h],edx
00007ff7`fbfa0b4e 894c2408 mov dword ptr [rsp+8],ecx
00007ff7`fbfa0b52 57 push rdi
00007ff7`fbfa0b53 8b442428 mov eax,dword ptr [rsp+28h]
00007ff7`fbfa0b57 5f pop rdi
00007ff7`fbfa0b58 c3 ret
The MSVC-compiled function (comptest!fourthArg<int>) doesn't touch the stack frame pointer. It stores its register parameters on the stack and pushes a nonvolatile register (rdi), but it neither allocates nor deallocates a fixed part of the stack. As far as I can see, apart from the push rdi / pop rdi pair it has nothing to restore, perhaps because it "understands" that it doesn't call any functions (more strictly, it doesn't change any nonvolatile registers, so it's a leaf function) and therefore the registers won't change their values.
For comparison, Clang
Unit test:
00007ff6`bfdaba99 4c8b95f0010000 mov r10,qword ptr [rbp+1F0h] ss:0000005a`a251f420=000001ec60670034
00007ff6`bfdabaa0 41b900040000 mov r9d,400h
00007ff6`bfdabaa6 8b8dd4000000 mov ecx,dword ptr [rbp+0D4h]
00007ff6`bfdabaac 8b95d4000000 mov edx,dword ptr [rbp+0D4h]
00007ff6`bfdabab2 448b85d4000000 mov r8d,dword ptr [rbp+0D4h]
00007ff6`bfdabab9 8985d0000000 mov dword ptr [rbp+0D0h],eax
00007ff6`bfdababf 41ffd2 call r10
JITed function:
000001ec`60670034 4153 push r11
000001ec`60670036 49bbd0c3dabff67f0000 mov r11,offset comptest!fourthArg<int> (00007ff6`bfdac3d0)
000001ec`60670040 41ffd3 call r11
000001ec`60670043 415b pop r11
000001ec`60670045 c3 ret
comptest!fourthArg
comptest!fourthArg<int>:
00007ff6`bfdac3d0 4883ec10 sub rsp,10h
00007ff6`bfdac3d4 44894c240c mov dword ptr [rsp+0Ch],r9d
00007ff6`bfdac3d9 4489442408 mov dword ptr [rsp+8],r8d
00007ff6`bfdac3de 89542404 mov dword ptr [rsp+4],edx
00007ff6`bfdac3e2 890c24 mov dword ptr [rsp],ecx
00007ff6`bfdac3e5 8b44240c mov eax,dword ptr [rsp+0Ch]
00007ff6`bfdac3e9 4883c410 add rsp,10h
00007ff6`bfdac3ed c3 ret
The Clang-compiled function (comptest!fourthArg<int>) adjusts rsp in its prologue (sub rsp,10h) and restores it in the epilogue (add rsp,10h), even though the function is a leaf one.
I'm not sure, but this may be a problem in the MSVC implementation: it writes to the stack but doesn't restore the stack state (doesn't deallocate the part of the stack it used) as described in the official documentation: Prolog and Epilog.
There is an ambiguity: some MSDN pages say "Called function pops the arguments from the stack." while others say "The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space to store four register parameters, even if the callee doesn’t take that many parameters.". If we look at the code for a wrapper function below, we have to go with the second statement (callers must allocate sufficient space on the stack). The OMR-generated code:
000001ec`60670034 4153 push r11
000001ec`60670036 49bbd0c3dabff67f0000 mov r11,offset comptest!fourthArg<int> (00007ff6`bfdac3d0)
000001ec`60670040 41ffd3 call r11
000001ec`60670043 415b pop r11
000001ec`60670045 c3 ret
doesn't allocate any space on the stack, but the call pushes another return address there. So all relative addresses are shifted by one qword (8 bytes, the size of a pointer on a 64-bit system), and when comptest!fourthArg<int> (the callee) saves its parameters on the stack, it corrupts the return address.
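To make the overlap concrete, here is a minimal C++ sketch that models the stack as the callee sees it at entry, one label per 8-byte slot. The function names and slot labels are hypothetical and are not part of OMR. The layout is read off the listings above (push r11, then call r11), and it assumes the compiler-generated unit test reserved home space for the JITed function it calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stack as seen by the callee at entry, one label per 8-byte slot,
// starting at [rsp+0]. The JITed code modeled here is: push r11 ; call r11.
std::vector<std::string> stackAtCalleeEntry(bool thunkReservesHomeSpace) {
    std::vector<std::string> slots;
    // Pushed by "call r11".
    slots.push_back("return address into thunk");
    if (thunkReservesHomeSpace) {
        // Hypothetical "sub rsp,20h" before the call (not in the listing).
        slots.insert(slots.end(), 4, std::string("home space reserved by thunk"));
    }
    // Pushed by "push r11".
    slots.push_back("saved r11");
    // Pushed by the unit test's own call instruction.
    slots.push_back("return address into unit test");
    // Assumption: the compiled unit test reserved home space for its callee.
    slots.insert(slots.end(), 4, std::string("home space reserved by unit test"));
    return slots;
}

// The debug-build callee stores ecx, edx, r8d and r9d to [rsp+8], [rsp+10h],
// [rsp+18h] and [rsp+20h], which are slots 1..4 of the model above.
// These are 32-bit stores, so only the low half of each slot is overwritten.
std::vector<std::string> slotsOverwrittenByCallee(bool thunkReservesHomeSpace) {
    const std::vector<std::string> slots = stackAtCalleeEntry(thunkReservesHomeSpace);
    assert(slots.size() > 4);
    return std::vector<std::string>(slots.begin() + 1, slots.begin() + 5);
}
```

In this model, slotsOverwrittenByCallee(false) contains both "saved r11" and "return address into unit test", which would explain why execution stops after the JITed function returns. With the hypothetical reservation (true), all four stores land in space the thunk owns.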
Let's look at what MSVC and Clang generate for a wrapper function (that is, let's call comptest!fourthArg<int> from a wrapper compiled with MSVC or Clang, not by the OMR JIT).
MSVC
comptest!ILT+228240(?myWrapperYAHHHHHZ):
00007ff7`65c38b95 e966622200 jmp comptest!myWrapper (00007ff7`65e5ee00)
comptest!myWrapper:
00007ff7`65e5ee00 44894c2420 mov dword ptr [rsp+20h],r9d ss:00000027`675ef278=675ef67c
00007ff7`65e5ee05 4489442418 mov dword ptr [rsp+18h],r8d
00007ff7`65e5ee0a 89542410 mov dword ptr [rsp+10h],edx
00007ff7`65e5ee0e 894c2408 mov dword ptr [rsp+8],ecx
00007ff7`65e5ee12 57 push rdi
00007ff7`65e5ee13 4883ec20 sub rsp,20h ; we are preparing the stack for callee?
00007ff7`65e5ee17 488bfc mov rdi,rsp
00007ff7`65e5ee1a b908000000 mov ecx,8
00007ff7`65e5ee1f b8cccccccc mov eax,0CCCCCCCCh
00007ff7`65e5ee24 f3ab rep stos dword ptr [rdi]
00007ff7`65e5ee26 8b4c2430 mov ecx,dword ptr [rsp+30h]
00007ff7`65e5ee2a 448b4c2448 mov r9d,dword ptr [rsp+48h]
00007ff7`65e5ee2f 448b442440 mov r8d,dword ptr [rsp+40h]
00007ff7`65e5ee34 8b542438 mov edx,dword ptr [rsp+38h]
00007ff7`65e5ee38 8b4c2430 mov ecx,dword ptr [rsp+30h]
00007ff7`65e5ee3c e8f6e9daff call comptest!ILT+51250(??$fourthArgHYAHHHHHZ) (00007ff7`65c0d837)
00007ff7`65e5ee41 4883c420 add rsp,20h ; and restores it?
00007ff7`65e5ee45 5f pop rdi
00007ff7`65e5ee46 c3 ret
The caller prepares the stack for the callee, so the callee doesn't have to restore the stack state itself. The extra copies between the parameter registers and the stack are also present.
Clang
comptest!myWrapper:
00007ff7`fa1afe20 4883ec38 sub rsp,38h
00007ff7`fa1afe24 44894c2434 mov dword ptr [rsp+34h],r9d
00007ff7`fa1afe29 4489442430 mov dword ptr [rsp+30h],r8d
00007ff7`fa1afe2e 8954242c mov dword ptr [rsp+2Ch],edx
00007ff7`fa1afe32 894c2428 mov dword ptr [rsp+28h],ecx
00007ff7`fa1afe36 448b4c2434 mov r9d,dword ptr [rsp+34h]
00007ff7`fa1afe3b 448b442430 mov r8d,dword ptr [rsp+30h]
00007ff7`fa1afe40 8b54242c mov edx,dword ptr [rsp+2Ch]
00007ff7`fa1afe44 8b4c2428 mov ecx,dword ptr [rsp+28h]
00007ff7`fa1afe48 e8c30f0000 call comptest!fourthArg<int> (00007ff7`fa1b0e10)
00007ff7`fa1afe4d 90 nop
00007ff7`fa1afe4e 4883c438 add rsp,38h
00007ff7`fa1afe52 c3 ret
The compiler allocates stack space before the call comptest!fourthArg<int> instruction and deallocates it afterwards.
FYI @0dvictor @andrewcraik @0xdaryl
@0dvictor was actually looking at a problem with the 64bit windows calling convention for the C helpers and this may be related - in that case we weren't buying the 4 stack slots required under the linkage. I'll let @0dvictor check the details of this in that context.
Yes, I'm working on the same issue on OpenJ9's JIT helper calls. Let me fix this bug as well.
It turned out to be not only a calling convention violation on Windows. Stack space for arguments is allocated on neither Linux nor Windows. I guess none of the existing tests pass parameters via the stack, but on Windows the MSVC compiler is able to take advantage of the register parameter stack area for optimizations. A fix is coming.
@0dvictor The problem reproduces only when OMR is built in Debug mode, where both MSVC and LLVM spill parameters from registers to the stack and the return address gets corrupted (LLVM allocates some stack space in the callee as well, so we see no problem there). I've tried to reproduce the problem in a production build and added a test to the LinkageTest test case (the PR is coming).
Also, on Windows x64, mixed-type arguments don't work: if a JITed method calls a function like double (*)(double, int, double, double) or int (*)(int,double,int,double,int), SEH exception 0xc0000005 is thrown. Maybe this is because on Windows we can't use more than 4 registers (either int or float) for parameter passing, while on Linux we have 6 registers for integers and 8 for floats, but there I pass 4 parameters only...
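As an illustration of why mixed-type signatures are sensitive to the convention, the following C++ sketch assigns argument registers under both ABIs. It is deliberately simplified: it covers only scalar integer and floating-point arguments (no structs, no varargs), and the type and function names are hypothetical rather than OMR APIs. Under the Microsoft x64 convention each of the first four arguments owns one positional slot, while the System V AMD64 ABI keeps separate counters for integer and floating-point registers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ArgKind { Int, Float };

// Microsoft x64: argument i (i < 4) uses the i-th integer register or the
// i-th XMM register depending on its type; the other register of that slot
// stays unused. Arguments beyond the fourth go on the stack.
std::vector<std::string> assignMicrosoftX64(const std::vector<ArgKind>& args) {
    static const char* const intRegs[] = {"rcx", "rdx", "r8", "r9"};
    static const char* const fpRegs[] = {"xmm0", "xmm1", "xmm2", "xmm3"};
    std::vector<std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i < 4) {
            out.push_back(args[i] == ArgKind::Int ? intRegs[i] : fpRegs[i]);
        } else {
            out.push_back("stack");
        }
    }
    return out;
}

// System V AMD64: integer and floating-point arguments are counted
// independently, with 6 integer registers and 8 XMM registers available.
std::vector<std::string> assignSystemV(const std::vector<ArgKind>& args) {
    static const char* const intRegs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char* const fpRegs[] = {"xmm0", "xmm1", "xmm2", "xmm3",
                                         "xmm4", "xmm5", "xmm6", "xmm7"};
    std::size_t nextInt = 0;
    std::size_t nextFp = 0;
    std::vector<std::string> out;
    for (ArgKind kind : args) {
        if (kind == ArgKind::Int && nextInt < 6) {
            out.push_back(intRegs[nextInt++]);
        } else if (kind == ArgKind::Float && nextFp < 8) {
            out.push_back(fpRegs[nextFp++]);
        } else {
            out.push_back("stack");
        }
    }
    return out;
}
```

For double (*)(double, int, double, double), this sketch gives xmm0, rdx, xmm2, xmm3 on Windows but xmm0, rdi, xmm1, xmm2 under System V. A code generator that numbers float registers independently of position would therefore put the third and fourth arguments in the wrong registers on Windows.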
@samolisov Thank you for bringing it up. I noticed these limitations as well. I'll make sure they are all fixed.
@0dvictor Thank you for your comment. I've opened #3389, which demonstrates what happens when there are more than four parameters and the callee actively uses the stack.
I'm not sure whether OMR must support the platform-specific calling convention between methods JITed by itself. On Windows we have only four registers for parameters regardless of their types (so even if we have 4 integers and 4 floats in the signature, only 4 in total can be placed in registers), and that is a big limitation. The System V AMD64 ABI lets us use 14 registers in total (6 integer and 8 float). What if OMR used the System V AMD64 calling convention even on Windows and used Microsoft's one only for entry points and methods which may call one or more native methods (like in JNI)?
@samolisov It's an interesting question. I'm not sure if OMR should go this way, but I know a downstream project (OpenJ9) uses a customized calling convention for its JITed-to-JITed method calls.
IMO, using a different calling convention is a kind of inter-procedural optimization. Theoretically we are able to use a customized calling convention if and only if the following two conditions hold:
In all other cases, where the caller or callee may be an arbitrary method (let's say a C function), OMR must use the platform's standard calling convention.