Omr: Calling convention violation on X86_64/Windows

Created on 18 Dec 2018 · 8 Comments · Source: eclipse/omr

Component: comptest (fvtest/compilertriltest/LinkageTest.cpp)

The TYPED_TEST(LinkageTest, SystemLinkageParameterPassingFourArg) test fails (execution just stops) when OMR is compiled with MSVC and passes when OMR is compiled with Clang. The single-parameter test passes, as expected.

The last line in the test log:

[----------] 3 tests from LinkageTest/0, where TypeParam = <type>

Some debug information

Invoking the fourthArg function from a function JITed by OMR

MSVC

Unit test:

00007ff7`fbfaa653 41b900040000    mov     r9d,400h
00007ff7`fbfaa659 4533c0          xor     r8d,r8d
00007ff7`fbfaa65c 33d2            xor     edx,edx
00007ff7`fbfaa65e 33c9            xor     ecx,ecx
00007ff7`fbfaa660 ff942468030000  call    qword ptr [rsp+368h]

JITed function:

0000019c`d6d30034 4153            push    r11
0000019c`d6d30036 49bb37d8d4fbf77f0000 mov r11,offset comptest!ILT+51250(??$fourthArgHYAHHHHHZ) (00007ff7`fbd4d837)
0000019c`d6d30040 41ffd3          call    r11
0000019c`d6d30043 415b            pop     r11
0000019c`d6d30045 c3              ret

(nothing here saves non-volatile registers or sets up a stack frame)

Template handler (MSVC specific):

comptest!ILT+51250(??$fourthArgHYAHHHHHZ):
00007ff7`fbd4d837 e904332500      jmp     comptest!fourthArg<int> (00007ff7`fbfa0b40)

comptest!fourthArg:

comptest!fourthArg<int>:
00007ff7`fbfa0b40 44894c2420      mov     dword ptr [rsp+20h],r9d ss:00000004`0473f1d8=00000000
00007ff7`fbfa0b45 4489442418      mov     dword ptr [rsp+18h],r8d
00007ff7`fbfa0b4a 89542410        mov     dword ptr [rsp+10h],edx
00007ff7`fbfa0b4e 894c2408        mov     dword ptr [rsp+8],ecx
00007ff7`fbfa0b52 57              push    rdi
00007ff7`fbfa0b53 8b442428        mov     eax,dword ptr [rsp+28h]
00007ff7`fbfa0b57 5f              pop     rdi
00007ff7`fbfa0b58 c3              ret

The function compiled with MSVC (comptest!fourthArg<int>) doesn't touch the stack frame pointer; it saves nonvolatile registers on the stack but doesn't deallocate the fixed part of the stack. As far as I can see, it doesn't even restore nonvolatile registers, perhaps because it "understands" that it doesn't call any functions (more strictly, it doesn't change any nonvolatile registers, so it's a leaf function) and therefore the registers won't change their values.

For comparison, Clang

Unit test:

00007ff6`bfdaba99 4c8b95f0010000  mov     r10,qword ptr [rbp+1F0h] ss:0000005a`a251f420=000001ec60670034
00007ff6`bfdabaa0 41b900040000    mov     r9d,400h
00007ff6`bfdabaa6 8b8dd4000000    mov     ecx,dword ptr [rbp+0D4h]
00007ff6`bfdabaac 8b95d4000000    mov     edx,dword ptr [rbp+0D4h]
00007ff6`bfdabab2 448b85d4000000  mov     r8d,dword ptr [rbp+0D4h]
00007ff6`bfdabab9 8985d0000000    mov     dword ptr [rbp+0D0h],eax
00007ff6`bfdababf 41ffd2          call    r10

JITed function:

000001ec`60670034 4153            push    r11
000001ec`60670036 49bbd0c3dabff67f0000 mov r11,offset comptest!fourthArg<int> (00007ff6`bfdac3d0)
000001ec`60670040 41ffd3          call    r11
000001ec`60670043 415b            pop     r11
000001ec`60670045 c3              ret

comptest!fourthArg:

comptest!fourthArg<int>:
00007ff6`bfdac3d0 4883ec10        sub     rsp,10h
00007ff6`bfdac3d4 44894c240c      mov     dword ptr [rsp+0Ch],r9d
00007ff6`bfdac3d9 4489442408      mov     dword ptr [rsp+8],r8d
00007ff6`bfdac3de 89542404        mov     dword ptr [rsp+4],edx
00007ff6`bfdac3e2 890c24          mov     dword ptr [rsp],ecx
00007ff6`bfdac3e5 8b44240c        mov     eax,dword ptr [rsp+0Ch]
00007ff6`bfdac3e9 4883c410        add     rsp,10h
00007ff6`bfdac3ed c3              ret

The function compiled with Clang (comptest!fourthArg<int>) lowers rsp in its prologue (sub rsp,10h) and simply restores it in the epilogue (add rsp,10h), even though the function is a leaf.

I'm not sure, but this may be a problem in the MSVC implementation: it writes to the stack but doesn't restore the stack state (doesn't deallocate the part of the stack it used) as described in the official documentation: Prolog and Epilog.

There is an ambiguity: some MSDN pages say "Called function pops the arguments from the stack." while others say "The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space to store four register parameters, even if the callee doesn't take that many parameters.". If we look at the code for a wrapper function below, the second statement must be the one that applies (callers must allocate sufficient space on the stack). The OMR-generated code:

000001ec`60670034 4153            push    r11
000001ec`60670036 49bbd0c3dabff67f0000 mov r11,offset comptest!fourthArg<int> (00007ff6`bfdac3d0)
000001ec`60670040 41ffd3          call    r11
000001ec`60670043 415b            pop     r11
000001ec`60670045 c3              ret

doesn't allocate any space on the stack but puts another return address there when it executes the call. So all relative addresses are shifted by one qword (8 bytes, the size of a pointer on a 64-bit system), and when comptest!fourthArg<int> (the callee) saves its parameters on the stack, it corrupts the return address.
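
The overlap can be traced slot by slot from the listings. The following is only a sketch: the names Slot, slotAt, kHomeOffset and clobberedByArg are hypothetical and not part of OMR. It models the stack at entry to comptest!fourthArg<int> when it is reached through the JITed function, and reports what the MSVC debug build's four stores (to rsp+8, rsp+10h, rsp+18h and rsp+20h) land on. The reservedBytes parameter is an assumption added for comparison: it stands for space the JITed function could reserve between its push and its call.

```cpp
#include <cstdint>

// Hypothetical labels for what each stack region holds at entry to the callee.
enum class Slot {
    CalleeReturnAddress, // pushed by `call r11` in the JITed function
    ShadowSpace,         // space reserved by the JITed function before the call, if any
    SavedR11,            // pushed by `push r11` in the JITed function
    ThunkReturnAddress,  // pushed by the unit test's `call`; used by the JITed `ret`
    CallerFrame          // everything above, owned by the unit test
};

// Classify a byte offset from rsp at entry to the callee.
// reservedBytes is what the JITed function subtracted from rsp before its call
// (0 in the listing above).
constexpr Slot slotAt(std::uint32_t offset, std::uint32_t reservedBytes) {
    if (offset < 8) return Slot::CalleeReturnAddress;
    if (offset < 8 + reservedBytes) return Slot::ShadowSpace;
    if (offset < 16 + reservedBytes) return Slot::SavedR11;
    if (offset < 24 + reservedBytes) return Slot::ThunkReturnAddress;
    return Slot::CallerFrame;
}

// Offsets the MSVC debug build stores ecx, edx, r8d and r9d to (from the listing).
constexpr std::uint32_t kHomeOffset[4] = {0x08, 0x10, 0x18, 0x20};

// What the store of argument argIndex (0..3) overwrites.
constexpr Slot clobberedByArg(int argIndex, std::uint32_t reservedBytes) {
    return slotAt(kHomeOffset[argIndex], reservedBytes);
}
```

With reservedBytes = 0, clobberedByArg(0, 0) is Slot::SavedR11 and clobberedByArg(1, 0) is Slot::ThunkReturnAddress, so the store of edx lands on the address the JITed ret will use. With reservedBytes = 0x20, all four stores fall into Slot::ShadowSpace.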

Let's have a look at what MSVC and Clang generate for a wrapper function (so, let's call comptest!fourthArg<int> from a wrapper compiled with MSVC or Clang, not with OMR JIT).

MSVC

comptest!ILT+228240(?myWrapperYAHHHHHZ):
00007ff7`65c38b95 e966622200      jmp     comptest!myWrapper (00007ff7`65e5ee00)

comptest!myWrapper:
00007ff7`65e5ee00 44894c2420      mov     dword ptr [rsp+20h],r9d ss:00000027`675ef278=675ef67c
00007ff7`65e5ee05 4489442418      mov     dword ptr [rsp+18h],r8d
00007ff7`65e5ee0a 89542410        mov     dword ptr [rsp+10h],edx
00007ff7`65e5ee0e 894c2408        mov     dword ptr [rsp+8],ecx
00007ff7`65e5ee12 57              push    rdi
00007ff7`65e5ee13 4883ec20        sub     rsp,20h  ; we are preparing the stack for callee?
00007ff7`65e5ee17 488bfc          mov     rdi,rsp
00007ff7`65e5ee1a b908000000      mov     ecx,8
00007ff7`65e5ee1f b8cccccccc      mov     eax,0CCCCCCCCh
00007ff7`65e5ee24 f3ab            rep stos dword ptr [rdi]
00007ff7`65e5ee26 8b4c2430        mov     ecx,dword ptr [rsp+30h]
00007ff7`65e5ee2a 448b4c2448      mov     r9d,dword ptr [rsp+48h]
00007ff7`65e5ee2f 448b442440      mov     r8d,dword ptr [rsp+40h]
00007ff7`65e5ee34 8b542438        mov     edx,dword ptr [rsp+38h]
00007ff7`65e5ee38 8b4c2430        mov     ecx,dword ptr [rsp+30h]
00007ff7`65e5ee3c e8f6e9daff      call    comptest!ILT+51250(??$fourthArgHYAHHHHHZ) (00007ff7`65c0d837)
00007ff7`65e5ee41 4883c420        add     rsp,20h  ; and restores it?
00007ff7`65e5ee45 5f              pop     rdi
00007ff7`65e5ee46 c3              ret

The caller prepares the stack for the callee, so the callee doesn't have to restore the stack state itself. The extra copies between parameter registers and the stack are also present.

Clang

comptest!myWrapper:
00007ff7`fa1afe20 4883ec38        sub     rsp,38h
00007ff7`fa1afe24 44894c2434      mov     dword ptr [rsp+34h],r9d
00007ff7`fa1afe29 4489442430      mov     dword ptr [rsp+30h],r8d
00007ff7`fa1afe2e 8954242c        mov     dword ptr [rsp+2Ch],edx
00007ff7`fa1afe32 894c2428        mov     dword ptr [rsp+28h],ecx
00007ff7`fa1afe36 448b4c2434      mov     r9d,dword ptr [rsp+34h]
00007ff7`fa1afe3b 448b442430      mov     r8d,dword ptr [rsp+30h]
00007ff7`fa1afe40 8b54242c        mov     edx,dword ptr [rsp+2Ch]
00007ff7`fa1afe44 8b4c2428        mov     ecx,dword ptr [rsp+28h]
00007ff7`fa1afe48 e8c30f0000      call    comptest!fourthArg<int> (00007ff7`fa1b0e10)
00007ff7`fa1afe4d 90              nop
00007ff7`fa1afe4e 4883c438        add     rsp,38h
00007ff7`fa1afe52 c3              ret

The compiler allocates stack space before the call comptest!fourthArg<int> instruction and deallocates it afterwards.
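
The offsets in the two wrapper listings can be cross-checked with a little arithmetic. This is a sketch with hypothetical helper names (offsetAfterPrologue, alignedAtCall). It relies on two things: a slot's rsp-relative offset grows by however many bytes the prologue moves rsp down, and the Windows x64 convention requires rsp to be 16-byte aligned at a call, while at function entry rsp is 8 bytes off that boundary because the return address has just been pushed.

```cpp
#include <cstdint>

// rsp-relative offset of a stack slot after the prologue has moved rsp down by
// prologueBytes, given the slot's offset from rsp at function entry.
constexpr std::uint32_t offsetAfterPrologue(std::uint32_t offsetAtEntry,
                                            std::uint32_t prologueBytes) {
    return offsetAtEntry + prologueBytes;
}

// True when rsp is 16-byte aligned at a call instruction reached after the
// prologue has moved rsp down by prologueBytes. The extra 8 accounts for the
// return address pushed by the call that entered this function.
constexpr bool alignedAtCall(std::uint32_t prologueBytes) {
    return (8 + prologueBytes) % 16 == 0;
}
```

For the MSVC wrapper the prologue is push rdi (8 bytes) plus sub rsp,20h, so the home slot written at rsp+8 on entry is reloaded from offsetAfterPrologue(0x08, 0x28), which is 0x30, and the one written at rsp+20h from 0x48, matching the listing. For the Clang wrapper, alignedAtCall(0x38) holds: 38h covers 20h of space for the callee's register parameters, 10h for the four local copies, and 8 bytes of padding.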

x86 64 bug compiler windows

All 8 comments

FYI @0dvictor @andrewcraik @0xdaryl

@0dvictor was actually looking at a problem with the 64-bit Windows calling convention for the C helpers and this may be related - in that case we weren't buying the 4 stack slots required under the linkage. I'll let @0dvictor check the details of this in that context.

Yes, I'm working on the same issue on OpenJ9's JIT helper calls. Let me fix this bug as well.

It turned out to be more than a calling convention violation on Windows. Stack space for arguments is allocated on neither Linux nor Windows. I guess none of the existing tests pass parameters via the stack, but on Windows the MSVC compiler is able to take advantage of the register parameter stack area for optimizations. A fix is coming.

@0dvictor The problem reproduces only when OMR is built in Debug mode, where MSVC as well as LLVM spills parameters from registers to the stack and corrupts the return address (LLVM also allocates some stack space in the callee, so we see no problem there). I've tried to reproduce the problem in a production build and added a test to the LinkageTest test case (the PR is coming).

Also, on Windows x64, mixed-type arguments don't work: if a JITed method calls a function such as double (*)(double, int, double, double) or int (*)(int,double,int,double,int), SEH exception 0xc0000005 (access violation) is thrown. Maybe this happens because on Windows we can't use more than 4 registers (int and float combined) for parameter passing, while on Linux we have 6 registers for integers and 8 for floats, but there I pass only 4 parameters...
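
For reference, the Microsoft x64 convention assigns the first four arguments by position: the argument at position i goes in the i-th integer register (rcx, rdx, r8, r9) or the i-th floating-point register (xmm0 to xmm3), and later arguments go on the stack. A minimal sketch of that rule, with hypothetical names (Loc, windowsX64ArgLocation) and assuming only scalar integer and floating-point arguments in a non-variadic function. Whether this rule is what the JITed code gets wrong is not established in this thread.

```cpp
// Where an argument is passed under the Microsoft x64 convention.
// The first four enumerators are the integer registers in positional order,
// the next four the floating-point registers in the same order.
enum class Loc { RCX, RDX, R8, R9, XMM0, XMM1, XMM2, XMM3, Stack };

// position is the zero-based, non-negative index of the argument in the signature.
// A register slot is consumed by position regardless of the argument's type,
// so a double at position 2 uses xmm2 even if it is the first double.
constexpr Loc windowsX64ArgLocation(int position, bool isFloatingPoint) {
    if (position >= 4) return Loc::Stack;
    return static_cast<Loc>(position + (isFloatingPoint ? 4 : 0));
}
```

For double (*)(double, int, double, double) this gives xmm0, rdx, xmm2, xmm3. For int (*)(int,double,int,double,int) it gives rcx, xmm1, r8, xmm3, and the fifth argument goes on the stack.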

@samolisov Thank you for bringing it up. I noticed these limitations as well. I'll make sure they are all fixed.

@0dvictor Thank you for your comment. I've opened #3389, which demonstrates what happens when there are more than four parameters and the callee actively uses the stack.

I'm not sure whether OMR must support the platform-specific calling convention between methods JITed by itself. On Windows we have only four registers for parameters regardless of their types (so even if we have 4 integers and 4 floats in the signature, only 4 in total can be placed in registers), and that is a big limitation. The System V AMD64 ABI lets us use 14 registers in total (6 integer and 8 float). What if OMR used the System V AMD64 calling convention even on Windows and used Microsoft's only for entry points and methods which may call one or more native methods (as in JNI)?
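
The difference in register budget can be made concrete with a small count. This is a sketch under simplifying assumptions: only scalar integer (or pointer) and floating-point arguments, no aggregates and no variadic functions. Signature, minInt, registerArgsWindowsX64 and registerArgsSystemV are hypothetical names for illustration.

```cpp
// Number of scalar integer/pointer and floating-point arguments in a signature.
struct Signature {
    int integerArgs;
    int floatArgs;
};

constexpr int minInt(int a, int b) { return a < b ? a : b; }

// Microsoft x64: four register slots shared by both kinds of argument.
constexpr int registerArgsWindowsX64(Signature s) {
    return minInt(s.integerArgs + s.floatArgs, 4);
}

// System V AMD64: six integer registers and eight floating-point registers,
// counted independently of each other.
constexpr int registerArgsSystemV(Signature s) {
    return minInt(s.integerArgs, 6) + minInt(s.floatArgs, 8);
}
```

For the signature with 4 integers and 4 floats mentioned above, registerArgsWindowsX64 gives 4 and registerArgsSystemV gives 8. The System V count tops out at 14 for a signature with at least 6 integers and 8 floats.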

@samolisov It's an interesting question. I'm not sure if OMR should go this way, but I know a downstream project (OpenJ9) uses a customized calling convention for its JITed-to-JITed method calls.

IMO, using a different calling convention is a kind of inter-procedural optimization. Theoretically we are able to use a customized calling convention if and only if the following two conditions hold:

  1. Caller knows, at compilation time, Callee is a JITed method;
  2. Callee knows, at compilation time, all its callers are JITed methods.

In all other cases where Caller or Callee may be an arbitrary method (let's say a C function), OMR must use the platform's standard calling convention.
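
The two conditions above can be written down as a simple decision. This is only an illustration: Linkage, CallSiteKnowledge and chooseLinkage are hypothetical names, not OMR APIs, and how a compiler would actually establish the two facts is outside the scope of this sketch.

```cpp
// Which calling convention a call between two methods may use.
enum class Linkage { System, Private };

// What is known at compilation time about one caller/callee pair.
struct CallSiteKnowledge {
    bool callerKnowsCalleeIsJitted;
    bool calleeKnowsAllCallersAreJitted;
};

// A private (customized) convention is allowed only when both conditions hold;
// otherwise the platform's standard convention is required.
constexpr Linkage chooseLinkage(CallSiteKnowledge k) {
    return (k.callerKnowsCalleeIsJitted && k.calleeKnowsAllCallersAreJitted)
               ? Linkage::Private
               : Linkage::System;
}
```

If either side might be an arbitrary native function, chooseLinkage falls back to Linkage::System.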
