Tested by https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/commit/93cf2ab789aaa422ef8f3a194b45f22e6f990731
Avatar: The Last Airbender
Genma Onimusha
Guilty Gear Isuka
Muzzle Flash
The Incredible Hulk: Ultimate Destruction
https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/blob/master/src/CxbxKrnl/EmuFS.cpp#L379
__declspec(naked) void EmuFS_PushDwordPtrFs00()
{
uint32 returnAddr;
uint32 temp;
__asm
{
call EmuFS_RefreshKPCR
鈻衡柡鈻簆op returnAddr
mov temp, eax
mov eax, fs : [TIB_ArbitraryDataSlot]
push dword ptr [eax]
mov eax, temp
push returnAddr
ret
}
}
It's because of stack variables being used, but the prologue code is omitted in naked functions.
.text:0413A990 ; void __cdecl EmuFS_PushDwordPtrFs00(void)
.text:0413A990
.text:0413A990 temp = dword ptr -8
.text:0413A990 returnAddr = dword ptr -4
.text:0413A990
.text:0413A990 call EmuFS_RefreshKPCR
.text:0413A995 pop dword ptr [ebp-4]
.text:0413A998 mov [ebp-8], eax
.text:0413A99B mov eax, large fs:14h
.text:0413A9A1 push dword ptr [eax]
.text:0413A9A3 mov eax, [ebp-8]
.text:0413A9A6 push dword ptr [ebp-4]
.text:0413A9A9 retn
EmuFS_PopDwordPtrFs00 will also be broken I guess.
Moving the variables down should fix this problem.
Since this function must mimick the effect of the instruction it replaces, it has to fiddle with the stack.
But there's another issue: writing values to memory could lead to a concurrency collision (it's not thread safe). I proposed an alternative implementation a while ago to Luke, but I'd have to look it up....
Alternative implementation, not requireing local variables :
call EmuFS_RefreshKPCR
push 0 // Reserve a slot at [esp-4] (return address will be copied to here)
push eax // Backup the original EAX value (we don't want to overwrite it!)
mov eax, [esp - 8] // Read current return addr
mov [esp - 4], eax // Write it to the reserved slot
mov eax, fs : [TIB_ArbitraryDataSlot] // Read the KPCR.NtTib pointer into EAX
mov [esp - 8], eax // Overwrite the return address
pop eax // Restore the original EAX value - stack now points to the slot where we moved our return address
ret
Would something like this work (no eax needed)? I'm just writing it off the top of my head....it'd be nice if these used a direct jmp instead of call. It'd eliminate those extra read/writes to esp.
push [esp] ; Ret addr is now on the stack twice
add esp, 8 ; Move to slot where original push fs would write
push fs:[TIB_ArbitraryDataSlot]
add esp, 4 ; Move to duplicated ret addr
retn
@Nukem9 according to Luke, your code crashes outright, care to investigate?
I should've just deleted that - there is no way to avoid using a scratch register here. It requires a pointer dereference (fs:[14h]) and then another dereference which can't be done on the stack.
If I'm reading this correctly:
; Original code https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/issues/986#issue-304619151
;
mov eax, fs : [TIB_ArbitraryDataSlot] ; 1st dereference
push dword ptr [eax] ; 2nd dereference
; Your code (broken?) https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/issues/986#issuecomment-373204458
;
mov eax, fs : [TIB_ArbitraryDataSlot] ; 1st deref
mov [esp - 8], eax ; Missing a deref
; My code (broken) https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/issues/986#issuecomment-373217243
;
push fs:[TIB_ArbitraryDataSlot] ; 1st deref
add esp, 4 ; Missing a deref
It's fixed by (This solution is NOT thread safe, but is better than outright
crashing.) https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/pull/992/commits/985bb16515b7e2301f8d0d939bb4e74826c0e8da close or wait for better solution?
Good to hear the crash is resolved, thanks! However, the current solution isn't thread safe, so we'd better keep this issue open to devise a better solution for fixing both push and pop.
Updated the issue title to reflect current status
I have a potential fix for this which involves a lightweight assembly spinlock implementation, carefully crafted so it doesn't trash any registers. Expect a PR in the coming days.
Damn, forgot about this, I'll get it sorted
Fixed in #1345 and #1355.
Although Luke's solution makes the FS code thread safe, we should still consider the reason it wasn't thread safe to begin with : the use of global variables. Perhaps there's a way to implement these functions without using global variables (but stack space, like the example mentioned above) which would remove the need for (and overhead of) thread safety entirely!