Mesa 19.3.0~rc2~kisak1~e
Huge performance regression compared to Proton 4.11-7
Loading the game takes about an extra 60 seconds
Loading into a map/game takes a long time
Start of a match has major lag (the match will start before the game is playable for me)
Random short periods of major lag (Randomly get ~30 seconds of the game being a slideshow during a match)
About 100fps worse performance overall
Exiting to the main menu takes over a minute to load (normally instant)
Use Proton 4.11-8
I'm facing with same issue. I hope they fix this asap
I have the same issue too.
@Joshua-Ashton I tested this out and confirmed as well as I could that it is a d9vk regression (popped d9vk from 4.11-7 into the 4.11-8 dist/ folder to make sure). I also tried without d9vk forced and did not see any performance difference between 4.11-7 and 4.11-8 -- @erenkan @Bettehem feel free to correct me if you are not using d9vk
I found the most obvious differences were 1) loading time for the game on launch and 2) getting back to the main menu from a practice game.
FWIW, I'm using a AMD Radeon RX 470 with the standard Ubuntu 19.04 drivers.
I can't repro the issue on Windows with d9vk, I'll test on Proton when back.
@alasky17 I'm using d9vk. As you say; i didnt see any noticeable performance change between 4.11-7 and 4.11-8. But;
1)it's freezing at the first 5 seconds of the match and freezing randomly while in match
Yeah I'm using D9VK too
Hey! I managed to repro this guys!
It doesn't happen if I build D9VK myself and install it, it only happens on the Proton build.
Not sure what happened here.
The build system had a bunch of changes in 4.11-8. Probably the most relevant one here was switching to use our own built mingw-w64 instead of one from an Arch Linux chroot. However, I copied the build process from Arch, and added the option to disable sjlj exceptions, so I'm not sure what I could have screwed up here...
Feels about as slow as when I use a debug build of D9VK actually.
Do you have some specifics on the compiler setup? (what version, is it in this repo?, etc)
Surely I didn't screw that up _again_. You can see the configure step here: https://github.com/ValveSoftware/Proton/blob/proton_4.11/build/makefile_base.mak#L1209 It has --buildtype=release
.
The compiler is built here: https://github.com/ValveSoftware/Proton/blob/proton_4.11/build-mingw-w64.sh
Cheers, I'll have a play around and see if I can repro on a fresh Proton build, and if so, I'll have a bash at fixing it :smile:
@aeikum
Ok, so I've compiled D9VK using the same build script (package_release) that I used inside the Vagrant VM (vagrant ssh, clone + build, copy it out to shared folder and install) and it has the performance issue.
So the issue lies elsewhere... Where? I have no clue.
Actually, lemme try something...
Ok I waited 2h for this bugger to compile and not change anything.
Tried messing about with the mingw-w64 compile args to no avail (mirrored what mine is built with)
Blegh.
Ok, I genuinely have no clue what the issue is:
Same MinGW 64 compile args (compiling the compiler)
Entirely same git repo
Entirely same setup/d9vk build script
But when its built in that Vagrant VM its just 10x slower...
asdfghj
Ok there were two things I didn't change [by mistake] looking at the -v
output...
It's 5:34am and Vagrant has decided to freeze for the 2nd time when building MinGW because its high quality software.
I don't feel like waiting another 2h for a full rebuild which always works, or for it to randomly freeze when just building MinGW on its own in vagrant ssh
-- so if you want to give it a try, be my guest:
diff --git a/build-mingw-w64.sh b/build-mingw-w64.sh
index 9025058..7470693 100755
--- a/build-mingw-w64.sh
+++ b/build-mingw-w64.sh
@@ -91,10 +91,11 @@ function build_arch {
#except "--disable-dw2-exceptions" swapped for "--disable-sjlj-exceptions --with-dwarf2"
#for performance reasons on 32-bit
LDFLAGS=-static PATH=$NEWPATH:$PATH ../../$GCC_SRCDIR/configure --prefix=$DST_DIR/ \
- --build=$BUILD_ARCH --host=$HOST_ARCH --target=$WIN32_TARGET_ARCH --with-gnu-ld --with-gnu-as \
+ --build=$BUILD_ARCH --host=$HOST_ARCH --target=$WIN32_TARGET_ARCH --enable-shared --enable-static \
--enable-languages=c,c++ --disable-multilib --enable-threads=posix --enable-fully-dynamic-string \
--enable-libstdcxx-time=yes --enable-libstdcxx-filesystem-ts=yes --enable-cloog-backend=isl \
- --enable-lto --disable-sjlj-exceptions --with-dwarf2 \
+ --enable-lto --disable-sjlj-exceptions --with-dwarf2 --enable-libgomp \
+ --enable-checking=release \
$GCC_EXTRA_CONFIGURE
fi
PATH=$NEWPATH make $JOBS all-gcc
Are you using VirtualBox backend? It's known to cause a system hang under heavy loads (reproducible even without Vagrant involved). I use the libvirt backend with qemu. It's more difficult to manage VMs than VirtualBox, but at least it doesn't hang...
Will try your compiler change shortly.
Didn't seem to make a difference. I'll keep trying on my end, suggestions and ideas are welcome. I think only 32-bit games are affected, which may be a hint.
I'm using libvirt.
So far I've tried building mingw-w64 9.1.0 (matching the Arch chroot), both with the old flags and with your suggested changes, and have seen no results. I also tried --buildtype=debugoptimized
, just in case -O3 was causing problems, but still no change.
I'm going to try installing the Arch chroot to confirm whether it's the compiler's fault.
I'm using 9.2.0 for git.froggi.es CI and tagged releases on my end. (Just whatever is in the Chaotic AUR repo)
I have my build VM set up to be able to quickly make both chrooted d9vk builds (fast) and local mingw d9vk builds (slow). Here's a tarball of unstripped build of each. I'm going to start comparing them now.
Pretty strong evidence that the problem is missing lto
from our --enable-languages=c,c++
switch. The gcc docs for --enable-languages
state that lto
is enabled if --enable-lto
is set (which it is), so that's weird. But putting it in there appears to fix it.
Pretty strong evidence that the problem is missing
lto
from our--enable-languages=c,c++
switch. The gcc docs for--enable-languages
state thatlto
is enabled if--enable-lto
is set (which it is), so that's weird. But putting it in there appears to fix it.
Turns out this was wrong, I can't reproduce the fix with a clean VM. My testing must have been invalid. So I've just reverted the local mingw build for the next release, so we can at least get this fixed for users ASAP.
My current theory is that this is something to do with the C++ stdlib being somehow unoptimized. My reasoning is that only D9VK and DXVK seem to be affected, even though we use this compiler to build Wine's PE libraries. So I'm going to look into how the C++ stdlib is being built, and try to compare it between the two compiler builds.
Hm, I doubt that. The majority of C++ stuff we use is in the STL and therefore all header only.
Weird that it affects some games more than others though
Yeah I have no idea what's going on here. A simple test program using unordered_map
looked basically identical with both compilers. Ditto for some simple exception handling like DxvkError
. I did notice that performance seems reasonable after initial loading of Rocket League. Perhaps we can find what during the loading is going slowly?
It's definitely possible that all PE files are affected, and that DXVK/D9VK are just the most performance-critical PE files we have in Proton right now.
If you compile it without strip and break while its hanging and find out where thatd be interesting.
I wanted to do that but then remembered Id need to use winedbg + Proton :frog:
I tried gathering a perf log, but nothing of interest appeared. It just said 40% of the time during loading was spent in game code. d3d9.dll doesn't appear in the perf report at all, so probably recording PE files doesn't work. Similarly, nothing interesting in bt all
from winedbg during loading.
printf debugging to the rescue. I added log lines for most methods in src/d3d9/
. "Fast" log is first, then the "slow" log:
[aeikum@aeikum ~]$ grep FlushImplicit steam-252950.log|wc -l
4286
[aeikum@aeikum ~]$ grep FlushImplicit steam-252950.log.bad|wc -l
1200727
So, something's going on there. Still digging...
Yikes!
Here's the call sequence I'm seeing in the "slow" log (88
is the result of GetCurrentThreadId; the following number is GetTickCount
i.e. milliseconds):
info: 88:22532891:bool dxvk::D3D9DeviceEx::ShouldRecord() a
info: 88:22532891:virtual HRESULT dxvk::D3D9DeviceEx::DrawIndexedPrimitiveUP(D3DPRIMITIVETYPE, UINT, UINT, UINT, const void*, D3DFORMAT, const void*, UINT) a
info: 88:22532891:void dxvk::D3D9DeviceEx::PrepareDraw(D3DPRIMITIVETYPE, bool) a
info: 88:22532891:void dxvk::D3D9DeviceEx::UpdateFog() a
info: 88:22532891:void dxvk::D3D9DeviceEx::BindFramebuffer() a
info: 88:22532891:void dxvk::D3D9DeviceEx::BindViewportAndScissor() a
info: 88:22532891:bool dxvk::D3D9DeviceEx::UseProgrammableVS() a
info: 88:22532892:void dxvk::D3D9DeviceEx::UploadConstants() [with dxvk::DxsoProgramTypes::DxsoProgramType ShaderStage = dxvk::DxsoProgramTypes::VertexShader] a
info: 88:22532892:void dxvk::D3D9DeviceEx::BindInputLayout() a
info: 88:22532892:bool dxvk::D3D9DeviceEx::UseProgrammableVS() a
info: 88:22532892:bool dxvk::D3D9DeviceEx::UseProgrammablePS() a
info: 88:22532892:void dxvk::D3D9DeviceEx::UploadConstants() [with dxvk::DxsoProgramTypes::DxsoProgramType ShaderStage = dxvk::DxsoProgramTypes::PixelShader] a
info: 88:22532892:dxvk::D3D9DrawInfo dxvk::D3D9DeviceEx::GenerateDrawInfo(D3DPRIMITIVETYPE, UINT, UINT) a
info: 88:22532892:dxvk::D3D9UPBufferSlice dxvk::D3D9DeviceEx::AllocUpBuffer(VkDeviceSize) a
info: 88:22532892:uint32_t dxvk::D3D9DeviceEx::GetInstanceCount() const a
info: 88:22532892:virtual HRESULT dxvk::D3D9DeviceEx::StretchRect(IDirect3DSurface9*, const RECT*, IDirect3DSurface9*, const RECT*, D3DTEXTUREFILTERTYPE) a
info: 88:22532892:VkImageSubresource dxvk::D3D9CommonTexture::GetSubresourceFromIndex(VkImageAspectFlags, UINT) const a
info: 88:22532892:VkImageSubresource dxvk::D3D9CommonTexture::GetSubresourceFromIndex(VkImageAspectFlags, UINT) const a
info: 88:22532892:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) a
info: 88:22532892:HRESULT dxvk::D3D9Query::GetQueryData(void*, DWORD) a
info: 88:22532892:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) b
info: 88:22532892:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) c
info: 88:22532892:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL) a
info: 88:22532892:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) d --> 1
Those last lines loop for a very long time. Eventually the game dumps:
info: 88:22533393:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) a
info: 88:22533393:HRESULT dxvk::D3D9Query::GetQueryData(void*, DWORD) a
info: 88:22533393:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) b
info: 88:22533393:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) c
info: 88:22533393:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL) a
info: 88:22533393:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) d --> 1
22533.392:0030:0031:warn:debugstr:OutputDebugStringA "[0050.85] Log: Timed out while waiting for GPU to catch up. (500 ms)\r\n"
Then still later...
info: 88:22533530:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) a
info: 88:22533530:HRESULT dxvk::D3D9Query::GetQueryData(void*, DWORD) a
info: 88:22533530:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) b
info: 88:22533530:virtual HRESULT dxvk::D3D9Query::GetData(void*, DWORD, DWORD) d --> 0
GetData finally returns success and the game continues with other rendering calls.
Still digging...
I think the loop is ultimately because vkGetQueryPoolResults
keeps returning VK_NOT_READY
(just reading the code, have't confirmed this). It looks like what breaks this loop is D3D9DeviceEx::FlushImplicit
eventually calls D3D9DeviceEx::Flush
after enough time elapses. I think that forces the CS thread to wake up and process events. Perhaps an earlier wakeup was missed for some reason?
Interesting its hanging on queries. I fixed a bug wrt. that in 0.31-rc-p, is the behaviour present there?
Either way, the fact that this hang is only present in the Proton builds could indicate a deeper mingw issue that could affect other things...
I've been testing with 55cb30c47f5. I'll switch to 0.31-rc-p, just so it's in a known state.
Well, well, well.
Good log:
info: 48:7372009:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194316764800 - m_lastFlush: 1574439194316328600 (436200) >= delay: 750
info: 48:7372009:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194317138200 - m_lastFlush: 1574439194316328600 (809600) >= delay: 750
info: 48:7372009:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)flushing
info: 48:7372010:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194317774400 - m_lastFlush: 1574439194317314300 (460100) >= delay: 750
info: 48:7372010:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194318075900 - m_lastFlush: 1574439194317314300 (761600) >= delay: 750
info: 48:7372010:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)flushing
info: 48:7372011:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194318797200 - m_lastFlush: 1574439194318244300 (552900) >= delay: 750
info: 48:7372011:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574439194319200900 - m_lastFlush: 1574439194318244300 (956600) >= delay: 750
Bad log:
info: 89:7158705:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
info: 89:7158705:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
info: 89:7158706:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
....
info: 89:7159691:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
info: 89:7159692:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
info: 89:7159692:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438981000000000 - m_lastFlush: 1574438981000000000 (0) >= delay: 750
info: 89:7159692:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)now: 1574438982000000000 - m_lastFlush: 1574438981000000000 (1000000000) >= delay: 750
info: 89:7159692:void dxvk::D3D9DeviceEx::FlushImplicit(BOOL)flushing
Now we're getting somewhere...
The high resolution clock is implemented in libstdc++v3 with either gettimeofday
or std::time
:
system_clock::now() noexcept
{
#ifdef _GLIBCXX_USE_GETTIMEOFDAY
timeval tv;
// EINVAL, EFAULT
gettimeofday(&tv, 0);
return time_point(duration(chrono::seconds(tv.tv_sec)
+ chrono::microseconds(tv.tv_usec)));
#else
std::time_t __sec = std::time(0);
// This is the conversion done by system_clock::from_time_t(__sec)
typedef chrono::time_point<system_clock, seconds> __from;
return time_point_cast<system_clock::duration>
(__from(chrono::seconds(__sec)));
#endif
}
And sure enough, we're using different codepaths.
Fast d3d9.dll disassembly:
625a37d0 <std::chrono::_V2::system_clock::now()>:
625a37d0: 53 push %ebx
625a37d1: 83 ec 28 sub $0x28,%esp
625a37d4: 8d 44 24 18 lea 0x18(%esp),%eax
625a37d8: c7 04 24 00 00 00 00 movl $0x0,(%esp)
625a37df: 89 44 24 04 mov %eax,0x4(%esp)
625a37e3: e8 38 5e f7 ff call 62519620 <clock_gettime>
625a37e8: 8b 4c 24 1c mov 0x1c(%esp),%ecx
625a37ec: b8 00 ca 9a 3b mov $0x3b9aca00,%eax
625a37f1: f7 6c 24 18 imull 0x18(%esp)
625a37f5: 89 cb mov %ecx,%ebx
625a37f7: c1 fb 1f sar $0x1f,%ebx
625a37fa: 01 c8 add %ecx,%eax
625a37fc: 11 da adc %ebx,%edx
625a37fe: 83 c4 28 add $0x28,%esp
625a3801: 5b pop %ebx
625a3802: c3 ret
Slow d3d9.dll disassembly:
625ae550 <std::chrono::_V2::system_clock::now()>:
625ae550: 83 ec 1c sub $0x1c,%esp
625ae553: c7 04 24 00 00 00 00 movl $0x0,(%esp)
625ae55a: e8 e9 c4 f6 ff call 6251aa48 <_time32>
625ae55f: ba 00 ca 9a 3b mov $0x3b9aca00,%edx
625ae564: 83 c4 1c add $0x1c,%esp
625ae567: f7 ea imul %edx
625ae569: c3 ret
Looks like this is all happening because the mingw CRT isn't available at the time we build libstdc++v3. I'll look into adapting the Arch Linux compiler bootstrapping process into our build script.
Would it make more sense for us to use QueryPerformanceCounter and co. here then @aeikum?
That's what MSVC compiles it to and what's generally the status quo for this on Windows, not sure how you guys implement it in Wine though?
I mean, this is clearly a Proton build regression, so IMO you don't need to make any changes. If you wanted to skip the C++ and CRT stuff, the "fast" path that we ought to be using boils down to mingw-w64-libraries/winpthreads/src/clock.c:clock_gettime
, the CLOCK_THREAD_CPUTIME_ID
case.
After much heartburn, I've found that it's basically enough to rebuild libstdc++v3 after already installing it. It fails to find gettimeofday
due to libstdc++ being unavailable at configure time (baffling behavior given we are building libstdc++, plus it's a C function...). So installing the "bad" libstdc++ and then rebuilding and installing the "good" libstdc++ gives us a fast d9vk.
Worked around this on our end anyway, the MinGW impl for high_resolution_clock is a bit crap and slow.
This should be fixed in 4.11-9. Thanks for reporting!
4.11-9 is working well, thanks!
This issue may become more relevant now:
https://www.rocketleague.com/news/ending-support-for-mac-and-linux/
An upcoming final release for macOS and Linux will disable all online functionality. Guess that's the price you pay for getting bought out by the Epic Games Store.
For the last couple of days the game has been crashing while joining matches, but only sometimes and while running through Proton. Has somebody else experienced the same issue?
Unhandled Exception:
System.ComponentModel.Win32Exception (0x80004005): ApplicationName='Win32\RocketLeague.exe', CommandLine='', CurrentDirectory='', Native error= Nie znaleziono pliku.
at System.Diagnostics.Process.StartWithCreateProcess (System.Diagnostics.ProcessStartInfo startInfo) [0x0029f] in <607f1b068c2449bd8b08637972e8c1ec>:0
at System.Diagnostics.Process.Start () [0x0003a] in <607f1b068c2449bd8b08637972e8c1ec>:0
at (wrapper remoting-invoke-with-check) System.Diagnostics.Process.Start()
at System.Diagnostics.Process.Start (System.Diagnostics.ProcessStartInfo startInfo) [0x0001b] in <607f1b068c2449bd8b08637972e8c1ec>:0
at Launcher.Program.Launch (System.String Bin, System.Collections.Generic.IEnumerable`1[T] Params) [0x0001f] in <b96cd727d1254311be88f967467cf48f>:0
at Launcher.Program.Main (System.String[] args) [0x0012e] in <b96cd727d1254311be88f967467cf48f>:0
23904.875:0026:0027:trace:mscoree:CorExitProcess (1)
23904.875:0026:0027:trace:mscoree:CLRMetaHost_ExitProcess 1
23904.881:0026:0027:trace:mscoree:CorExitProcess (1)
23904.881:0026:0027:trace:mscoree:CLRMetaHost_ExitProcess 1
23904.881:0026:0027:trace:mscoree:_CorDllMain (0x1ea0000, 0, 0x1)
23904.881:0026:0027:trace:mscoree:_CorDllMain (0x10000000, 0, 0x1)
23904.881:0026:0027:trace:mscoree:_CorDllMain (0x1870000, 0, 0x1)
[ERROR] FATAL UNHANDLED EXCEPTION: System.ComponentModel.Win32Exception (0x80004005): ApplicationName='Win32\RocketLeague.exe', CommandLine='', CurrentDirectory='', Native error= Nie znaleziono pliku.
at System.Diagnostics.Process.StartWithCreateProcess (System.Diagnostics.ProcessStartInfo startInfo) [0x0029f] in <607f1b068c2449bd8b08637972e8c1ec>:0
at System.Diagnostics.Process.Start () [0x0003a] in <607f1b068c2449bd8b08637972e8c1ec>:0
at (wrapper remoting-invoke-with-check) System.Diagnostics.Process.Start()
at System.Diagnostics.Process.Start (System.Diagnostics.ProcessStartInfo startInfo) [0x0001b] in <607f1b068c2449bd8b08637972e8c1ec>:0
at Launcher.Program.Launch (System.String Bin, System.Collections.Generic.IEnumerable`1[T] Params) [0x0001f] in <b96cd727d1254311be88f967467cf48f>:0
at Launcher.Program.Main (System.String[] args) [0x0012e] in <b96cd727d1254311be88f967467cf48f>:0
23904.881:0026:0027:trace:mscoree:DllMain (0x7eb40000, 0, 0x1)
pid 22355 != 22354, skipping destruction (fork without exec?)
It says file not found. The game won't start. Anyone else experiencing this issue?
Most helpful comment
This should be fixed in 4.11-9. Thanks for reporting!