Pcsx2: DX12 and Vulkan backends?

Created on 20 Dec 2015 · 162 Comments · Source: PCSX2/pcsx2

I wonder if there is a chance of a DX12 backend in PCSX2; it gives a huge performance increase.

https://github.com/dolphin-emu/dolphin/pull/3364
https://forums.dolphin-emu.org/Thread-unofficial-dolphin-dx12-backend?page=5

Enhancement / Feature Request GS Postponed Question / Discussion

All 162 comments

Maybe gabest will show up again and provide a dx12 backend (as he does from time to time). I guess no other active dev would consider this.

Also, this is a typical question for a forum, and it was answered there several times, to every extent.

Vulkan would be better for obvious reasons. But still it's not like PCSX2 needs fast GPU that much.

Okay I ping gabest @gabest11

I don't know what I'm talking about, but I'll ask anyway.
The EE is really important, but couldn't its job be done by a GPU? If you used different instructions, could it work on the GPU or not?
How does it work?

Maybe once I move to Windows 10.

Wasn't this already discussed tons of times on the forums?
And with Vulkan coming _imminently_ (a month? Which I'd say is more or less the time needed for a somewhat usable new renderer) I wouldn't know

You guys heard the man - he will randomly probably add it when he randomly moves to Windows 10 XD

IMO Vulkan is a much better choice because of portability.

I disagree, better to focus on cross-platform stuff

DX12 must exist alongside Vulkan, because some manufacturers have poor OpenGL/Vulkan support in their Windows drivers, and that's the harsh reality.

It's not like those very _some manufacturers_ aren't the same manufacturer that released Mantle spec the whole Vulkan api is based upon.

tl;dr: there's no correlation between GL and vulkan.
PCSX2 is definitely going to support the latter, someday, considering where most of the graphics "team" is. As for the former, who knows. Nobody is going to say no to a nice addition. But first you have to code it. And it's not a priority atm.

So far we've found 2 games that might benefit from newer API.

  • Zone of the Enders
  • Juiced

How'd you find them? Based on # of draw calls?

AMD people would also benefit, as the current opengl backend runs very poorly :(

@Sarania more or less. I profile them too. If I see high usage of function 0x2351521, I infer the driver is busy. (Note that I need to double check ZoE; I debug/profile so many dumps.) However, those games are very heavy on texture/state changes, so you don't have any guarantee that a new API will really improve the situation (as you still need to flush your GPU).

@nicman23
No. What AMD users need is a multi-threaded API dispatcher. Nvidia provides it in the GL driver, and I guess DX provides it for all vendors. That leaves the others... In all cases, it won't be provided by the newer APIs, so GSdx would need to provide this code too (like thousands of projects... AKA useless code duplication).

@gregory38 Vulkan and DX12 reduce the strain on an AMD CPU. If they chose to implement Async Compute in the plugin you'd see quite an increase. Newer APIs would absolutely provide a significant performance boost to AMD GPU setups.

I'd say Vulkan, as there's no real point implementing both DX12 and Vulkan when they do the same thing, but the advantage of Vulkan would be that it would work cross-platform (whereas DX12 would be limited to Windows 10 exclusively).

As far as I know, Dolphin itself isn't using DX12 Feature level, the rewrite of the graphic backend itself discovered some severe bugs in dolphin, which so got solved. DX12 hasn't been totally tested out.

IIRC a dev team just randomly posted the patches to Dolphin.

The initial effort wasn't by the dolphin devs

As I said.

So far we've found 2 games that might benefit from newer API.

The Ratchet and Clank games would likely see a benefit as well as those games really stress the CPU, unless you heavily underclock the EE which can cause the in-game FPS to dip at times.

Any reduction in CPU overhead would help, honestly.

But are you sure that the issue is the overhead of the driver. And not the GSdx overhead. It is really a complex topic.

Any reduction in CPU overhead would help, honestly.

Sure but it doesn't come for free. I'm pretty sure people prefer a slow openGL rendering with nice mipmap texture and shadows rather than a fast Vulkan mipmap garbage.

I'm not saying that the issue is the driver, just that the reduction in driver overhead would give the emulator more room to breathe.

Well of course people would prefer a more fully featured yet slower renderer compared to one that is fast but lacking vital features. But it's a moot point as there is no Vulkan backend currently.

I'm not saying that the issue is the driver, just that the reduction in driver overhead would give the emulator more room to breathe.

No. The driver is running in its own thread. If you're not limited by driver overhead, it could be 10x faster and you would barely notice any speed impact on the emulation. Well, you're right that it will reduce the load on a 2-core CPU, but it is maybe time (and faster) to upgrade to a 4-core CPU (with AMD coming back, I hope it will become more standard).

But it's a moot point as there is no Vulkan backend currently.

There is no Vulkan backend but emulation of Ratchet and Clank games is nearly perfect. So it isn't moot, I already made the choice to improve the quality over a potential speedup.

@Two-Tone:

I've looked through your github contribution history and it seems you're more of a tester/issue committer than an actual coder. I'm of a somewhat similar ilk. Just be sure you know that gregory38 is an actual coder and intimately involved deep in the nuts and bolts of maintaining a PCSX2 codebase that has been around for several years and has over 95% compatibility with PS2 games.

That said, perhaps you should realize that neither Vulkan nor DX12 are 'magic bullets' when it comes to improving performance. The primary benefit comes from those who have the HARDWARE to handle things and that hardware isn't going to happen from people who (time and again) complain their Intel Integrated HD-shit graphics have bugs or won't run X game despite bold stickied caps threads on the PCSX2 forums.

The overwhelming majority of users are not on Win10 nor have any inclination of 'updating' to Win10. I've made this comparison elsewhere but Win7 is your rock solid STABLE OS while Win8.1 is a BETA OS and Win10 is your DEV/ALPHA OS that isn't ready for prime time and won't be ready for at least another year or two. Microsoft fired most of their testing team and the buggy broken garbage 'updates' they've released to break people's systems time and again have unfortunately caused a backlash where NOBODY is updating their computers even for crucial security updates. GG, Micro$oft.

Win10 is not the answer and DX12 is not the answer. It would be utterly insane to bother with DX12 support for less than 20% of the market and have that 'support' change on a near-daily basis with every random forced untested Windows Update for Win10 causing nothing but headaches for both users and developers of programs like PCSX2.

And all that said, Vulkan is by far the most promising of the APIs but from personal experience and poking around it is NOT ready for prime time. Maybe 85% ready but still not quite there yet. I know of Doom being one of the few high-profile games that have Vulkan support and that is because John Carmack is a programming genius and managed to get it done within a reasonable amount of time. Not every project has a John Carmack on staff (if only we were so lucky) so we work with what we have.

DX12 is a lost cause and Win10-only while Vulkan is still at least a year away from being a mature-enough API for regular production use. Neither of those APIs are going to provide a significant performance increase in enough PS2 games to justify the (unpaid volunteer) coders here to set aside significant time & resources to work on it.

TLDR: Please do your research before spewing your ignorant vomit all over this project, thanks.

Here's a video that even a simpleton like you can understand; you should do an internet search for more detail if you like:
https://www.youtube.com/watch?v=r0fgEVEgK_k

What? I never mentioned Win10 or DX12, only Vulkan, so you're putting an awful lot of words in my mouth.

It wouldn't make sense for me to advocate Win10 or DX12 as Debian and Win7 are my two primary OSes.

it seems you're more of a tester/issue committer than an actual coder.

I generally don't contribute code to floss programs, but I have my own private repos on here. Don't assume you know a person just because you looked through their minimal public GitHub history.

That said, perhaps you should realize that neither Vulkan nor DX12 are 'magic bullets' when it comes to improving performance

Of course not, but other emulators have shown that using a low level API like Vulkan has allowed better overall performance, _especially_ for lower end systems. If the Steam HW Survey shows anything, it's that the overwhelming majority of gamers have low end machines.

Like Gregory said, there is a definite cost, and that cost is man hours. You can't just press a button or tell gcc to compile your code a certain way and suddenly have a high performing Vulkan or DX12 backend. But the potential benefits can be fairly large, especially for the average user.

BTW Gregory, I've got an i5 3470. That's a quad core. I'm still CPU bound by the Ratchet and Clank games, so any driver overhead reduction would benefit me some and those with lesser hardware greatly.

Doom being one of the few high-profile games that have Vulkan support and that is because John Carmack is a programming genius and managed to get it done within a reasonable amount of time

Carmack left id back in 2013, long before Vulkan was ever even announced (2015) or released (this year). Maybe you're the one not doing their research?

TL;DR Please don't be so toxic.

Also, there seems to be some sort of misunderstanding; I never said that this HAS to be done, just saying that the R&C games would benefit from a backend that has lower CPU overhead.

Of course not, but other emulators have shown that using a low level API like Vulkan has allowed better overall performance, especially for lower end systems. If the Steam HW Survey shows anything, it's that the overwhelming majority of gamers have low end machines.

Except that most lower end systems don't support those APIs...

Except that most lower end systems don't support those APIs...

If you're talking about age, then for Nvidia their GPU support goes back as far as their 600 series (2012), AMD is their entire GCN family (early 2012, support is sketchy on Linux, but that is improving, plus the vast majority of users are on Windows), and for Intel there is no support yet on Windows, but support on Linux goes as far back as Ivy Bridge (2012).

Intel is the odd one out, but as the Steam Hardware Survey consistently shows every month, approx 82% of all their users use AMD and Nvidia hardware. About 10% of that can't use Vulkan, which leaves about 75% of the total that can (there is close to 13% of the total that I wasn't able to account for because they only show up as "Other". And yes, I went through and added up the percentage of shown AMD and Nvidia users with GPUs that are unable to use Vulkan).

In other words, most users' systems do support Vulkan.

Most users' systems do support Vulkan, but most lower end systems don't, or only support it through their Intel iGPU on Linux (which does not really count, as most users don't want to switch).

At least 35% of the surveyed AMD and Nvidia users are using a mid-low to low end GPU that support Vulkan. That's close to half of all AMD and Nvidia GPUs that support Vulkan (46.6~%). It's also greater than the total number of Intel or AMD users (if combined it's only behind by 7%).
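The arithmetic behind these figures can be checked with a short sketch. It uses only the rounded percentages quoted in this thread (not fresh survey data), reads all of them as shares of the total surveyed user base, and the variable names are made up for illustration.

```python
# Rounded figures quoted in the comments above (percent of all surveyed users).
amd_nvidia_share = 82.0        # users on AMD or Nvidia hardware
vulkan_capable_share = 75.0    # AMD/Nvidia GPUs that can run Vulkan
low_end_vulkan_share = 35.0    # mid-low to low end GPUs that can run Vulkan

# Fraction of the Vulkan-capable AMD/Nvidia GPUs that are mid-low or low end.
low_end_fraction = low_end_vulkan_share / vulkan_capable_share

# Share of all users on AMD/Nvidia GPUs that cannot run Vulkan.
no_vulkan_share = amd_nvidia_share - vulkan_capable_share

print(f"low end among Vulkan-capable: {low_end_fraction:.1%}")
print(f"AMD/Nvidia without Vulkan: {no_vulkan_share:.0f}% of all users")
```

Under that reading, the "close to half" figure is simply 35 divided by 75.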

The total number of users who would benefit from a Vulkan backend is fairly substantial.

I did not go through and count the ones that don't support it as the number of low end that do was greater than the total that don't.

The total number of users who would benefit from a Vulkan backend is fairly substantial.

Agreed. Now let's move on from this point.

I don't think anyone is opposing supporting new backends. The issue remains that it needs someone to actually add the support for new backends, which takes considerable knowledge, time and effort. Until such someone volunteers for the task, well.. not much would happen.

Vulkan is still new, every new GPU will support it, even low end range, it's just about time, just like every other tech

At least 35% of the surveyed AMD and Nvidia users are using a mid-low to low end GPU that support Vulkan. That's close to half of all AMD and Nvidia GPUs that support Vulkan (46.6~%). It's also greater than the total number of Intel or AMD users (if combined it's only behind by 7%).

Those mid-low to low end GPUs are either laptop GPUs that often come with CPUs that can hardly run PCSX2 anyway, or they fall in the high end systems category for PCSX2 and won't have many issues with PCSX2, except in a handful of games. Either way it won't help the actual low-end systems.

I see a few issues with this thread.

1 - Using Steam Hardware Survey as the be-all and end-all of 'what hardware PC gamers use'. While Steam represents a majority of the market, it still isn't 100% of the market. Many people like myself use GoG or Humble Bundles to obtain games and don't use or participate in Steam's ecosystem. Ignoring that fact is to ignore reality.

2 - The case is made for Vulkan supporting low-end GPUs and that point is under contention. In some cases it may help but in other cases it may not help at all. Laptop hardware in particular is difficult to account for when it comes to features supported. There have been many cases where a GPU says 'OpenGL compliant' and then it didn't support the actual 100% compliance spec but only a random handful of things instead.

3 - Vulkan is not a silver bullet and may or may not provide a substantial improvement for those with the capability to utilize it. The Nvidia 600+ series is listed but Doom Vulkan support explicitly lists issues with the Nvidia 690 and says it is NOT fully Vulkan-compliant.

*Vulkan is not currently supported on NVIDIA GPUs with 2 GB of RAM on Windows 7 or on the GTX 690. Users with these GPUs need to run DOOM on the OpenGL graphics API.

https://community.bethesda.net/thread/54585

So far when it comes to benchmarks there haven't been many cases where it would drastically improve things except for a small portion of users in specific circumstances with very specific games.

In most cases, as FlatOutPS2 stated, you either have more than enough hardware to run full speed with OpenGL/DirectX and wouldn't benefit much from Vulkan, or your hardware is so old and so poor/unfit that even with Vulkan you'd either have spotty/buggy support or it would still provide no benefit to your setup or gaming experience.

I'd actually want to vote for the devs to drop DirectX support altogether. Why maintain two APIs when we have OpenGL not only on par with DirectX but BETTER than it with more accuracy AND better performance rolled into one ALONG with cross-platform compatibility? Even on my Windows gaming machine I prefer using the OpenGL API whenever I have the option in emulators to do so. It just works well for me :)

And ultimately in the end, talk can happen but someone with the coding capability has to get it done. I'd much prefer seeing work done to improve what we already have rather than use a beta-quality API that hasn't been matured enough for significant development time to be devoted to it.

I'd actually want to vote for the devs to drop DirectX support altogether. Why maintain two APIs when we have OpenGL not only on par with DirectX but BETTER than it with more accuracy AND better performance rolled into one ALONG with cross-platform compatibility?

Because AMD's OpenGL sucks but the DirectX driver is just fine, meaning a
LARGE number of users would see a huge decrease in performance (I see about
a 30-50% decrease in every game I have tested on my 390).

Because AMD's OpenGL sucks but the DirectX driver is just fine, meaning a LARGE number of users would see a huge decrease in performance (I see about a 30-50% decrease in every game I have tested on my 390).

Unless that was a very small and unlucky collection of games you tried, that's a load of BS. I see a decrease in maybe 25% of the games, and most of those drops are between 10-20%, which can often be mended by changing a few (speedhack) settings, and that's with an AMD GPU that is quite a bit older than yours...

Oh my, welcome to drama world.

Here are some facts

I get a significant boost to emulation speed on the gpu side with mipmapping enabled for hardware. Like gains of 14-20 fps significant. Whatever else mipmapping did it also increased my speed.

Specs that matter are an Nvidia GeForce 750M and an Intel Core i7 4700MQ with Nvidia binary driver version 370.28.

Mipmap likely increases the driver overhead (more texture to upload). Yet it is much faster because it reduces the size of the texture to process on the GSdx thread.
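As a rough illustration of that trade-off, the sketch below counts texels per mip level for a square texture. It assumes a hypothetical 512x512 base texture and the usual halving per level; it says nothing about how GSdx actually implements mipmapping.

```python
def mip_chain(size):
    """Return the texel count of each mip level for a square texture."""
    levels = []
    while size >= 1:
        levels.append(size * size)
        size //= 2
    return levels

levels = mip_chain(512)      # hypothetical 512x512 base texture
base = levels[0]
total = sum(levels)

# Uploading the whole chain costs about one third more than the base level...
upload_overhead = total / base - 1.0
# ...but level 2 holds only 1/16 of the base level's texels.
level2_ratio = levels[2] / base

print(f"extra upload: {upload_overhead:.1%}, level 2 size: {level2_ratio:.4f}")
```

So the extra upload cost is bounded, while the amount of texture data to process can drop sharply whenever a smaller level is the one being used.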

  • Again, AMD users require a multi-threaded GSdx implementation. It provides a huge boost on Nvidia. It will provide a huge boost on AMD.
  • Vulkan doesn't come with MT done. So in both cases (GL/Vulkan), you need to do the MT part.
  • The Vulkan speed increase is big on AMD but small on Nvidia. And AMD can still improve their GL drivers. The various game ports said "easy to win on the CPU, hard to keep same perf on GPU".
  • If you really want to increase speed, we need to use advanced GPU capabilities to merge multiple draw calls into a single one. Vulkan doesn't provide those features.
  • We can still use GL features to reduce driver overhead (such as bindless textures), but the initial implementation showed no real improvement (but I didn't test the heavy games). Disabling error checking, again, showed no improvement. Improving the texture cache invalidation in GSdx gave us a 2x speed boost on some games. So you need careful profiling in order to find the real slow path.
  • Maybe people don't care about the Linux user base. But it will greatly reduce the dev workforce on GSdx. Actually, all the latest improvements were only done for Linux users. It wasn't my plan to replace Dx with GL.
  • The Vulkan API isn't mature. Debugger, validation layers, drivers, spec: all are ongoing.
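To make the "merge multiple draw calls" idea a bit more concrete, here is a toy sketch that groups consecutive draws sharing the same state into one batch. The `Draw` record and `merge_draws` function are hypothetical, and this is a much simpler technique than the advanced GPU capabilities mentioned in the list: it only helps when the state does not change between draws, and it ignores draws that depend on each other's output.

```python
from collections import namedtuple

# Hypothetical draw record: an opaque state key plus a vertex count.
Draw = namedtuple("Draw", ["state", "vertex_count"])

def merge_draws(draws):
    """Merge consecutive draws with identical state into single batches."""
    batches = []
    for draw in draws:
        if batches and batches[-1].state == draw.state:
            last = batches[-1]
            batches[-1] = Draw(last.state, last.vertex_count + draw.vertex_count)
        else:
            batches.append(draw)
    return batches

draws = [Draw("A", 6), Draw("A", 6), Draw("B", 3), Draw("A", 6), Draw("A", 12)]
batches = merge_draws(draws)
print(len(draws), "draws ->", len(batches), "batches")
```

Games that change texture or state on nearly every draw would see little benefit from this naive form of merging, which is consistent with the point about needing more advanced features.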

Conclusion, there is no need to rush on Vulkan when the first step ought to be GSdx MT support. Then next step is to redo a new renderer based on latest extension. So we can have a fast accurate date, fast accurate blending, potentially reduce the cost of depth conversion (note fast on CPU but heavier on GPU) ...

Vulkan would be great, but I prefer full 64-bit support in PCSX2, because I'm tired of installing many i386 dependencies on my Linux systems. And the PCSX2 interface ignores the desktop theme and looks like "hello from Windows 95".

 The PCSX2 interface ignores the desktop theme and looks like "hello from Windows 95"

Oh, that could explain why it looks so modern on my PC :stuck_out_tongue: On Linux, I think the issue comes from linking wxWidgets with GTK2.

 I'm tired of installing many i386 dependencies

In order to do development, I need to use the Mesa driver, the Mesa test suite and a GPU debugger. Do you know how many Debian packages I recompiled locally? And how much of a mess my system is in order to support multi-arch dev packages?

Regarding the interface, make sure your themes are set up properly. I'm running a 64bit system and it looks fine here, so it's not a PCSX2 issue:

image

Oh might need to install a 32 bit theme too ;)

The PCSX2 interface ignores the desktop theme and looks like "hello from Windows 95"

Oh, that could explain why it looks so modern on my PC :stuck_out_tongue: On Linux, I think the issue comes from linking wxWidgets with GTK2.

I'm tired of installing many i386 dependencies

In order to do development, I need to use the Mesa driver, the Mesa test suite and a GPU debugger. Do you know how many Debian packages I recompiled locally? And how much of a mess my system is in order to support multi-arch dev packages?

I just gave a couple of examples. 32-bit is the past. A lot of software is now 64-bit only; many game emulators have 64-bit versions and don't need any i386 crap installed.

@Enverex I'm using KDE.

@FlatOutPS2 My gpu is a R7 260x and it is much slower on OpenGL backend

That's going to be fixed soon™.
And.. if nobody has actual technical reasons, it would be better not to post.

EDIT: better not to post here with incidental reasons this may benefit somebody

better not post where? on the AMD forum? It looks like he's not totally convinced that it is a problem with the drivers to me, looks like he's trying to blame you for using old drivers or your rig. I've been trying to sign up to reply this is a common problem where OGL is half the speed on most games compared to DX11, regardless of driver, since neither of you have spoken in there since 4 weeks ago :P

When Vulkan is more mature, we may be able to see more devs use it and implement it into other emulators including this one. For now though, it is a toss-up as to whether it will be worth the time.

since neither of you have spoken in there since 4 weeks ago

They PMed me (probably since I never really received a notification about that post), telling me they found an optimization that would be rolled out soon.

Did you get those low FPS on the AMD test case dump with default settings? Because I got similar results to the AMD employee, and nowhere near as low as indicated in your first post.

Settings are included in the test case. Perhaps it's just you don't have a Core 2 Duo?

I would like to see both Vulkan and DX12 support. And with good reason. Vulkan is great for multiplatform, but I need something that the Intel HD 400 could support, which is DX12. DX12 doesn't really help with graphics as it does with spreading out the cpu load.
The computer i'm using this on.. is a very underpowered intel Atom X5-z8300 LattePanda board which is a small SoC board, think Raspberry Pi sized. It runs x64 windows 10. But vulkan only supports this chip in ubuntu atm. I haven't been able to get Vulkan to work at all.

In the dolphin emulator, using the DX12 backend which is supported in the dev/daily builds, gamecube games are now running a solid 30fps, 60vps, and even some wii games are running at 95% or higher speeds, which is impressive for such a weak SoC board.

So I want DX12 for those of us who don't seem to have a lot of vulkan support. But i want vulkan for all you multi-platformers. Android, linux users etc.
Seriously! a palm sized PC that does Gamecube, wii, PSP, and even PS2? That's pretty awesome to me.

That is impressive. Once Vulkan API matures a bit and provides devs with mature tools (instead of having to hand-code the assembly from scratch) then we can see a lot more widespread support for it.

How is it going?
All of the Vulkan drivers are passing the conformance tests, so I think they are all production ready.
I am really looking forward to this feature.
You guys are awesome.
Thanks for your work.

It was never about the stability of the new APIs, it was about effort to code it vs the gain, to which there is very little

On an off-topic note, Mednafen, a PS1 emulator, has a Vulkan backend as of December 2016.

In-depth article here https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/

refractionpcsx2, I'm not so sure about that on the minimal gain. For weaker systems like the Lattepanda, or even mobile devices, Vulkan has a major impact. Dolphin is evident enough. On the panda's limited DX12(11_1) support, gamecube and wii games gained at least 30% framerates by switching to DX12 over the other available options.

Passing a conformance test is different from having a stable driver. AMD has passed the OpenGL conformance test for 2-3 years, and yet we are still waiting for a driver that can render properly without a BSOD (or whatever it is called now).

And we still don't have a free Vulkan driver or good tools for debugging.

There are 2 massive differences with Dolphin. Their core emulation is faster, and they likely get more draw calls. If you want to achieve +30%, you basically need both the VU and EE threads below 70%, and GSdx limited by the validation/draw call number. If you're limited by EE/VU, a faster renderer won't give you that.
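A very simplified way to picture this reasoning is a model where the EE, VU and GS threads run fully in parallel and the busiest one sets the pace. The `relative_speed` function and all load numbers below are hypothetical, and real emulation has synchronization costs this ignores, so treat the results as an upper bound rather than a prediction.

```python
def relative_speed(ee, vu, gs):
    """Speed relative to now, if the busiest thread sets the pace.

    Each argument is the fraction of a frame that thread is busy today.
    """
    return 1.0 / max(ee, vu, gs)

# GS-bound case: halving the GS cost helps until EE/VU become the limit.
before = relative_speed(ee=0.70, vu=0.65, gs=1.00)
after = relative_speed(ee=0.70, vu=0.65, gs=0.50)

# EE-bound case: the same GS improvement changes nothing.
ee_bound_before = relative_speed(ee=1.00, vu=0.80, gs=0.90)
ee_bound_after = relative_speed(ee=1.00, vu=0.80, gs=0.45)

print(f"GS-bound: {before:.2f} -> {after:.2f}")
print(f"EE-bound: {ee_bound_before:.2f} -> {ee_bound_after:.2f}")
```

In the GS-bound case the gain is capped by the EE thread at 70% load; in the EE-bound case a cheaper renderer only leaves one core idling more.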

You can still get some bonus depending on your computer. On 2 cores, if the GS thread is faster you can reallocate the computing to other threads, good. On 4 cores you might win a bin on turbo if you're lucky, otherwise one core will idle more. On a small board, you can get a massive boost because you will get less throttling.

If you want faster emulation, buy a better computer ;) IMHO, optimization for slow CPUs is a waste of time.

To complete my previous message: since the 1.4 release, the code has received various speed improvements. The rendering correctness is 10 times better.
For example, people said we need Dx12 because Ratchet & Clank is slow on a good computer. Then I implemented a kind of mipmapping and now it is much faster. As you can see, the speed isn't about hyped API versus older API.

So far, with one year behind us, I can tell you that I don't regret that we didn't lose time to implement Vulkan/DX12. IMHO, we have bigger priorities such as a 64 bits port of the not-yet-ported code.

For reference

Is it me, or anvil is a great name for a fast framework created by AMD ;)

So both AMD & Nvidia created an extra API to add ref-counting to the Vulkan structures. It was cheap to not include it in the initial spec.

Is it me, or anvil is a great name for a fast framework created by AMD ;)

Only if NVIDIA's would be called brittle.

"gregory38: If you want faster emulation, buy a better computer ;) IMHO, optimization for slow CPUs is a waste of time."

For me, that's not an option with my intended uses. I'm aiming for single-board x86/x64 computers like the Lattepanda and Upboard. And other mini game systems like the GPD-Win, and the Smach-z. Boards where upgrading individual components just isn't an option. I want to see a full Emulation system the size of an n64 game cartridge.

Putting aside that:

  • just because one wants something it doesn't mean physics or whatever surely will agree
  • "slow" is no real qualifier and for as much you and I know, it could already be perfect
  • in my perfect world AMD has first of all fixed their damn opengl

The only vaguely meaningful optimization I see there is leveraging the fact the smach is HSA-compliant (which means zero copy is possible, which means bla bla bla).
Which is something that for as much I see gregory remotely interested would require at minimum somebody to buy him the required hardware/dev-board.

Before I have a delivery man in my front door, I don't have time ;)

IMHO, HMM/HSA would only be interesting at native resolution. It would allow emulating the GS memory as coherent memory (with plenty of sync issues). It would avoid all the texture conversions, which are really the killer in CPU/GPU perf (that's why the SW renderer is sometimes faster). Anyway, the future is first programmable blending.

@mirh AMD + OpenGl ... I get nightmares just thinking about that.

@lightningterror nah, it's actually pretty good providing you use the latest mesa. It's definitely getting there.

For OpenGL I see GL_ARB_bindless_texture was removed , the info about it seems it should provide a speed bump , maybe it could help amds fail drivers.
Was the code really that broken ?

it should provide != it provides
The extra complexity wasn't worth it.

However, the GSdx state isn't the same nowadays. We used to have 1-2 textures. We can now have 3-5 textures. Potentially my implementation was bad. Hopefully the extension will soon be implemented in Mesa, so we will be able to understand how it works.

I'm afraid the AMD driver has a long road ahead. IMHO, we need

  • a working SSO implementation
  • a performance fix for SSO
  • multi-thread OpenGL

SSO explanation: SSO (separate shader objects) allows changing the Fragment Shader (FS) without revalidating the Vertex Shader (VS). The feature was introduced in Dx9 (or maybe before)....
In our case, the FS is updated at a high frequency (every 1-5 draw calls). The VS is updated at a much lower frequency (and potentially could have been 0 if I didn't need to put in tons of hacks to support the AMD/Intel drivers). It means that the AMD/Intel drivers do a lot of extra validation for nothing.
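The cost difference described here can be sketched with a toy counter. This is only a model: `count_validations` is a hypothetical function, "validation" is reduced to a simple count per shader stage, and no real driver is claimed to work this way.

```python
def count_validations(draws, separate):
    """Count shader-stage validations over a sequence of (vs, fs) draws.

    separate=False models a driver that revalidates both stages whenever
    either one changes; separate=True revalidates only the stage that changed.
    """
    validations = 0
    prev_vs = prev_fs = None
    for vs, fs in draws:
        vs_changed = vs != prev_vs
        fs_changed = fs != prev_fs
        if separate:
            validations += int(vs_changed) + int(fs_changed)
        elif vs_changed or fs_changed:
            validations += 2
        prev_vs, prev_fs = vs, fs
    return validations

# Hypothetical workload: one vertex shader, a new fragment shader every 2 draws.
draws = [("vs0", f"fs{i // 2}") for i in range(100)]
combined = count_validations(draws, separate=False)
split = count_validations(draws, separate=True)
print(combined, split)
```

With a workload like this, the combined path does roughly twice the validation work, almost all of it on a vertex shader that never changed.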

By the way, the speed issue could also be a limitation of the AMD architecture. GSdx does lots of draw calls with few primitives. Modern GPUs are designed to handle a big number of primitives in one shot. Maybe the overhead to process a command in the GPU is bigger than the time to process the draw call. Hence the stalling of the application.

From what I read/watched(if I'm correct) how amd and how nvidia do scheduling is quite different. Nvidia does it on the driver whereas with amd you need to specify resources to what core they should go or something like that , so that leaves devs to implement it on their software instead since the amd driver is quite different than from nvidia.

If that's true then maybe multithreading needs to be added specifically for amd gpus in gsdx.

Please don't read random info from fanboy that never wrote a single line of code :) GPUs are really a complex domain.

The Nvidia driver can use multiple threads for various operations, whereas AMD is more single threaded (I'm pretty sure they use some MT but definitely less).
Then you have the hardware scheduling, which is unrelated and became the trending hype AKA asynchronous compute whatever... So yes, AMD gives devs more possibilities to dispatch the rendering commands to different resources with different priorities. But there is no compute in GSdx, so it is a moot point. Anyway, soon the Mesa driver will support MT GL, so we will be able to have a nice comparison.

What we need is a gl thread dispatcher. The GSdx thread will store gl commands into a queue. The gl thread will read commands from the queue and execute them. This way, while the gl thread is busy executing gl commands, the GSdx thread can prepare the next draw (vertex/texture conversion for example).

Besides, let's not forget that the slowest and oldest (1.8ghz) C2D+nvidia had like 3x the framerate of my 3.2 one+amd.
They simply have some code that fucks up over itself, it's not just multi-thread.
EDIT: I'm not sure what's the point in this issue, it's not like anybody would have to be reminded about this xD

What we need is a gl thread dispatcher.

Something like this? https://github.com/NVIDIA/libglvnd

Wtf? That dispatches calls between system and driver, not between game and driver. It has nothing to do with rendering and threads.

Yes, it is unrelated. The goal of glvnd is to switch GL drivers at runtime instead of at reboot.

What we need is a gl thread dispatcher. The GSdx thread will store gl commands into a queue. The gl thread will read commands from the queue and execute them. This way, while the gl thread is busy executing gl commands, the GSdx thread can prepare the next draw (vertex/texture conversion for example).

aren't nvidia the only ones that have that in NV_Command_list?

Not sure nv_command was mentioned.
Anyway, even mesa is threaded.

No offense, but people should stop posting random words. NV_Command_list records all the states into a single state blob (which can be seen as a list of commands). It is a way to achieve something closer to the Vulkan/DX12 API, but with OpenGL.

Here we are dealing with a basic multi-threaded approach.
Instead of doing

do gsdx stuff
exec gl cmd1
wait execution done
do gsdx stuff
exec gl cmd2
wait execution done

We do

do gsdx stuff
Ask your buddy to exec cmd1
do gsdx stuff
Ask your buddy to exec cmd2

And buddy will do

exec gl cmd1
wait execution done
exec gl cmd2
wait execution done
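For what it's worth, the "buddy" scheme above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `run_dispatcher` and `gl_thread` are made-up names, the commands are plain callables standing in for GL calls, and nothing here is taken from GSdx itself.

```python
import queue
import threading


def run_dispatcher(commands):
    """Run each command on a worker thread, in submission order.

    `commands` is a list of zero-argument callables standing in for GL
    commands (a hypothetical stand-in, not a real GL binding).
    """
    cmd_queue = queue.Queue()
    executed = []  # results of what the worker actually ran, in order

    def gl_thread():
        # The "buddy": pops commands and executes them one by one.
        while True:
            cmd = cmd_queue.get()
            if cmd is None:  # sentinel: no more work
                break
            executed.append(cmd())  # stands in for "exec gl cmd / wait"

    worker = threading.Thread(target=gl_thread)
    worker.start()

    for cmd in commands:
        # "do gsdx stuff" would happen here while the worker is busy.
        cmd_queue.put(cmd)  # "ask your buddy to exec cmd"

    cmd_queue.put(None)  # tell the worker to stop
    worker.join()
    return executed


if __name__ == "__main__":
    print(run_dispatcher([lambda: "cmd1", lambda: "cmd2"]))
```

A single consumer reading from a FIFO queue keeps the commands in submission order, which is the property the GL side would need; the producer never waits for execution to finish.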

Note: Mesa threading isn't yet compatible with PCSX2. And it won't be ready for the soon to be released version.

Fwiw, I have some patches to improve Mesa threading. It really gives me a nice speed boost (on Blood Will Tell) even on my 4 GHz Haswell. Unfortunately I found some bad stuff in Mesa, so it will crash after 5-15 minutes of gameplay...

Patches to Mesa or patches to pcsx2?

If PCSX2 is going to rid itself of DX9, I think it would be better to just rid it of DX entirely and use Vulkan, since it's usable on Linux and Windows. Plus any card that supports DX11 or 12 is sure to support Vulkan, and it would narrow everything down to one backend. I'm no dev, this is just my opinion.

That's not a viable option right now. AMD users are already forced to use the DX backend due to driver issues that AMD hasn't resolved. In addition to that, the time it would take to implement Vulkan versus what we would get back in performance benefits isn't worth it.

Well, that's what I was thinking about. Vulkan would get around AMD's dodgy GL drivers, and it seems pointless to keep DX around afterwards if a Vulkan backend ever gets made. If DX9 is going to be dropped in the future (even if that's far off) and we are left with GL and DX11, why not just slowly phase out DX11 as Vulkan develops? Any card that supports DX11 can support Vulkan AFAIK, and there would be no point in sustaining a Windows-only backend anymore.

Also, contrary to whatever scare they have at Dolphin (possibly because plugins perfectly modularize stuff? I dunno), we have no "X renderer is a burden to Y renderer" problem.
Anyway, everything is up to whichever devs want to tackle the challenge.

funny thing: if CL gets merged into Vulkan in the future, we could say we technically already have a Vulkan renderer
EDIT: @gregory38 you should resend your patches I guess?

Various DX11 cards won't support Vulkan. Besides:

  • Dx11 renderer is based on Dx10 features
  • OpenGL renderer is mostly based on Dx10 features. For example it runs fine on Sandy Bridge Linux.

Eventually both the DX11 and OpenGL renderers will/might die. But Vulkan won't solve texture cache management. And we need advanced blending. I'm not sure it is exposed in Vulkan, as it requires at least a Maxwell GPU on the Nvidia side. By the way, this extension will reduce the number of draw calls and increase the load on the GPU, so the Vulkan gain will become smaller.

What do you plan to move GSdx to when and if those go? I just hope that whatever happens, my RX 480 will handle it. I'll upgrade to a GTX 1080 in a few years though, maybe.

I don't have any plan. My GPU is an "old" Kepler. I won't upgrade soon as I want a sub-75W but powerful enough GPU with free driver support.

I don't know the AMD status, nor the Intel one. I think recent GPUs should be OK, but I really don't know.

AMD has some probs with Vulkan as well (cough blending). Also, it's good to have several APIs available. Some might have issues, so it's good to have an alternative.
Take Intel for example. DX11 has issues on Kaby Lake, OpenGL is a mess, and you might want to use DX9.

It sucks being an AMD user right now >.< just got BSOD with SilentHill4 in OGL the only SH game I haven't beaten and DX11 has an entire layer of atmosphere missing. I've heard Nvidia has issues too but I'm not aware of how bad.

> Buy an AMD card
> Nuke windows and say hi to Tux
> Install open sauce driver
> Profit

...Anyway, please, really, it's really all up to whatever fancies a willing dev will have.
And I don't know of anybody with either time or will to begin with.
So please, let's stop the quite wishful thinking chatter.

Vulkan please. Replacing the existing OpenGL renderer which I hear is much slower than the existing D3D renderer with a single Vulkan renderer would help out PCSX2 a lot. While you could focus on the OpenGL renderer for both Linux and Windows, it might be easier to just pave over the old renderers with a single API and focus on it instead. Less code to maintain.

A new GUI would also really help to modernize it!

After many years Metal Gear Solid 2 intro scene still lags, a modern implementation would be welcome especially if it resolved the issue.

What is needed to create a Vulkan renderer? why not crowdfunding this project?

We would need more developers/manpower. Crowdfunding is still a possibility in the distant future.

Would need competent vulkan implementations across the card vendors, as Gregory has pointed out.

Vulkan != magic solution to performance issues. We would be better off with more people working on core GSdx issues than we would be with a working Vulkan backend.

If the OGL renderer is much slower than the D3D one, unless someone can fix the performance disparity, VLK is an option. Depends on what contributors are good at doing I guess. Once you have VLK going you don't have to worry about specific driver bugs like with OGL, so it seems easier to maintain in the long run, just more work up front. The RPCS3 devs sure seem to love it: https://rpcs3.net/blog/2018/01/23/rpcs3-2017-wrap-up-a-stunning-year-of-progress/

X-Y=/=Z

You might be oblivious to this, but OpenGL issues that occur in AMD drivers also often affect Vulkan too in some way.

@Swiftpaw OGL is as fast as D3D (well, OGL has better vertex streaming capabilities). However, AMD's proprietary OGL implementation is bad. And nothing prevents AMD from releasing a broken Vulkan implementation too. It would be sad to spend weeks of work to have a working solution for AMD users only...

GSdx's main speed issue isn't the rendering API overhead, but the emulation of the GS itself, which doesn't map well to modern GPUs. See my previous post for an example of what can be done to really improve the emulation.

AMD's OGL performance is probably only slower on Windows. I'm pretty sure mesa is faster by now.

Please discuss that and post results on #2144

@gregory38 The fact that VLK drivers are way simpler than OGL, and the fact that AMD's is open source, means that VLK performance can quickly be up to where it should be. VLK works great right now for both AMD and NVIDIA chipsets especially with the newest Mesa 18.0.

> The fact that VLK drivers are way simpler than OGL, and the fact that AMD's is open source, means that VLK performance can quickly be up to where it should be. VLK works great right now for both AMD and NVIDIA chipsets especially with the newest Mesa 18.0.

And then there's Windows...

Regardless, our time is much better spent fixing core GSdx bugs than creating a whole new Vulkan backend that would mostly benefit AMD users. We just don't have the time and resources for it right now anyway.

> The fact that VLK drivers are way simpler than OGL

Seriously, OGL isn't that complicated! If Mesa can do it right with 10 people, the proprietary driver team should be able to deliver a working driver. Note that GSdx is based on modern OGL.

Anyway, Vulkan basically removes the upper layer of the driver, but the complex (because it depends on the HW) bottom layer remains. The issue with dual blending is basically 1.0f * 0.0f == 1.0f! And guess what, some people report issues with dual blending on Vulkan, so...

Honestly, AMD users will benefit more from advanced in-order shader capabilities than from Vulkan. Those capabilities will reduce the number of draw calls for the most complex GSdx effects (accurate blending / destination alpha testing / alpha testing...) and will allow emulating GS behavior correctly. The drawback of this extension is that it will increase the load on the GPU, but AMD's GPUs contain more units, so it should be fine.

> And then there's Windows...

Apparently AMDVLK is going to be used on windows as well, so we may benefit from any bug fixes done on the Linux side, hopefully...

> Seriously, OGL isn't that complicated

Yet there seem to be more that get it wrong than right.
Mesa (the GitHub mirror lists 538 contributors) and Nvidia (which sometimes allows invalid behaviour) get it right.
Last I heard, Intel's OpenGL drivers have issues with the GSdx OpenGL renderer (bugs? or missing extensions?).
AMD drivers are the reason we have this issue.
MacOS (I assume they write their own state tracker) is stuck in OpenGL 4.1, potentially forever.
Mobile drivers, according to Dolphin, are a complete disaster area (3 different GPU manufacturers, excluding Nvidia), and they have it easy, only needing to implement OpenGL ES, a subset of OpenGL.
ANGLE gets OpenGL ES right, if a little behind on the latest spec.

> people report issue on dual blending on Vulkan

Fun fact, that was broken on Nvidia to begin with as well
Apparently it's fixed (for both Nvidia and AMD) but there isn't a ready made test case for me to test with.

> Last I heard Intel's OpenGL drivers have issues with GSDX OpenGL render (bugs? or missing extensions?).

Intel misses some/a lot of extensions but at least the performance is good.
I'm actually not sure how it is on Kaby/Coffee lake.

All software has bugs. What I mean is that Vulkan isn't 10 times easier than OpenGL. Yes, there is a lot of cruft in OpenGL, but GSdx doesn't use it. Actually, our renderer is rather basic.

The issue with OpenGL isn't complexity, but that people don't use it/don't care about it. D3D is as complex as OGL and yet it works. Again, Vulkan only removes the upper layer of the driver; this layer may be big in size, but it is common to all GPUs.

Obviously, as you do the upper layer in the app, you will get more bugs in the app and fewer in the driver. So Vulkan won't magically solve all driver crappiness, nor boost perf to the moon.

Is the D3D backend much faster than the OGL backend right now on PCSX2? If so, then OGL needs to be either fixed, or replaced with VLK. Which one to choose depends on the developers and contributors. The details about which one is better and easier to maintain, troubleshoot, and optimize is the part that is debatable.

On Nvidia/Intel, OGL is faster when comparing games that render equally on both. Otherwise OGL is a bit slower, but that's because it's more accurate than D3D.

It's AMD that needs to fix their drivers.

Why does the same thing has to be repeated over and over and over again?
OGL is almost as perfect as you can get. It even allows a couple more things than DX.
Simply put, proprietary AMD driver is batshit slow with it, but it's not the end of the world for those cards.

But there are other priorities at the moment (like, you know, fixing more core bugs, or making hacks automatic). And commenting on this tracker won't make manpower with time to waste magically appear.

There are 2 mandatory features to improve GSdx: in-order memory access, and sampling data from the framebuffer.

  • DX12: Rasterizer Order Views:

    • Requires at least Vega on AMD, Maxwell v2 on Nvidia, Haswell on Intel

    • I don't know if it allows reading the current framebuffer as the current texture

    • Obviously limited again to Windows users

  • Vulkan:

    • So far no equivalent of ROV (or did I miss it). Dunno if it can be emulated (fast enough) with sub-render pass.

    • sub-render pass should allow to read the current framebuffer as the current texture

Conclusion: we still don't have a cross-platform API with modern capabilities.

@gregory38 Have you tried shouting at people on the Khronos forum about lack of ROV? And is this (https://github.com/KhronosGroup/Vulkan-Docs/blob/1.0-VK_AMD_rasterization_order/doc/specs/vulkan/appendices/VK_AMD_rasterization_order.txt) relevant? (I couldn't quite tell from the wording if it was what you needed, or the exact opposite of what you needed).

No, I don't have the time to code, nor the hardware to use this feature. So I won't bother people. Hopefully someone will create an extension when enough hardware supports it.

It isn't the same feature. This AMD extension is about relaxing the order of rasterization in the fixed-function unit. For example, when you blend 2 triangles, you must do triangle 1 and then triangle 2. However, if blending is commutative, the order isn't important, so you don't need to ask your GPU to sort the data before blending (it could be interesting for games).
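To make the "commutative" point concrete, here is a small sketch for a single colour channel on one pixel. The function names are made up for illustration; the two blend equations are the usual "over" and additive ones, not anything specific to GSdx.

```python
def alpha_blend(src, dst, alpha):
    """Classic 'over' blend of one channel: src*alpha + dst*(1 - alpha)."""
    return src * alpha + dst * (1.0 - alpha)


def additive_blend(src, dst, alpha):
    """Additive blend of one channel: dst + src*alpha."""
    return dst + src * alpha


def draw(blend, background, triangles):
    """Apply (color, alpha) triangles to one pixel, in the given order."""
    pixel = background
    for color, alpha in triangles:
        pixel = blend(color, pixel, alpha)
    return pixel


t1 = (1.0, 0.5)  # triangle 1: color 1.0, alpha 0.5
t2 = (0.5, 0.5)  # triangle 2: color 0.5, alpha 0.5

if __name__ == "__main__":
    # Alpha blending: swapping the order changes the final pixel.
    print(draw(alpha_blend, 0.0, [t1, t2]), draw(alpha_blend, 0.0, [t2, t1]))
    # Additive blending: the order makes no difference.
    print(draw(additive_blend, 0.0, [t1, t2]), draw(additive_blend, 0.0, [t2, t1]))
```

With additive blending the GPU is free to process the two triangles in any order; with alpha blending it is not, which is the case that matters for GS emulation.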

In our case, we need to re-implement the in-order fixed-function unit in the fragment shader. However, fragment shaders are out-of-order. So let's say we implement blending in the shader: we don't have any guarantee that triangle 1 will be processed before triangle 2. The current GL implementation (accurate blending) allows splitting the draw into N draw calls of 1 triangle, but it is slow. There is a GL extension to ensure the order of the fragment shader (same as ROV); this way you can compute triangle 1, then triangle 2.

This feature allows emulating tons of GS features:

  • blending (alpha coefficient range from 0 to 2)
  • blending bit-masking
  • color wrapping (258 is 2)
  • destination alpha testing
  • alpha testing
  • ...
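As a rough illustration of the first and third items, here is what such a shader-side blend could compute for one 8-bit channel. The formula `((a - b) * c >> 7) + d` and the fixed-point convention (128 means 1.0) are assumptions on my side for the sake of the example, and `gs_blend` is a made-up name.

```python
def gs_blend(a, b, c, d, wrap=True):
    """Blend one 8-bit channel as ((a - b) * c >> 7) + d.

    c is the alpha coefficient in fixed point: 128 means 1.0, so the
    0..255 range covers roughly 0..2. Both the formula and the
    wrap/clamp switch are assumptions for illustration only.
    """
    value = (((a - b) * c) >> 7) + d
    if wrap:
        return value & 0xFF  # color wrapping: 258 becomes 2
    return max(0, min(255, value))  # clamping instead of wrapping


if __name__ == "__main__":
    print(gs_blend(129, 0, 128, 129))              # 129 + 129 = 258, wraps
    print(gs_blend(129, 0, 128, 129, wrap=False))  # same sum, clamped
    print(gs_blend(100, 0, 255, 0))                # coefficient close to 2
```

The point is that both the coefficient range and the wrap-around happen inside the blend itself, so the shader has to read the destination colour in order, which is exactly where the ordering guarantee comes in.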

Anyway, now that OpenGL supports loading SPIR-V shaders, maybe it would be nice to use SPIR-V instead of GLSL. Unfortunately we rely on various ifdefs, so it won't be easy to port. And we will miss features not yet supported by SPIR-V (typically the interlock extension). Well, GLSL will likely stay for legacy drivers.
However, SPIR-V will be compatible with Vulkan. And it will potentially fix dual blending on the AMD driver.

spirv shaders can use "specialization constants" which are set before you create the pipeline. I reckon this is the closest thing to ifdef you will get at runtime.

@gregory38 according to mesamatrix.net that extension isn't supported by any of the open source drivers, so intel and amd users won't benefit from it

Yes, I know. However, I'm afraid it won't work for declaration stuff. For example, the layout to control the early depth test, and a few other bits. Doable, but not straightforward.

Open source guys are working on spirv. I don't worry about it. Anyway the biggest gain will be on AMD proprietary driver. It will give us the guarantee that glsl compilation is correct (because it will be done by an offline compiler)

> Open source guys are working on spirv. I don't worry about it. Anyway the biggest gain will be on AMD proprietary driver.

Sure thing, it should land reasonably soon (though barring miracles, that's still a no go on older cards)

OTOH, I just found out here, the point for not having ROVs was the supposed* "nvidia only" nature of it.
I wonder if you couldn't bump one of those threads, or make a new feature request?
(I, for one, feel too peasant to write there)

*actually, intel should already have had it, called INTEL_fragment_shader_ordering (funnily introduced at the same time of InstantAccess's zero copy)
EDIT: I'm kinda puzzled by what the point of this issue should be by now though. Granted nobody is refusing or challenging a low-level-api renderer. But whatever the biggest advantages there might or might not be, listing them doesn't solve the actual only problem - that is lack of manpower.

Regarding DX12 ROVs and fragment shader ordering/interlock, it sounds like a bad idea performance-wise according to a test case and a comment:

https://twitter.com/g_truc/status/581202849918504961 (it's faster to just use glTextureBarrier if fragments do overlap during fragment shader execution)
https://twitter.com/axelgneiting/status/897533271281631233 (not helpful comments from developers on selling the hardware feature so avoid using ROVs on Vega too or similar)

Vulkan working group is also addressing the lack of ROV in Vulkan in this video after 39:15 but the idea gregory stumbled upon which was similarly brought up before by an AMD developer is seemingly more attractive when it's also a much more portable solution since it works on far more hardware than just hardware that supports ROVs. I have no idea if constantly flushing the GPU caches could actually end up being competitive with inserting sync points in the fragment shaders depending on how contentious the memory accesses becomes but Vulkan does look like it presents some more upfront driver optimizations with regards to how cache flushing behaviour can be handled. Vulkan subpasses and input attachments could be seen as arguably the best way to avoid AMD OpenGL bugs on top of implementing blending modes on PS2 with the lower driver overhead just as a bonus. I don't believe IHVs other than Intel intended for ROV's or a similar feature to be highly performant and I don't think AMD cares about having more than just a conformant GL driver so I doubt they're ever going to add and extensively test out fragment shader interlock unless it becomes core in the spec which won't happen for a while. More investigation should definitely be done between Vulkan subpasses/input attachments and glTextureBarrier to see how each would perform with emulating PS2 blending.

AMD also exposes fragment shader ordering (the intel extension) on all GCN GPUs but I doubt it works since there very well might be an ordering bug.

Interesting. However, it would be better to have the real test case and how it was implemented. Typically it might depend on how much the primitives overlap. Hmm, I don't know what happens between draw calls, maybe we still need a flush. Anyway, I'm surprised by the performance, because as far as I understand, glTextureBarrier requires multiple draw calls, and cache invalidation is quite slow (unless the cache is small).

To be honest, I kind of expected fragment interlock to be slow on the GPU because it forces a serialization of shader execution. It depends on the granularity: is it only at the fragment level, or at the primitive level? I.e. is the rasterization of the full primitive halted if at least 1 fragment overlaps?

As far as I'm aware the test case isn't any less real but what the author does do is measure the overhead of a primitive overlap per draw call in each implementation. The conclusion is in the best case scenario when there is very little overlap then interlock is close to free but in the worst case scenario where each primitive overlaps during fragment shader execution then it's an order of magnitude slower than using a texture barrier. With shader interlock, I don't think a flush is required but you absolutely need to guarantee that fragment shaders will finish with respect to triangle dispatch order and that's commonly achieved through stalling fragment shader execution. By inserting a sync point during fragment shader execution, you pretty much free yourself of invalidating the cache thus saving bandwidth but you risk running into stalling the GPUs execution units like you said so it's a double-edged sword. GPU caches are relatively small compared to their register space so it's not THAT bad to flush the cache compared to potentially hitting a sync point. glTextureBarrier may require multiple draw calls however, with shader interlock you have to insert a sync point in your fragment shader so pick your poison ? If you're being considerate then shader interlock might end up being alright depending on your use case.

As far as granularity that might depend on hardware implementation per IHV and Apple's Metal 2 documentation about raster order groups might shed some light on this topic. It sounds like Intel has one of the most advanced implementation by being able to track multiple mutex per pixel like the A11 GPU so they can view a bigger window of fragment shader execution to both increase parallelism and minimize stalls. For Nvidia, they might only be able to track a single mutex per pixel so that is probably the reason why it's hard to beat the texture barrier implementation in their case with some extreme conditions and I don't imagine it to be much different for AMD either. Another optimization in using shader interlock according to an Intel engineer at slide 49 is to randomize the triangle submission order to exploit the screen space incoherent placement of geometry to minimize the primitive overlaps for the purpose of avoiding these sync points. ROVs/ordering/interlock is not a cure, it just has different trade-offs compared to texture barrier ...

With that out of the way, I think these are your most realistic options. You can use D3D11.3 to access ROVs on a wider range of hardware (because of drivers), but it's Windows 10 only, or you could port your GS emulator to Vulkan to make it work on an even wider range of hardware with a nearly reliable workaround, such as using subpasses with input attachments to emulate the PS2's blending behaviour. Performance is the biggest mystery between the two potential implementations, but only you know the specifics of GS emulation well enough to judge whether each option is worth spending your time on.

Intel having the most advanced capabilities isn't a surprise considering INTEL_fragment_shader_ordering is there since Haswell, as I mentioned some posts ago (maybe benchmarks of last generations would be interesting)
Funnily, yet again making opengl the preferred choice (for as much this seems windows only for now)

EDIT: an experimental mesa branch.. exist?
EDIT2: yep! It still hasn't landed due to problems though.
EDIT3: it's happeninggg
EDIT4: both the intel_ and nv_ equivalents have been merged too
EDIT5: the former has been backed out

Meh, don't keep it as an Intel only solution. If it's gonna be a Windows biased solution too then the solution should also be cross-vendor as well by including D3D11.3 ROVs. It's arguably more fair to lock out Windows 7 users once it stops getting regular security updates by early 2020 than it is to lock out a vendor (AMD) with perfectly capable hardware features. (Vega+)

It'd feel a whole lot more secure for an AMD user with the appropriate hardware to deal with Windows 10 than it is to depend on AMD itself to deliver a working fragment shader interlock extension.

No Vulkan, sure but D3D will forever remain the future when Khronos Group continues to keep disappointing so if not D3D12 then it should be D3D11.3 that should be the one getting some love since I think it might be the unsung hero in this case.

AMD is also supporting that extension since ages (EDIT: nvidia reportedly has its own? GL_NV_fragment_shader_interlock maybe?)
Then of course amd and opengl makes shiver...
(and 1) this wasn't really about W7, but linux 2) even if it wasn't I'm always crazed by why one should care about "official security updates" rather than the usual costs/users ratio)

Except fragment shader ordering doesn't actually work on AMD, and maybe not even on Vega GPUs, because AMD can't be bothered to add another GL extension and make it work. Nvidia's version of fragment shader interlock is identical to the ARB version, but both of them are just clones of fragment shader ordering (the original functionality).

Users probably should make official security updates a priority over worrying about getting a program to work on what is arguably a "deprecated" OS that has been out of sale. Absolutely no reason for future versions of software to be burdened with maintaining far reaching backwards compatibility when the users could just choose to use the older builds instead.

A user who trades liberty for security deserves neither.

If I understand it right, one way or another, texture barrier will create some bubble in the GPU pipeline (likely due to the multiple draw call). I don't see how ROV could be 10x slower. Anyway, it isn't that important, I don't think we have tons of fragment overlapping in the same draw call. However we could have more than 1000+ primitives which isn't doable easily with texture barrier (currently 1000+ draw calls and 1000+ barriers). ROV is surely a tad slower than normal rendering, but it is the only option.

Note: the only requirement to implement a feature is to have a dev that has both the OS and the hardware ;)

Worst case, people will reduce the upscaling factor ;) Or buy a better GPU

It's really not that hard to imagine a scenario where ROVs are slower. Sometimes that could be due to reasons such as content (lots of overlapping primitives), hardware implementation (single mutex per pixel means that some GPUs are only able to track per triangle dispatch order thus pretty much stalling the fragment shader execution even for non-conflicting accesses by threads in the same draw call!) and how the feature is used. Flushing the cache might be bad but serializing fragment shader execution by submission order might not be better.

Along with achieving screen space decoherence to minimize the amount of concurrent overlaps if it's applicable in your case, another recommendation I often see is to separate the non-dependent code from the dependent code if you're using triangle dispatch ordered access to memory, to minimize the length of stalls caused by the ordered execution of fragment shaders. Basically, try to avoid using ROVs as much as possible from the beginning of your fragment shader program and keep it to the tail end of your fragment shader program, because as few as 4 overlaps on the same pixel can potentially make your fragment shader execution run 4x slower!
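A very crude way to reason about that "4 overlaps, 4x slower" figure is to count how many primitives of one draw call touch the same pixel. The sketch below is a toy model under a loud assumption: each ordered fragment on a pixel has to wait for the previous one, so the longest per-pixel chain bounds the worst-case serialization. `max_overlap` is a made-up helper and says nothing about how real hardware schedules work.

```python
from collections import Counter


def max_overlap(primitives):
    """Return the largest number of primitives covering any one pixel.

    Each primitive is an iterable of (x, y) pixels it covers. In this toy
    model (an assumption, not a hardware description), the result is the
    length of the longest chain of fragments that must run one after
    another under ordered access.
    """
    depth = Counter()
    for pixels in primitives:
        depth.update(set(pixels))  # count each primitive once per pixel
    return max(depth.values(), default=0)


if __name__ == "__main__":
    disjoint = [[(0, 0)], [(1, 1)], [(2, 2)]]
    stacked = [[(0, 0), (0, 1)]] * 4
    print(max_overlap(disjoint), max_overlap(stacked))
```

A draw call made of disjoint primitives gives 1 (ordering costs close to nothing in this model), while four primitives stacked on the same pixels give 4.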

ROVs is not the end all or be all cure for a programmable graphics back-end like you dream it to be but I acknowledge that it does have some viable use cases and if you say that overlapping fragments are not a problem in your use case then who am I to argue ? (just beware of the pitfalls that come with the hardware feature if you're ever going to use it)

(MFW I'm the one bumping this thread)
For the records, ROV is being seriously discussed here https://github.com/KhronosGroup/Vulkan-Ecosystem/issues/27

I don't think it's in their interest to seek ROVs for Vulkan or if they even want to pursue Vulkan at all. The current developers seem far more interested in getting driver support for fragment shader interlock on OpenGL ...

I'm pretty sure the Khronos committee has very little overlap with Mesa developers (which, by the way, got interlock landed for ≥gen7 Intel hardware today)
EDIT: aaaand it's ≥gen9 now

I was talking about the developers behind this project. They are not one bit interested in working on Vulkan right now if it has the available extensions that they want and even if there are drivers too supporting that extension in Vulkan. They would rather just plead IHV's instead to support fragment shader interlock or other extensions in poorly designed and bug ridden OpenGL as much as possible to access the feature without having to refactor the project so much. When your project has issues with 2/3 IHV's on most hardware and they both pass conformance tests then something obviously must be very wrong with this project to have that happening ...

At least with Vulkan, when there is a driver bug, there's a much easier workaround to prevent it. But that's not its only benefit: Vulkan support could potentially act as a boon for Mac users as an easy gateway to Metal support, since its translation layer has become mature enough for emulator usage...

Passing conformance tests and doing it with decent performance are completely separate things.

Honestly, I'm getting tired of people saying that the GL implementation of PCSX2 is broken because drivers don't have bugs. As you said, the conformance test is green, so everything is fine on the driver side.... Seriously, on the AMD side (only the proprietary driver; strangely, the free driver is fine), we ask the GPU to do a multiplication. We get either
1/ 1 * 0 = 1
2/ a driver crash
I don't know the status of the Windows Intel drivers. But yet again, the Linux driver is fine. So how do you explain it? And don't tell me that using a multiplication for blending is an undefined operation!

On the Vulkan topic: yes, Vulkan is nice, but it misses a modern feature, called either ROV or shader interlock. This feature is a requirement to emulate some GSdx effects. Without it, the API is just useless for us! Then perf-wise, Vulkan surely reduces the overhead on the CPU, but it could increase the overhead on the GPU. Plenty of games (Ratchet and Clank, all Snowblind engine games such as Baldur's Gate, even Zone of the Enders) that seemed to suffer from driver overhead were fixed recently without Vulkan, because DX/GL weren't the issue in the first place.

Yes, Vulkan is likely the future for portability (well if drivers aren't too broken !)

Getting bugs with a "conformant" driver shouldn't even be a thing regardless like we see with OpenGL. Why should we be the ones to deal with inferior conformance tests from Khronos Group with OpenGL in comparison to Microsoft with Direct3D ...

It's bad enough that Khronos Group doesn't put their foot down with OpenGL ES implementations ...

Then submit them a test case.
I don't know what level of polishing you need, but you should already be more than halfway there with the sources of gregory's one I posted on the amd forums.

If they suck hard with GL, doesn't mean the spec is faulty.
And I'm not sure if you understood that at the moment: Vulkan is providing less features than OpenGL.

You don't understand how to interpret conformance test status.

When it is failing, you can conclude that driver is shit.

When it is all green, you can conclude that you don't know the driver status, perhaps it is good, perhaps it is shit.
Note: even AMD acknowledged their driver was faulty..........

@gregory38 Do you guys actually need ROV/interlock to emulate the GS accurately, or is it just part of a wishlist? (I thought texture barrier was a solution.) Is OpenGL useless on a similar basis for you guys, since that feature isn't available on lots of GL hardware/drivers? When or if ROV/interlock does come around on Vulkan and the translation layer matures, is Vulkan actually going to be a serious consideration in the cards, or will it be shelved in favour of getting IHVs to support the GL extension instead?

@mirh Khronos Group standards/specifications are worthless if they aren't the gospel or if it isn't mirrored in the CTS ...

@gregory38 Something must be very wrong with OpenGL conformance testing, or with Khronos Group's standards testing methodology in general, if passing hardware/driver combinations aren't even guaranteed to work in the allowed use case. On principle alone, OpenGL development should be ditched in favour of Direct3D or Metal, since they actually have competent conformance testing. Portability is not a good reason to continue with it anymore, since there are only two OSs (Windows & Linux) which even have remotely reasonable support, and there's even less choice than that, since only one IHV has the ideal implementation for the behaviour you want...

> Something must be very wrong with OpenGL conformance testing

Or they *just* plain simply lack the test?
I found some stuff, but if really any it is only applied against GLES.
Just solely in principle alone then, I reckon Dx as more stable just by virtue of having had "the voice" of >95% of the market for 25 straight years. It's not that bugs don't happen there. They are just fixed asap.

@gregory38

As linked above, ARB_fragment_shader_interlock is everywhere be it in its official form or its NV_fragment_shader_interlock/INTEL_fragment_shader_ordering predecessors.
There's no "if" wonder (performance aside, but we all know who's the felon there)

But I think you are missing something: nobody is going to say no, even to a goddamn metal renderer.
There's nothing to "seriously consider" or choose exclusively.
It's only that time is limited.
And *of course* you have to focus on what brings you the most gain.
No, even if win-amd was half the market the manpower required to reinvent the[ir broken] wheel would still be unjustified. Let alone if the fix/alternative for the time being has hard downsides that would affect every single body.
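For what it's worth, the "is interlock there at all?" question boils down to a lookup over the driver's extension list. A minimal sketch in Python, assuming the extension strings have already been queried through some GL binding (not shown here). The function name is made up for illustration, the `GL_` prefixes are the registry spelling of the names mentioned above, and the preference order is simply the order in which they were mentioned.

```python
# Extension names as mentioned above, with the GL_ prefix used in extension strings.
# Order = preference: the official ARB form first, then the vendor predecessors.
INTERLOCK_EXTENSIONS = (
    "GL_ARB_fragment_shader_interlock",
    "GL_NV_fragment_shader_interlock",
    "GL_INTEL_fragment_shader_ordering",
)


def find_interlock_extension(extensions):
    """Return the first supported interlock-style extension, or None.

    `extensions` is any iterable of extension name strings reported by the driver.
    (Hypothetical helper, not PCSX2 code.)
    """
    available = set(extensions)
    for name in INTERLOCK_EXTENSIONS:
        if name in available:
            return name
    return None
```

If this returns `None`, a renderer has to fall back to something like texture barriers, whatever API wrapper sits in between.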

You read the conformance test status in the reverse way.

If conformance is red, then the driver is bad.
Put the other way round:
If there is NO bug in the driver, then conformance will be green.

A green conformance status means nothing (it is the same for all test suites). You can't prove that code is working.
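To make the logic concrete, here is a toy sketch in Python (nothing to do with the real CTS; every name in it is invented for illustration). A buggy "driver" passes a small test suite simply because the suite never exercises the broken case, while a failing suite does prove a bug.

```python
def clamp_reference(x, lo, hi):
    """What the 'spec' asks for: keep x within [lo, hi]."""
    return max(lo, min(hi, x))


def clamp_buggy(x, lo, hi):
    """A 'driver' with a bug: it forgets the lower bound."""
    return min(hi, x)


def clamp_broken(x, lo, hi):
    """A 'driver' that does not clamp at all."""
    return x


# A toy 'conformance suite': (arguments, expected result).
# Note that it never tests a value below the lower bound.
SUITE = [
    ((5, 0, 10), 5),
    ((11, 0, 10), 10),
    ((10, 0, 10), 10),
]


def conformance_green(impl, suite=SUITE):
    """True if the implementation passes every test in the suite."""
    return all(impl(*args) == expected for args, expected in suite)


# Red proves a bug; green proves only that the tested cases work.
print(conformance_green(clamp_broken))   # red: definitely buggy
print(conformance_green(clamp_buggy))    # green, yet still buggy
print(clamp_buggy(-3, 0, 10) == clamp_reference(-3, 0, 10))
```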

TextureBarrier is a poor solution. Most users can't run the high mode and it isn't enough to be accurate. So no, fast in-order shader operations are a must for us.

I don't know about you guys, but on Linux with OpenGL I get a massive decrease in perf.
Taking the same build, same settings into Wine, and it's faster than the native build.
I said this before somewhere else..
And I've since messed with it a few times.
And sure enough, the Linux build is plain slower on my end.
This is on an Ubuntu 14.04 base; I haven't checked the perf on 18.04 yet since I don't actually have it installed on my rig yet (that could be some time..., a year maybe if I'm lucky).
And yes, I know about the MT env vars and stuff...

A vulkan renderer would be a +1000 for me.
I think it would make a difference.

My specs: 4930k, and 1080ti.
Same results as the 680gtx pretty much, massive slowdown in linux ogl.

I'd like to get it on par with the Windows perf, since PCSX2 was one of the main determining factors for me to switch to Win7 a long time ago... (DX11 support, better rendering)
I'm on Linux 24/7 these days.
I'd really like to put it to them: Linux has potential, 75% of the benches outperform Windows, by quite a bit sometimes. (it's slower in other ways but..)

I'd really like to get it within a few frames.

Opengl being slower on nvidia driver doesn't make sense.
Open an issue just for that.

As for the performance issues with OpenGL, I don't think they'll ever be solved, because of an API design flaw that has persisted since the beginning: its massive global state. The time is coming to put an end to the dead gfx API one way or another ...

There exist wrappers out there like DXVK and VKGL. The latter is early in development (it appears to be written by an AMD driver engineer in his free time), but they should ideally meet the needs of portability or maintainability for small projects like this ...

Since DXVK is very mature, once they manage to add support for D3D11.3 it could potentially be targeted to get access to a feature similar to fragment shader interlock, which would improve the blending accuracy of the emulation. Best of all, with both of those projects you don't have to maintain two backends anymore, especially when IHVs might not even ship support for one of them in their drivers in the future ...

Would it be possible to make DXVK work with the DirectX 11 option of PCSX2?

https://github.com/doitsujin/dxvk/releases

It works amazingly well on other games, like Skyrim or The Sims 4 and GTA 5 and so on.

All you gotta do is unzip the .dlls into the folder where the .exe is located (or into the "bin" folder, if one exists) and the game will automatically start using Vulkan.

Though DXVK is primarily designed for Linux, it works just as well on Windows 10.

Is there any way to make PCSX2 with DirectX11 use the dx11 to Vulkan .dll?
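The "unzip the DLLs next to the .exe" step can be scripted. A minimal sketch in Python, assuming the DXVK release has already been extracted and that `d3d11.dll` and `dxgi.dll` are the two files a D3D11 application needs (that pair, the function name, and the `.bak` backup convention are assumptions of this sketch). Whether PCSX2 actually loads the copied DLLs is not something the script can settle.

```python
import shutil
from pathlib import Path

# Assumption: these are the DLLs DXVK provides for D3D11 applications.
DXVK_DLLS = ("d3d11.dll", "dxgi.dll")


def install_dxvk(dxvk_dll_dir, exe_dir, dll_names=DXVK_DLLS):
    """Copy DXVK DLLs from an extracted release folder next to an executable.

    Existing DLLs of the same name are backed up as '<name>.bak' first.
    Returns the list of paths that were written.
    """
    src = Path(dxvk_dll_dir)
    dst = Path(exe_dir)

    missing = [name for name in dll_names if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {src}: {', '.join(missing)}")

    written = []
    for name in dll_names:
        target = dst / name
        if target.exists():
            # Keep whatever was there so the change can be undone by hand.
            shutil.copy2(target, target.with_name(name + ".bak"))
        shutil.copy2(src / name, target)
        written.append(target)
    return written
```

Deleting the copied files (or restoring the `.bak` copies) undoes the change.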

> As for the performance issues with OpenGL, I don't think they'll ever be solved, because of an API design flaw that has persisted since the beginning: its massive global state. The time is coming to put an end to the dead gfx API one way or another ...
>
> There exist wrappers out there like DXVK and VKGL. The latter is early in development (it appears to be written by an AMD driver engineer in his free time), but they should ideally meet the needs of portability or maintainability for small projects like this ...
>
> Since DXVK is very mature, once they manage to add support for D3D11.3 it could potentially be targeted to get access to a feature similar to fragment shader interlock, which would improve the blending accuracy of the emulation. Best of all, with both of those projects you don't have to maintain two backends anymore, especially when IHVs might not even ship support for one of them in their drivers in the future ...

It wouldn't fix fragment shader interlock, because AMD isn't going to support it. I assume you're an AMD user, since this is mostly something only AMD users would benefit from. But yeah, AMD isn't supporting that feature, not even on Vulkan.

Regarding using DXVK, you can convince PCSX2 to use it with https://github.com/PCSX2/pcsx2/issues/2106#issuecomment-356480660

As for OpenGL wrappers, I would personally put more hope in Zink (a Mesa driver that runs on Linux) getting ported to Windows, or even in the DX12 Mesa driver that's in the works.

Regarding fragment shader interlock, wrappers to Vulkan aren't going to solve that, as the extension is missing in AMD's Vulkan as well (except maybe on the 5000 series?).

The closest feature DX has to fragment shader interlock appears to be ROVs(?), which also aren't supported on pre-Vega AMD hardware.

Yes, interlock is ROV. It needs a modern GPU. But AMD stated they don't want to expose the feature on Vulkan, even on recent GPUs.

If the driver doesn't support it, nothing can be done. No translation layer will change it. End of story.

not gonna lie if Vulkan brings better performance im all for it, multiplat for the win! got an nvidia gpu btw

@gregory38 @NEOAethyr

> Yes, interlock is ROV. It needs a modern GPU. But AMD stated they don't want to expose the feature on Vulkan, even on recent GPUs.
> If the driver doesn't support it, nothing can be done. No translation layer will change it. End of story.

Well, closed AMD driver is irrelevant for Linux, unless you need it for a supercomputer render cluster or something. I'm sure such implementation will not be blocked from merging for RADV… but it will not be written by AMD staff either :(
In fact… https://gitlab.freedesktop.org/mesa/mesa/-/issues/3511 - and there is Intel implementation to work from.

Worst case, people will reduce the upscaling factor ;) Or buy a better GPU

You say that, but in all the years up to today the upscaling factor has not changed a single fps for me, on my RX 580 and on my previous HD 6870. Strangely, the same thing happens with RPCS3: changing the rendering resolution from 720p to 1440p with otherwise identical settings doesn't change anything other than picture quality; it's entirely bottlenecked by something else. I tested recent git snapshots of PCSX2 with MGS2 (intro cutscene, particularly the wide shots of the ship and the view of the upper deck) and RPCS3 with MGS4 (main menu and training range), and they both exhibit this behaviour while erratically loading the CPU.
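The reasoning here (fps does not move when the pixel count quadruples, so per-pixel GPU work is not the limit) can be written as a small calculation. A rough sketch in Python. The function names and the 0.1 threshold are arbitrary choices for illustration, and the numbers in the comments are made-up examples, not measurements.

```python
def resolution_sensitivity(fps_low, fps_high, pixels_low, pixels_high):
    """Relative frame-time growth divided by relative pixel-count growth.

    About 0.0: frame time ignores resolution (bottleneck is elsewhere).
    About 1.0: frame time scales linearly with the number of pixels.
    """
    if fps_low <= 0 or fps_high <= 0:
        raise ValueError("fps values must be positive")
    if pixels_high <= pixels_low:
        raise ValueError("pixels_high must be larger than pixels_low")
    # Frame time is 1/fps, so the frame-time ratio is fps_low / fps_high.
    time_growth = fps_low / fps_high - 1.0
    pixel_growth = pixels_high / pixels_low - 1.0
    return time_growth / pixel_growth


def looks_resolution_bound(fps_low, fps_high, pixels_low, pixels_high, threshold=0.1):
    """Heuristic: True if raising the resolution costs a meaningful share of frame time."""
    return resolution_sensitivity(fps_low, fps_high, pixels_low, pixels_high) >= threshold


PIXELS_720P = 1280 * 720
PIXELS_1440P = 2560 * 1440

# Same fps at 4x the pixels: sensitivity is 0, so something other than
# per-pixel GPU work (CPU, driver overhead, synchronisation) is the limit.
unchanged = resolution_sensitivity(30, 30, PIXELS_720P, PIXELS_1440P)
```

This only separates "scales with pixels" from "does not"; it says nothing about which part of the CPU or driver is the actual culprit.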

You may remember me getting on your nerves previously on some other issue, with screenshots (with the driver performance overlay) of the PS2 and GC versions of RE4, where PCSX2 completely failed at the lowest resolution on the famously gimped PS2 version of RE4 while Dolphin was giving out a perfect 60 at the highest upscaling of the original. You were blaming "bad drivers", but now Mesa's AMD OpenGL implementation is probably the most exemplary implementation there is (missing optional extensions such as this one notwithstanding). And the OpenGL spec is deprecated; driver implementations are largely on life support while Vulkan drivers are developed. If there were new GPU makers (in a world where blatant global oligopolies are not welcomed by corrupt politicians), they would probably skip making their own OpenGL implementation and use an OpenGL->Vulkan shim.

You may try to quickly benchmark apps in Mesa with these environment variables (per-core CPU usage matters more than the overall stat, though):

```sh
GALLIUM_HUD_VISIBLE=false
# SIGUSR1, use `kill -10 $(pidof <proc-name>)`
GALLIUM_HUD_TOGGLE_SIGNAL=10
# per-frame accounting
GALLIUM_HUD_PERIOD=0
# tune for >96 DPI display
#GALLIUM_HUD_SCALE=1
# use `GALLIUM_HUD=help glxgears` for options
GALLIUM_HUD="cpu+GPU-load,.dfps+.dframetime,.dbuffer-wait-time;requested-VRAM+VRAM-vis-usage+mapped-VRAM+requested-GTT+GTT-usage+GFX-IB-size,GPU-shaders-busy+GPU-ta-busy+GPU-vgt-busy+GPU-sx-busy+GPU-wd-busy+GPU-sc-busy+GPU-pa-busy+GPU-db-busy+GPU-cp-busy+GPU-cb-busy;.dprimitives-generated+.dclipper-primitives-generated+.ddraw-calls,.dsamples-passed+.dps-invocations"
GALLIUM_PRINT_OPTIONS=true
GALLIUM_DUMP_CPU=true
```
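If you would rather script it than export the variables by hand, something along these lines should do. It is a sketch in Python under a few assumptions: a Linux box where signal 10 is SIGUSR1, a Mesa Gallium driver, and a HUD string shortened from the full one above. The helper names are invented for this example.

```python
import os
import subprocess

# Values copied from the list above; the HUD string is a shortened subset.
HUD_ENV = {
    "GALLIUM_HUD_VISIBLE": "false",     # start with the overlay hidden
    "GALLIUM_HUD_TOGGLE_SIGNAL": "10",  # SIGUSR1 on x86 Linux (assumption)
    "GALLIUM_HUD_PERIOD": "0",          # per-frame accounting
    "GALLIUM_HUD": "cpu+GPU-load,.dfps+.dframetime",
}


def build_hud_env(base_env=None):
    """Return a copy of the environment with the HUD variables added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(HUD_ENV)
    return env


def launch_with_hud(command):
    """Start `command` (a list such as ["glxgears"]) with the HUD configured."""
    return subprocess.Popen(command, env=build_hud_env())


def toggle_hud(process):
    """Send the toggle signal, like `kill -10 <pid>` from a shell."""
    os.kill(process.pid, int(HUD_ENV["GALLIUM_HUD_TOGGLE_SIGNAL"]))
```

Typical use would be `proc = launch_with_hud(["glxgears"])` followed by `toggle_hud(proc)` once the window is up.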

Well, now Dolphin has Vulkan with genius "async shader compilation / ubershaders" and is still rocking with perfection. It's a different architecture with different hurdles, sure, but how much of it is really the drivers and how much is a suboptimal approach? Is that use case so unique that no driver dev stumbled on it otherwise, never bothering with emulation their whole lives and keeping that one code path extremely, orders-of-magnitude sucky but not completely broken?

Windows users with AMD GPUs desperately need a Vulkan backend in PCSX2.
Right now I'm using software mode (burning my CPU) because DX11 has a lot of glitches and OpenGL has unplayable performance (worse than software mode).
If I had known PCSX2 had such terrible performance with AMD GPUs I would have bought an Nvidia GPU, but I didn't, and I'm not going to buy another GPU just to play PS2 emulation.

@OtavioRaposo you can try to install:
https://www.microsoft.com/en-us/p/opencl-and-opengl-compatibility-pack/9nqpsl29bfff?activetab=pivot:overviewtab

maybe opengl on top of mesa on top of DX12 is faster xD.

Not sure if I am talking seriously or if this is a joke.

"OpenGL version 3.3 and earlier": PCSX2 needs features from GL 4.5, iirc.
Similar issue with Zink (OpenGL on top of Vulkan, in Mesa).

Cool that it's on the store, I was wondering where it was going to turn up.

Edit: maybe it will support newer OpenGL versions in the future; I've not been able to find a roadmap for it.
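The version gate itself is a simple comparison. A small sketch in Python, assuming the desktop `GL_VERSION` string has already been fetched (it starts with `major.minor`) and taking the "GL 4.5" figure above at face value. The helper names and the sample strings are illustrative only.

```python
import re

# From the comment above ("iirc"); treat as an assumption, not a verified requirement.
REQUIRED_GL = (4, 5)


def parse_gl_version(version_string):
    """Extract (major, minor) from a desktop GL_VERSION string."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised GL_VERSION string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(version_string, required=REQUIRED_GL):
    """True if the reported version is at least `required` (tuple comparison)."""
    return parse_gl_version(version_string) >= required


# Illustrative strings: a 3.3-only layer is rejected, a 4.6 driver is accepted.
print(meets_requirement("3.3 (Compatibility Profile) ExampleLayer"))
print(meets_requirement("4.6.0 ExampleVendor 1.2.3"))
```

A real renderer would also check individual extensions, since a driver can expose a needed feature below the core version that includes it.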

I find DX11 performs alright on an RX 580, though I'm certain Vulkan could do so much better.

Most games run perfectly smoothly at 2560x1440 (custom), however they end up looking blurry due to the PS2 interlacing thing.

If I choose the 5K preset the interlace blurriness is completely gone, but some games don't run smoothly anymore.

Also there is this issue of playing in windowed vs fullscreen mode (for me at least). In windowed mode the game overall appears much smoother, while in fullscreen it looks like it's at 30 FPS or lower.

I assume Vulkan would be able to get rid of those issues fully on AMD, while in general being the best option for both AMD and nVidia graphics cards.

> @OtavioRaposo you can try to install:
> https://www.microsoft.com/en-us/p/opencl-and-opengl-compatibility-pack/9nqpsl29bfff?activetab=pivot:overviewtab
>
> maybe opengl on top of mesa on top of DX12 is faster xD.
>
> Not sure if I am talking seriously or if this is a joke.

Have you tried this? At this point, I'm accepting any suggestions.

> I find DX11 performs alright on an RX 580, though I'm certain Vulkan could do so much better.
>
> Most games run perfectly smoothly at 2560x1440 (custom), however they end up looking blurry due to the PS2 interlacing thing.
>
> If I choose the 5K preset the interlace blurriness is completely gone, but some games don't run smoothly anymore.
>
> Also there is this issue of playing in windowed vs fullscreen mode (for me at least). In windowed mode the game overall appears much smoother, while in fullscreen it looks like it's at 30 FPS or lower.
>
> I assume Vulkan would be able to get rid of those issues fully on AMD, while in general being the best option for both AMD and nVidia graphics cards.

My main problem with DX11 is that it's full of glitches and inaccuracies. It's a big turnoff for me.
And the hacks don't solve the problem, because they hit performance so hard that the games become unplayable.
An example of this is Jak 2 and Ratchet & Clank: completely broken shadows.
The PCSX2 team should really be taking Vulkan more seriously.

You should read up on what Vulkan does and what benefit it gives to PCSX2. Vulkan is not Superman carrying everything; sure, there is some benefit, but AMD isn't implementing fragment shader interlock. You can be as certain as you want, of course. The same thing happened with x64. If you don't believe me, then wait till it's here and compare the renderers.

> You should read up on what Vulkan does and what benefit it gives to PCSX2. Vulkan is not Superman carrying everything; sure, there is some benefit, but AMD isn't implementing fragment shader interlock. You can be as certain as you want, of course. The same thing happened with x64. If you don't believe me, then wait till it's here and compare the renderers.

Windows + AMD + OpenGL = unplayable.
Windows + AMD + Vulkan = playable.

These words are fact, not opinion.
AMD drivers poorly support OpenGL, and they ain't never gonna fix that.
On the other side, Vulkan is constantly being updated and there's hope.

@OtavioRaposo sorry, I haven't tried it.
I have an AMD Fury, so maybe I will give it a try, but @TheLastRar said OpenGL 3.3 is not enough, so it probably won't work.
Anyway, Zink and Mesa D3D12 are expected to support OpenGL > 3 somewhere in the future.

Still hoping for that.

> Still hoping for that.

Continue hoping my friend 😉
