As the title says, please give users a way to compile a macOS version, or release an official one.
Not an RPCS3 limitation. Whenever macOS gets proper OpenGL or Vulkan support, support should happen.
You should ask the Khronos Group or Apple itself, not us.
I'm working on getting the Vulkan Portability support for RPCS3.
Currently it builds successfully but the swapchain implementation is missing.
https://github.com/KhronosGroup/MoltenVK/issues/102
This is a huge problem, caused by Apple blocking some hardware features in the Metal API. RPCS3 (and the real PS3 as well) relies heavily on this for everything, including the base PS3 raster formats, which do not actually exist as formats in any API. You don't really need the swapchain implemented if you have X; you can just fall back to the X11_swapchain code to use a soft swapchain.
@kd-11 good point. Is the port already in progress then? Also, is there a list of the swizzles that need to be supported?
Plans for a port were abandoned, but getting things to compile was very easy (you can see __APPLE__ blocks in the Vulkan code). All possible swizzle combinations have to be available, as PS3 games freely use this feature. There is no list of swizzles to support because it's completely reprogrammable by the guest application and used quite heavily.
@kd-11
but getting things to compile was very easy
Well, I did waste quite a few hours. What's easy for you may not be as easy for me, especially without seeing this code base before. Anyhow, I made a PR with my changes ^
There is no list of swizzles to support because it's completely reprogrammable by the guest application and used quite heavily.
It's unfortunate that Metal doesn't support this, but it's also possible to work around. Technically, the swizzle info can fit into 8-12 bits per texture, and the arithmetic of swizzling is super straightforward for the GPU. So I don't see it as a blocker for either implementation or performance.
Another issue I faced (earlier than swizzling) is the use of 20 samplers by a pipeline layout. We aren't currently reporting this limit via Vulkan device limits, but it has the field maxPerStageDescriptorSamplers which is responsible for that, and RPCS3 ignores it.
Also, btw, I strongly recommend re-opening the issue. The problems faced here may be non-obvious, but they are solvable and actionable.
I suggest opening a new issue regarding macOS support, with a proper description and a task list, to replace this one.
@AniLeo I agree. Would you want me to open it? I'm neither a client of this functionality nor a current developer of RPCS3, so it would be a little weird :)
@kvark
It's unfortunate that Metal doesn't support this, but it's also possible to work around. Technically, the swizzle info can fit into 8-12 bits per texture, and the arithmetic of swizzling is super straightforward for the GPU. So I don't see it as a blocker for either implementation or performance.
Actually you need 16 bits (2x4 for channel selection, 2x4 for lookup override). It also adds up quite a bit; for reference, the gamma correction function, which is much simpler than remap decoding, made 1440p and higher resolutions out of reach for some applications when using mid-range GPUs. This is however not a blocker, I agree. The best I can do here is refactor some functionality and introduce a compile-time switch to enable use of the software remap path, and then someone who has a Mac can work with that. As for the restricted number of samplers, that is unexpected. I have seen games use 14+ units before, so it's not completely useless either. Shouldn't be a problem for most applications though. Is it really a hard limit on Metal? No more than 16 total samplers is a little restrictive.
2x4 for channel selection, 2x4 for lookup override
Each channel can either be routed to one of 4 source channels, or be overridden to 0 or 1. So there are 6 possibilities for each, and the total number of variants to encode is 6^4 = 1296, which fits in 11 bits (2^11 = 2048).
decoding made 1440p and higher resolutions out of reach for some applications when using mid-range GPUs
Wow, that's quite a surprise.
The best I can do here is refactor some functionality and introduce a compile-time switch to enable use of the software remap path
Yes, that's what I thought we should do as well.
I have seen games use 14+ units before, so it's not completely useless either
Well, technically, you don't need that many samplers. The sampler space is fairly limited. Applications just need to re-use/share the samplers more.
Is it really a hard limit on Metal? No more than 16 total samplers is a little restrictive.
Unfortunately, yes. You can see it in the Metal feature tables: "Maximum number of entries in the sampler state argument table, per graphics or compute function" == 16 on all feature levels.
So the caveat is that the limit is for the active bindings per stage. We assign the bindings at the pipeline layout level (for efficiency, and to match Vulkan semantics), but strictly speaking a portability library can do something more sophisticated.
RPCS3 allocates 16 samplers for the FS stage and 4 for the VS stage. So I think the resource limitation is non-existent if the limits are only per-stage. As for remap bits transport, it's already done for debugging purposes; I just need to hook up a decoder in the FS stage. I'll have something working in a few days.
@kd-11
RPCS3 allocates 16 samplers for the FS stage and 4 for the VS stage.
Hmm, this is good news. We should make it work then. Our failing check is for per-stage samplers. Could it be that you aren't communicating the stage visibility to us properly (at the descriptor set layout level)?
As for remap bits transport, it's already done for debugging purposes; I just need to hook up a decoder in the FS stage. I'll have something working in a few days.
This is amazing, looking forward to it!
You can see the layout creation here: https://github.com/RPCS3/rpcs3/blob/master/rpcs3/Emu/RSX/VK/VKGSRender.cpp#L444-L460
There are 20 samplers in total: 4 vertex and 16 fragment. I think your check compares the total against samplers_per_stage instead of max(vertex_stage.sampler_count, fragment_stage.sampler_count, ...)?
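The difference between the two checks can be sketched as follows. This is an illustration only: the `(stage, sampler_count)` tuples and both function names are hypothetical stand-ins, and it assumes each binding is visible to exactly one stage, which real Vulkan descriptor set layouts do not require.

```python
from collections import Counter

# Per-stage sampler limit quoted above from the Metal feature tables.
MAX_SAMPLERS_PER_STAGE = 16


def total_samplers(bindings):
    """Sum sampler counts over all bindings, ignoring stage visibility."""
    return sum(count for _stage, count in bindings)


def max_samplers_per_stage(bindings):
    """Return the largest sampler count used by any single stage."""
    per_stage = Counter()
    for stage, count in bindings:
        per_stage[stage] += count
    return max(per_stage.values(), default=0)


# The layout described in this thread: 4 vertex + 16 fragment samplers.
bindings = [("vertex", 4), ("fragment", 16)]

# Comparing the total against the per-stage limit rejects the layout...
assert total_samplers(bindings) > MAX_SAMPLERS_PER_STAGE
# ...while the per-stage maximum fits within the limit.
assert max_samplers_per_stage(bindings) <= MAX_SAMPLERS_PER_STAGE
```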
@kd-11 thanks for the link! Your code looks good. I'll investigate later tonight. Should be an easy fix either way ;)
@kd-11 it's great that this issue allowed us to discover that we weren't properly mapping the image usage flags in gfx-portability, so we ended up writing and binding all resources for all stages :sweat_smile: Fixes are coming. Biggest roadblock on our side right now is missing occlusion queries.
I think it's time to close this one; RPCS3 technically builds and runs on macOS. It crashes when running on base MoltenVK, but it does run with gfx-rs. Closing as fixed via https://github.com/RPCS3/rpcs3/pull/4996