Xbox pixel shaders are quite simple: there are only 4 opcodes, no conditionals, at most 8 stages and one final combiner. That makes them an ideal target for a so-called 'Uber shader', written specifically for the Xbox.
A prerequisite for this is Direct3D 9, as that offers us the possibility to write this shader in either HLSL or Cg (if we want to target OpenGL with this too).
This 'Uber shader' ... :
One closely related aspect is how pixels are read. (The following can be moved into its own issue later on.)
Any Xbox texture format that isn't natively supported on host Direct3D could be read using the same shader too.
Currently, unsupported texture formats are converted when host resources are created to mimic Xbox resources. Instead of doing a conversion on the host, the texture data could be fed directly into the Uber shader, which could interpret unsupported pixel formats using a format-specific pixel reader function.
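To make the "format-specific pixel reader function" idea concrete, here is a minimal CPU-side sketch in C++ of what such readers could look like. Everything in it is hypothetical: the `TexelFormat` ids, the function names and the two 16-bit layouts are picked only because they are easy to follow, not because these formats actually need this treatment or because the real shader would be structured this way.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba {
    float r, g, b, a;
};

// Scales the lowest `bit_count` bits of `field` to the range [0, 1].
float normalize_field(std::uint32_t field, int bit_count) {
    assert(bit_count > 0 && bit_count <= 8);
    const std::uint32_t max_value = (1u << bit_count) - 1u;
    return static_cast<float>(field & max_value) / static_cast<float>(max_value);
}

// Hypothetical format ids; these are NOT the real Xbox format constants.
enum class TexelFormat { A1R5G5B5, R5G6B5 };

// Reader for a 16-bit texel laid out as 1 alpha bit, then 5 bits each of R, G, B.
Rgba read_a1r5g5b5(std::uint16_t texel) {
    return {
        normalize_field(texel >> 10, 5),  // red
        normalize_field(texel >> 5, 5),   // green
        normalize_field(texel, 5),        // blue
        normalize_field(texel >> 15, 1),  // alpha
    };
}

// Reader for a 16-bit texel laid out as 5 bits R, 6 bits G, 5 bits B (opaque).
Rgba read_r5g6b5(std::uint16_t texel) {
    return {
        normalize_field(texel >> 11, 5),  // red
        normalize_field(texel >> 5, 6),   // green
        normalize_field(texel, 5),        // blue
        1.0f,                             // no alpha channel in this layout
    };
}

// One entry point that picks the reader based on the texture's format.
Rgba read_texel(TexelFormat format, std::uint16_t texel) {
    switch (format) {
        case TexelFormat::A1R5G5B5: return read_a1r5g5b5(texel);
        case TexelFormat::R5G6B5:   return read_r5g6b5(texel);
    }
    return {0.0f, 0.0f, 0.0f, 1.0f};  // fallback for unknown ids
}
```

In a shader the same dispatch would run per fetched texel, with the format id supplied as a constant alongside the raw texture data.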
Complete details about ubershaders can be found at https://dolphin-emu.org/blog/2017/07/30/ubershaders/ , including the major improvement they brought to their emulator.
At least their license is compatible with ours. Perhaps the graphics dev team can bring their working code into Cxbx-Reloaded, with the changes required for Xbox shader support.
@RadWolfie it's more than just changes. Dolphin's ubershaders are very different from what we would need. The concept is similar (a big shader implementing all scenarios), but the Xbox pixel combiner is much simpler than what the GameCube has, and it's not just 'simple' changes: the design of the whole shader reflects how the hardware works.
I can see this in future as addon, but if you ask me you should focus more on getting CxBx to work on all games first.
Meaning, clean up the issues and the rest, then clean up the code etc.
This is a long process already, and focusing on this kind of tool would take away too much time from the real problems that CxBx has, even in "games that work".
> I can see this in future as addon, but if you ask me you should focus more on getting CxBx to work on all games first.
It is tagged as an enhancement for this reason, though it might also benefit compatibility (texture conversion) so I wouldn't immediately dismiss it.
The existing LLE pixel shader conversion code can be used as a starting point : https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/blob/develop/src/devices/video/nv2a_psh.cpp (start reading in psh_convert)
Note : The above link is an older copy/port of xqemu's original code (which we'd like to submodule at some point), which can be found here : https://github.com/xqemu/xqemu/blob/master/hw/xbox/nv2a/nv2a_psh.c
But instead of generating a shader based on one given input set, this "Uber shader" must contain paths for all possible inputs, so "one shader can rule them all".
Patrick, can this shader code be ported with little modifications into XQEMU if their programmers wish to? And when your people finish this, would it be able to emulate both HLE and LLE GPU (graphics) in one go?
Yes and yes
This "Uber shader" is probably not going to be written by hand, but generated through code.
Such a generator could be instructed to generate either GLSL (for OpenGL), HLSL (for Direct3D) or even SPIR-V assembly (for Vulkan, if we ever get to implementing such a rendering backend).
Note that GLSL shaders are similar to HLSL and can be ported from one to the other; here are some docs on that : https://docs.microsoft.com/en-us/windows/uwp/gaming/glsl-to-hlsl-reference
Also, GLSL and HLSL can be converted to SPIR-V; here are some docs on that : https://vulkan.lunarg.com/doc/sdk/1.1.92.1/windows/spirv_toolchain.html and https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/SPIR-V.rst
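As a rough illustration of such a generator, the C++ sketch below emits one and the same statement as either HLSL or GLSL source text. The names (`ShaderLanguage`, `vector_type`, `blend_intrinsic`, `emit_final_combiner_rgb`) are made up for this example and the emitted statement is just a sample; only the language differences it relies on (`float3` vs `vec3`, `lerp` vs `mix`) are real.

```cpp
#include <cassert>
#include <string>

enum class ShaderLanguage { HLSL, GLSL };

// Returns the vector type name, e.g. "float3" for HLSL or "vec3" for GLSL.
std::string vector_type(ShaderLanguage language, int components) {
    assert(components >= 2 && components <= 4);
    const std::string prefix = (language == ShaderLanguage::HLSL) ? "float" : "vec";
    return prefix + std::to_string(components);
}

// Returns the name of the linear interpolation intrinsic in each language.
std::string blend_intrinsic(ShaderLanguage language) {
    return (language == ShaderLanguage::HLSL) ? "lerp" : "mix";
}

// Emits a single sample statement; a real generator would emit the whole shader
// and would need a further backend (or a conversion step) for SPIR-V.
std::string emit_final_combiner_rgb(ShaderLanguage language) {
    return vector_type(language, 3) + " rgb = d + " +
           blend_intrinsic(language) + "(c, b, a);";
}
```

Keeping the per-language differences behind small helpers like these means the bulk of the generator can stay shared between backends.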
I've been looking into writing an uber-shader in HLSL, but hit a 'slight' roadblock :
The Xbox register combiner state is stored in DWORDs, which contain bit patterns that steer its functionality.
Our Uber-shader should be able to interpret individual bits, which means bitwise operations have to take place in the HLSL shader.
Now, since we're currently using Direct3D 9, the highest shader model we can use is 3.0, which doesn't offer native integer support (that was introduced with Direct3D 10, shader model 4.0).
Under Direct3D 9, shader model 3, integers are internally treated as floats (and more specifically, their floor() result).
That means not all 32-bit patterns can be reliably read from those floats; when rounding issues arise, our uber shader would be reading other bits than were intended.
Conclusion : Uber shaders will probably require shader model 4, thus Direct3D 10...
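The rounding problem can be reproduced on the CPU. The C++ sketch below pushes a 32-bit pattern through an IEEE single-precision float and back; the function names are invented for this example, and it assumes the best case of a full 32-bit float (a GPU running shader model 3 may well offer less precision than that).

```cpp
#include <cassert>
#include <cstdint>

// Stores a bit pattern in a 32-bit float and reads it back as an integer.
// The precondition keeps the float-to-integer conversion in range.
std::uint32_t round_trip_through_float(std::uint32_t bits) {
    assert(bits <= 0x80000000u);
    const float as_float = static_cast<float>(bits);
    return static_cast<std::uint32_t>(as_float);
}

// True when the pattern comes back unchanged.
bool survives_float(std::uint32_t bits) {
    return round_trip_through_float(bits) == bits;
}
```

With this, `survives_float(0x0000FFFFu)` holds, but `round_trip_through_float(0x01000001u)` comes back as `0x01000000u`: the lowest bit is silently lost, which is exactly the "reading other bits than were intended" failure.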
( For reference, here is part of the Uber-shader generation code in Dolphin : https://github.com/dolphin-emu/dolphin/blob/154eeae8ae1e7105bc06e76dc9ff09884190867e/Source/Core/VideoCommon/UberShaderPixel.cpp , which AFAICT is targeting shader model 4 or higher? )
For anyone interested, here is a dump of some code I hacked together - which won't work :
// Xbox Pixel Shader definition (a dword array) buffer
extern const Buffer<DWORD> g_XboxPSDef;
// Xbox Pixel Shader definition (a dword array) member indices :
static const dword X_D3DRS_PSALPHAINPUTS0 = 0u;
static const dword X_D3DRS_PSALPHAINPUTS1 = 1u;
static const dword X_D3DRS_PSALPHAINPUTS2 = 2u;
static const dword X_D3DRS_PSALPHAINPUTS3 = 3u;
static const dword X_D3DRS_PSALPHAINPUTS4 = 4u;
static const dword X_D3DRS_PSALPHAINPUTS5 = 5u;
static const dword X_D3DRS_PSALPHAINPUTS6 = 6u;
static const dword X_D3DRS_PSALPHAINPUTS7 = 7u;
static const dword X_D3DRS_PSFINALCOMBINERINPUTSABCD = 8u;
static const dword X_D3DRS_PSFINALCOMBINERINPUTSEFG = 9u;
static const dword X_D3DRS_PSCONSTANT0_0 = 10u;
static const dword X_D3DRS_PSCONSTANT0_1 = 11u;
static const dword X_D3DRS_PSCONSTANT0_2 = 12u;
static const dword X_D3DRS_PSCONSTANT0_3 = 13u;
static const dword X_D3DRS_PSCONSTANT0_4 = 14u;
static const dword X_D3DRS_PSCONSTANT0_5 = 15u;
static const dword X_D3DRS_PSCONSTANT0_6 = 16u;
static const dword X_D3DRS_PSCONSTANT0_7 = 17u;
static const dword X_D3DRS_PSCONSTANT1_0 = 18u;
static const dword X_D3DRS_PSCONSTANT1_1 = 19u;
static const dword X_D3DRS_PSCONSTANT1_2 = 20u;
static const dword X_D3DRS_PSCONSTANT1_3 = 21u;
static const dword X_D3DRS_PSCONSTANT1_4 = 22u;
static const dword X_D3DRS_PSCONSTANT1_5 = 23u;
static const dword X_D3DRS_PSCONSTANT1_6 = 24u;
static const dword X_D3DRS_PSCONSTANT1_7 = 25u;
static const dword X_D3DRS_PSALPHAOUTPUTS0 = 26u;
static const dword X_D3DRS_PSALPHAOUTPUTS1 = 27u;
static const dword X_D3DRS_PSALPHAOUTPUTS2 = 28u;
static const dword X_D3DRS_PSALPHAOUTPUTS3 = 29u;
static const dword X_D3DRS_PSALPHAOUTPUTS4 = 30u;
static const dword X_D3DRS_PSALPHAOUTPUTS5 = 31u;
static const dword X_D3DRS_PSALPHAOUTPUTS6 = 32u;
static const dword X_D3DRS_PSALPHAOUTPUTS7 = 33u;
static const dword X_D3DRS_PSRGBINPUTS0 = 34u;
static const dword X_D3DRS_PSRGBINPUTS1 = 35u;
static const dword X_D3DRS_PSRGBINPUTS2 = 36u;
static const dword X_D3DRS_PSRGBINPUTS3 = 37u;
static const dword X_D3DRS_PSRGBINPUTS4 = 38u;
static const dword X_D3DRS_PSRGBINPUTS5 = 39u;
static const dword X_D3DRS_PSRGBINPUTS6 = 40u;
static const dword X_D3DRS_PSRGBINPUTS7 = 41u;
static const dword X_D3DRS_PSCOMPAREMODE = 42u;
static const dword X_D3DRS_PSFINALCOMBINERCONSTANT0 = 43u;
static const dword X_D3DRS_PSFINALCOMBINERCONSTANT1 = 44u;
static const dword X_D3DRS_PSRGBOUTPUTS0 = 45u;
static const dword X_D3DRS_PSRGBOUTPUTS1 = 46u;
static const dword X_D3DRS_PSRGBOUTPUTS2 = 47u;
static const dword X_D3DRS_PSRGBOUTPUTS3 = 48u;
static const dword X_D3DRS_PSRGBOUTPUTS4 = 49u;
static const dword X_D3DRS_PSRGBOUTPUTS5 = 50u;
static const dword X_D3DRS_PSRGBOUTPUTS6 = 51u;
static const dword X_D3DRS_PSRGBOUTPUTS7 = 52u;
static const dword X_D3DRS_PSCOMBINERCOUNT = 53u;
static const dword X_D3DRS_PS_RESERVED = 54u; // Dxbx note : This takes the slot of X_D3DPIXELSHADERDEF.PSTextureModes, set by D3DDevice_SetRenderState_LogicOp?
static const dword X_D3DRS_PSDOTMAPPING = 55u;
static const dword X_D3DRS_PSINPUTTEXTURE = 56u;
// Interprets one combiner stage (either its RGB or its alpha half)
void do_stage(DWORD input, DWORD output, bool is_alpha)
{
    // TODO
}

float4 main() : COLOR
{
    float4 Result = float4(1.0f, 1.0f, 1.0f, 1.0f);
    // The low byte of PSCOMBINERCOUNT holds the number of active stages
    int num_stages = g_XboxPSDef.Load(X_D3DRS_PSCOMBINERCOUNT) & 0xff;
    // assert(num_stages <= 8);
    for (int i = 0; i < num_stages; i++) {
        // Stage i
        DWORD rgb_input = g_XboxPSDef.Load(X_D3DRS_PSRGBINPUTS0 + i);
        DWORD alpha_input = g_XboxPSDef.Load(X_D3DRS_PSALPHAINPUTS0 + i);
        DWORD rgb_output = g_XboxPSDef.Load(X_D3DRS_PSRGBOUTPUTS0 + i);
        DWORD alpha_output = g_XboxPSDef.Load(X_D3DRS_PSALPHAOUTPUTS0 + i);
        do_stage(rgb_input, rgb_output, /*is_alpha=*/false);
        do_stage(alpha_input, alpha_output, /*is_alpha=*/true);
    }
    DWORD FINALCOMBINERINPUTSABCD = g_XboxPSDef.Load(X_D3DRS_PSFINALCOMBINERINPUTSABCD);
    DWORD FINALCOMBINERINPUTSEFG = g_XboxPSDef.Load(X_D3DRS_PSFINALCOMBINERINPUTSEFG);
    if (FINALCOMBINERINPUTSABCD || FINALCOMBINERINPUTSEFG) {
        // final combiner stage
        /* TODO : Convert this into something that might stand a chance of working :
        ps->varE = get_input_var(ps, final.e, false);
        ps->varF = get_input_var(ps, final.f, false);
        QString *a = get_input_var(ps, final.a, false);
        QString *b = get_input_var(ps, final.b, false);
        QString *c = get_input_var(ps, final.c, false);
        QString *d = get_input_var(ps, final.d, false);
        QString *g = get_input_var(ps, final.g, true);
        Result.rgb = float3(get(d)) + mix(float3(get(c)), float3(get(b)), float3(get(a)));
        Result.a = get(g);
        */
    }
    return Result;
}
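The final combiner expression in the TODO comment above (`d + mix(c, b, a)`) can be written out as plain arithmetic. The C++ sketch below does that on the CPU; `Rgb`, `blend` and `final_combiner_rgb` are names made up for this example, it covers only the RGB part of that one formula, and the assumption that the blend factor lies in [0, 1] is mine rather than something the code above states.

```cpp
#include <cassert>

struct Rgb {
    float r, g, b;
};

// Same result as GLSL mix(x, y, t) or HLSL lerp(x, y, t): x when t is 0, y when t is 1.
float blend(float x, float y, float t) {
    assert(t >= 0.0f && t <= 1.0f);  // assumption of this sketch
    return x * (1.0f - t) + y * t;
}

// RGB part of the final combiner: d + mix(c, b, a), applied per channel.
Rgb final_combiner_rgb(const Rgb& a, const Rgb& b, const Rgb& c, const Rgb& d) {
    return {
        d.r + blend(c.r, b.r, a.r),
        d.g + blend(c.g, b.g, a.g),
        d.b + blend(c.b, b.b, a.b),
    };
}
```

So input A selects between C (where A is 0) and B (where A is 1), and D is added on top.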
2019-07-30 : Here is a trick to still be able to access individual bits in Shader Model 3 : http://theinstructionlimit.com/encoding-boolean-flags-into-a-float-in-hlsl !
Granted, this allows for at most 23 bits per float, but if we split each DWORD into two separate values in a pre-processing step (one with the leftmost 16 input bits, and one with the rightmost 16 input bits), this could actually work!
(It's using FMOD, which is supported from Shader Model 2 and up, so thus far, this seems feasible!)
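Here is how that split-and-fmod approach could look, sketched in C++ with only float arithmetic on the "shader" side. The names (`SplitDword`, `split_dword`, `test_bit`, `extract_field`) are invented for this example and are not taken from the linked article or from Cxbx-Reloaded; `std::floor` and `std::fmod` stand in for the HLSL `floor` and `fmod` intrinsics.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// The two 16-bit halves of a DWORD, each stored as a float.
struct SplitDword {
    float hi;  // leftmost 16 bits
    float lo;  // rightmost 16 bits
};

// Host-side pre-processing step: integer operations are fine here.
SplitDword split_dword(std::uint32_t value) {
    return {static_cast<float>(value >> 16), static_cast<float>(value & 0xFFFFu)};
}

// Shader-side bit test using floats only: shift right by dividing, then take mod 2.
bool test_bit(float half, int bit) {
    assert(bit >= 0 && bit < 16);
    const float shifted = std::floor(half / std::ldexp(1.0f, bit));
    return std::fmod(shifted, 2.0f) >= 1.0f;
}

// Tests bit 0..31 of the original DWORD by picking the right half.
bool test_dword_bit(const SplitDword& dword, int bit) {
    assert(bit >= 0 && bit < 32);
    return (bit < 16) ? test_bit(dword.lo, bit) : test_bit(dword.hi, bit - 16);
}

// Extracts a multi-bit field from one half, again with floats only.
float extract_field(float half, int first_bit, int bit_count) {
    assert(first_bit >= 0 && bit_count > 0 && first_bit + bit_count <= 16);
    const float shifted = std::floor(half / std::ldexp(1.0f, first_bit));
    return std::fmod(shifted, std::ldexp(1.0f, bit_count));
}
```

Because each half never exceeds 65535, every intermediate value is an integer that a 32-bit float represents exactly, so no bits get lost.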
2019-08-06 : Using the above approach, here's a somewhat-complete HLSL interpreter of the Xbox register combiner unit : https://github.com/PatrickvL/Cxbx-Reloaded/blob/HLSL_PS/src/core/hle/D3D8/Direct3D9/RegisterCombinerInterpreter.hlsl
2019-12-03 : https://github.com/PatrickvL/Cxbx-Reloaded/blob/HLSL_PS/src/core/hle/D3D8/RegisterCombinerInterpreter.fx
2019-08-17 : The status of the above is that all Xbox color and final combiner operations are interpreted in this shader. Also, the host code already transfers all Xbox pixel shader related render state values to the shader through pixel shader constants (each dword is split into 4 bytes, whose values are passed as floats in the a, r, g, b channels of each host pixel shader constant).
Still to do: Texture fetch must still be implemented, default values for all registers must still be reverse engineered, and output of the (emulated) vertex shader pipeline must somehow be connected to this HLSL pixel shader's input.
With that, debugging and fine-tuning can commence.
Debugging is required, since right now the resulting shader assembly appears to be missing some parts that are most certainly implemented in the HLSL code linked above.
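The byte-per-channel packing mentioned in the 2019-08-17 status can be sketched on the host side as follows. This is not the actual Cxbx-Reloaded code: `ShaderConstant`, `pack_dword` and `unpack_dword` are hypothetical names, and the byte order (most significant byte in `a`, then `r`, `g`, `b`) is an assumption based on the order in which the channels are listed.

```cpp
#include <cassert>
#include <cstdint>

// One host pixel shader constant; each channel carries one byte (0..255) as a float.
struct ShaderConstant {
    float a, r, g, b;
};

// Splits a DWORD into 4 bytes, most significant byte first (assumed order).
ShaderConstant pack_dword(std::uint32_t value) {
    return {
        static_cast<float>((value >> 24) & 0xFFu),
        static_cast<float>((value >> 16) & 0xFFu),
        static_cast<float>((value >> 8) & 0xFFu),
        static_cast<float>(value & 0xFFu),
    };
}

// Reassembles the DWORD, to check that the packing is lossless.
std::uint32_t unpack_dword(const ShaderConstant& constant) {
    const float channels[4] = {constant.a, constant.r, constant.g, constant.b};
    std::uint32_t value = 0;
    for (float channel : channels) {
        assert(channel >= 0.0f && channel <= 255.0f);
        value = (value << 8) | static_cast<std::uint32_t>(channel);
    }
    return value;
}
```

With only 8 bits per float there is ample precision headroom, and a full render state value still fits in a single constant register.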
2019-08-23 : After hacking, tweaking, debugging and fixing a few things, the above linked shader seems to correctly receive and decode Xbox pixel shader declarations. But since texture fetch and vertex shader input don't work yet, there's no output being generated yet.