Boom Boom Rocket sets up export.
Lumines Live
This is going to be difficult - as memexport writes straight to main memory.
My guess is (at least some) games use memexport as transform feedback.
...
/* 84 */ mad eA, r0.xxxx, c5.xyxx, c255
/* 85 */ max eM0, r1.xxxx, r1.xxxx
/* 86 */ max eM1, r1.yyyy, r1.yyyy
... up to eM4
eA.x = main memory address >> 2 (uint32 aliased as float)
eA.yzw = ???
Writes to eA are restricted to mad only. Perhaps they're using some weird tricks?
Just to keep this info somewhere in a more persistent place than Discord:
The constant multiplicand, according to the usage in Halo 3 and Banjo-Kazooie: Nuts & Bolts and to Advanced Screenspace Antialiasing, is (0.0, 1.0, 0.0, 0.0).
eA.x, as uint, is the physical address in dwords | 0x40000000. This was checked by comparing the register value with the index buffer pointer in draw calls that use tessellation (in that case the index buffer contains per-edge tessellation factors as float32 rather than indices).
eA.y appears to be the offset in dwords | 0x4B000000 (unless there is some element stride and that's an element offset; I still don't know that, but it's not very likely, especially considering that the slide from Advanced Screenspace Antialiasing calculates the offset from IntOffsets). 0x4B000000 is 2.0^23 as a float32: adding an integer converted to a float to 2.0^23 puts it in the low mantissa bits, and that's how mad is used to write an integer using the floating-point ALU.
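The integer-through-float trick can be checked numerically. This is a minimal Python sketch, assuming IEEE 754 single precision and the (0.0, 1.0, 0.0, 0.0) multiplicand; the helper names and the sample constant values are made up for illustration and do not come from any game.

```python
import struct

MAGIC = 0x4B000000  # bit pattern of 2.0**23 as float32


def f32_bits(value):
    """Bit pattern of a value rounded to float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_f32(bits):
    """float32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def mad_ea(r0_x, stream_constant):
    """Emulate `mad eA, r0.xxxx, (0, 1, 0, 0), c255` and return raw bits.

    Only Y receives r0.x; X, Z and W pass through from the constant.
    """
    multiplicand = (0.0, 1.0, 0.0, 0.0)
    return [
        f32_bits(r0_x * m + bits_f32(c))
        for m, c in zip(multiplicand, stream_constant)
    ]


# Hypothetical stream constant: address, base offset 0, format word, size.
c255 = [0x40012345, MAGIC, 0x4B07E4FA, MAGIC | 0x100]
ea = mad_ea(5.0, c255)
print([hex(v) for v in ea])
```

The element index 5.0 lands in the low mantissa bits of Y, while the other three components come out bit-identical to the constant. This only holds for indices below 2^23.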
eA.z is something unknown, possibly some flags. In the Halo 3 tessellation edge factor calculation shader, it's 0x4B07E4FA, in Banjo-Kazooie: Nuts & Bolts for the shader with the same purpose, it's 0x4B07E46A.
eA.w is buffer size in dwords | 0x4B000000.
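To make the X, Y and W layout concrete, here is a small Python sketch that strips the tag bits from each component. The masks are an assumption derived from the tags described above (0x40000000 for X, the low 23 mantissa bits for Y and W), and the function name is hypothetical.

```python
def decode_stream_constant_xyw(x_bits, y_bits, w_bits):
    """Split eA.x, eA.y and eA.w (as raw uint32) into address, offset, size.

    Assumed masks: everything below the 0x40000000 tag for X, and the
    23 mantissa bits under the 0x4B000000 tag for Y and W.
    """
    address_dwords = x_bits & 0x3FFFFFFF
    offset = y_bits & 0x007FFFFF
    size = w_bits & 0x007FFFFF
    return {
        "address_bytes": address_dwords * 4,  # eA.x is address >> 2
        "offset": offset,                     # dwords (or elements, see text)
        "size": size,                         # dwords (or elements, see text)
    }


# Made-up register values, not taken from a real draw call.
fields = decode_stream_constant_xyw(0x40012345, 0x4B000005, 0x4B000100)
print(fields)
```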
It does also pack things, and apparently not only to 32 bits, but to larger vectors as well. In that case, the size in W is in vectors, not in dwords (and the offset in Y probably too): in the shader from the Halo 3 menu, the scale of W for the same buffer depends on the format.
Here are stream constant Z values from some shader from the menu of Halo 3:
What we can see about the bits:
alloc export = 1 or = 2 depends on some "tile alignment" according to https://forum.beyond3d.com/threads/geometry-shader-whats-the-difference-from-vs.24072/
This is still a meme xport
According to the PDB from Call of Duty 4 alpha, Z of GPU_MEMEXPORT_STREAM_CONSTANT consists of:
- 0:2 - GPUENDIAN128 EndianSwap
- 8:13 - GPUCOLORFORMAT Format
- 16:18 - GPUSURFACENUMBER NumericType (0 = UREPEAT, 1 = SREPEAT, 2 = UINTEGER, 3 = SINTEGER, 7 = FLOAT)
- 19 - GPUSURFACESWAP ComponentSwap (0 = LOW_RED, 1 = LOW_BLUE)
- 20:31 - 4B0
Doesn't explain, however, why there is some data in bits 3:7 and 14:15 in the shaders for edge tessellation factor calculation in Halo 3 and Banjo-Kazooie: Nuts & Bolts, possibly leftovers on the stack.
Differences between how the engines of both games use the 360 GPU resources to store and write data might be to blame for this behavior.
Only two bits were different between Halo 3 and BKN&B though, 0x4B07E4FA in Halo 3 and 0x4B07E46A in BKN&B. But this isn't relevant anymore, I think. The info we have seems to be enough for implementing.
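The bit layout listed above can be applied to the two Z values mentioned in this thread. This is a Python sketch of the field extraction only; the dictionary keys are hypothetical names, and the bits not covered by the PDB layout (3:7 and 14:15) are kept as raw numbers because their meaning is unknown.

```python
def decode_stream_constant_z(z):
    """Extract the fields of eA.z using the bit ranges listed above."""
    return {
        "endian_swap": z & 0x7,            # bits 0:2
        "unknown_3_7": (z >> 3) & 0x1F,    # bits 3:7, not in the PDB layout
        "format": (z >> 8) & 0x3F,         # bits 8:13
        "unknown_14_15": (z >> 14) & 0x3,  # bits 14:15, not in the PDB layout
        "numeric_type": (z >> 16) & 0x7,   # bits 16:18
        "component_swap": (z >> 19) & 0x1, # bit 19
        "tag": z >> 20,                    # bits 20:31
    }


halo3 = decode_stream_constant_z(0x4B07E4FA)
banjo = decode_stream_constant_z(0x4B07E46A)
differing = {k for k in halo3 if halo3[k] != banjo[k]}
print(halo3, banjo, differing)
```

Both values decode to numeric type 7 (FLOAT), format 36 and tag 0x4B0, and differ only inside the unexplained bits 3:7.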
I'm not sure, however, what happens if you export only some components of a single vector, or whether the destination formats can include formats smaller than 32bpp, but I've never seen that happen so far. Those cases would have to be taken into consideration, because then the existing value in the RWByteAddressBuffer would have to be loaded, shifted and masked... ewww...
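For illustration, the load, shift and mask sequence for a sub-32bpp write could look like the Python sketch below. It models the buffer as a `bytearray` with dword-granular access and assumes little-endian dwords with no guest endian swap; the function name is hypothetical. It is not atomic, so in a real shader two threads writing into the same dword could race.

```python
def store_sub_dword(memory, byte_offset, value, size_bytes):
    """Write an 8- or 16-bit value using only dword loads and stores."""
    assert size_bytes in (1, 2) and byte_offset % size_bytes == 0
    dword_offset = byte_offset & ~3
    shift = (byte_offset & 3) * 8
    mask = ((1 << (size_bytes * 8)) - 1) << shift
    # Load the existing dword, clear the target bits, merge the new value.
    old = int.from_bytes(memory[dword_offset:dword_offset + 4], "little")
    new = (old & ~mask) | ((value << shift) & mask)
    memory[dword_offset:dword_offset + 4] = new.to_bytes(4, "little")


mem = bytearray(b"\x11\x22\x33\x44\x55\x66\x77\x88")
store_sub_dword(mem, 1, 0xAB, 1)    # one byte into the first dword
store_sub_dword(mem, 6, 0xBEEF, 2)  # one 16-bit value into the second dword
print(mem.hex())
```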
Mostly done in the Direct3D 12 backend. Only seen Halo: Reach (IIRC) exporting data in the 8_8_8_8_A format that we don't support yet. If sub-32bpp formats are ever encountered, two R16 and four R8 UAVs (because of Nvidia's 128 megatexels limit for buffers), or just the latter, will need to be added.
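The view counts follow from simple arithmetic. The Python sketch below assumes the views have to cover the whole 512 MB of Xbox 360 physical memory and takes the 128 megatexel limit as 2^27 elements per buffer view; both constants and the function name are illustrative.

```python
PHYSICAL_MEMORY_BYTES = 512 * 1024 * 1024  # Xbox 360 main memory
MAX_BUFFER_TEXELS = 128 * 1024 * 1024      # limit quoted above, taken as 2**27


def views_needed(bytes_per_texel):
    """Number of typed buffer views needed to cover all physical memory."""
    texels = PHYSICAL_MEMORY_BYTES // bytes_per_texel
    return -(-texels // MAX_BUFFER_TEXELS)  # ceiling division


for name, size in (("R32", 4), ("R16", 2), ("R8", 1)):
    print(name, views_needed(size))
```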