@kd-11, @elad335, I think this area can be optimized with a custom memory copy.
Right now we do a memory check (memcmp) and then a memory copy. At a very low level that is two passes over the memory segment. If we remove the memcmp and copy the values unconditionally, comparing each one as it is copied, we get the same result: the "m_graphics_state" flag can be set conditionally during that single pass. At a very low level that is one pass over the memory segment. This approach reduces the number of loop bound checks and doesn't affect any of the actual data checks.
Maybe it's not worth it, but even a tiny speedup can help the emulation.
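To illustrate the single-pass idea, here is a minimal sketch. It is not the actual RPCS3 code: the helper name `copy_and_detect_change`, the 32-bit word granularity, and the calling convention are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the RPCS3 source): copies `count` 32-bit
// words from `src` to `dst` in one pass and reports whether any word in
// `dst` was actually changed by the copy.
bool copy_and_detect_change(std::uint32_t* dst, const std::uint32_t* src, std::size_t count)
{
    std::uint32_t diff = 0;

    for (std::size_t i = 0; i < count; ++i)
    {
        const std::uint32_t value = src[i];

        // Accumulate differing bits instead of branching per element,
        // so the loop stays simple and easy for the compiler to vectorize.
        diff |= dst[i] ^ value;
        dst[i] = value;
    }

    return diff != 0;
}
```

A caller could then set the dirty flag only when the helper returns true, for example `if (copy_and_detect_change(dst, src, count)) { /* set the relevant m_graphics_state bit */ }`. Whether this beats memcmp followed by memcpy depends on the data size and on how well the library routines are already optimized, so it would need to be measured.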
Thanks, fixed by #7992. It also takes care of stream_data_to_memory_swapped_u32, which makes it super fast now.