As I mentioned here, the constant-propagation speedup for OPCODE_SPLAT breaks vertices and physics in many games. Tested on Sonic 06 (wonky camera, death on jump pads) and Bayonetta (enemies spinning like fidget spinners).
This change came in with PR#514; commenting out OPCODE_SPLAT makes many things work again. I haven't figured out why it causes problems. I only found it by comparing logged values between the Splat in value.cc and the original Splat in hir_builder.cc: the values differ in ordering and count, although most of them are the same.
Before (test video by EOGC):
https://www.youtube.com/watch?v=6AaForznZF8
After (with some other fixes on my xenia fork):
https://www.youtube.com/watch?v=Fb0TJT4f73k
Here's the log file for Sonic 06. I commented OPCODE_SPLAT out and only logged the values.
splat_log.txt
Lines are logged as "!> 00000028 splat(cpp or hb) type: %d, value: %d"
You can find them by searching for the keyword "splat".
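For anyone who wants to repeat the comparison, here is a rough Python sketch of how the two sets of logged values could be diffed. It assumes each line carries either `splat(cpp)` or `splat(hb)` followed by the `type`/`value` fields from the format string above; the exact tag spelling and the sample lines are made up for illustration and are not taken from splat_log.txt.

```python
import re
from collections import Counter

# Assumed line shape, derived from the format string quoted in the thread.
LINE_RE = re.compile(r"splat\((cpp|hb)\) type: (-?\d+), value: (-?\d+)")


def collect(lines):
    """Count (type, value) pairs per implementation tag ('cpp' or 'hb')."""
    seen = {"cpp": Counter(), "hb": Counter()}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            tag = match.group(1)
            pair = (int(match.group(2)), int(match.group(3)))
            seen[tag][pair] += 1
    return seen


def diff(seen):
    """Return the pairs that one side logged more often than the other."""
    only_cpp = seen["cpp"] - seen["hb"]  # Counter subtraction drops counts <= 0
    only_hb = seen["hb"] - seen["cpp"]
    return only_cpp, only_hb


# Made-up sample lines, only to show the expected input shape.
sample = [
    "!> 00000028 splat(cpp) type: 3, value: 1",
    "!> 00000028 splat(hb) type: 3, value: 1",
    "!> 00000028 splat(cpp) type: 3, value: -1",
]
only_cpp, only_hb = diff(collect(sample))
print(dict(only_cpp), dict(only_hb))
```

Counting pairs instead of comparing line by line ignores ordering, so only genuine differences in values or amounts show up.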
This is a very nice catch! Awesome work.
I'm not too sure what would cause the difference between the JIT's splat implementation, and the constant implementation though.
On second thought, I want you to comment out OPCODE_DOT_PRODUCT_3 and OPCODE_DOT_PRODUCT_4 and see what happens.
@DrChat
With the constant check "(i->src1.value->IsConstant() && i->src2.value->IsConstant())", these two opcodes got almost zero hits in cpp: only one OPCODE_DOT_PRODUCT_3 during Bayonetta's opening (and after that it didn't pass to hir_builder, which is weird too).
Logging OPCODE_DOT_PRODUCT_3/4 without the constant check in cpp and hir_builder yields some data; with the constant check there are zero hits.
I tried the following combinations:
Oops, I'm dumb: it's passing to x64_sequences like you mentioned. Now the amounts and values are the same (see log file; I only logged INT32_TYPE). I'll try printing the dest value to see what the difference is.
xenia_splat2.txt
Yeah - could you do some investigation of the differences in output between the constant dot product and the JIT's dot product?
Splat doesn't seem to be the issue - and the only way to have a float splat is from dot product.
I may have been wrong to point out DP3/DP4.
If you've identified the specific case that causes the issue, set a breakpoint in that OPCODE_SPLAT case in the propagation pass, and walk the i->prev chain backwards until you hit an OPCODE_SOURCE_OFFSET, then report what you see here.
Also, if the values are not flagged constant then the constant data is junk.
Okay, here's the test report:
I set a breakpoint on OPCODE_SPLAT's INT32_TYPE case (since INT32_TYPE is what breaks the games).
This is the first hit on Sonic 06, with i->src1.value->constant.i32 = 0x00000001

There are a lot of 1b0 values; I'm not sure what they are.
Btw, I can't go further because Xenia crashes when playing a video file while running under VS2015 (both debug and release builds).
Okay, that means it's a future instruction. Just to verify that this particular splat causes an issue, skip it in the VS debugger (by dragging the execution pointer to the break statement) and see if it fixes anything.
Instructions that use integer OPCODE_SPLAT (but not necessarily the culprit):
vspltb
vsplth
vspltw
vspltisb
vspltish
vspltisw
The specific sequence you landed on (but almost certainly not responsible) @ 0x825D2264 is:
.text:825D2260 lvx128 v12, r0, r4
.text:825D2264 vspltisw v13, 1
.text:825D2268 vmsum3fp128 v0, v12, v12
.text:825D226C vspltisw v11, 0
.text:825D2270 vcfsx v10, v13, 1
.text:825D2274 vrsqrtefp v13, v0
.text:825D2278 vmulfp128 v9, v0, v10
.text:825D227C vcmpeqfp v11, v0, v11
.text:825D2280 vmulfp128 v8, v13, v13
.text:825D2284 vnmsubfp v10, v9, v10, v8
.text:825D2288 vmaddfp v13, v13, v13, v10
.text:825D228C vmulfp128 v13, v12, v13
.text:825D2290 vsel v11, v13, v0, v11
.text:825D2294 stvx v11, r0, r3
.text:825D2298 blr
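For reference, the constant this sequence derives from the integer splat can be worked out by hand. A minimal Python sketch, assuming the documented AltiVec behaviour of vcfsx (convert signed words to float, then divide by 2^UIMM); it models lanes as plain Python numbers and ignores single-precision rounding:

```python
def vcfsx(lanes, uimm):
    """Convert signed integer lanes to float, scaling by 1 / 2**uimm."""
    return [lane / float(1 << uimm) for lane in lanes]


v13 = [1, 1, 1, 1]   # lanes produced by: vspltisw v13, 1
v10 = vcfsx(v13, 1)  # vcfsx v10, v13, 1
print(v10)           # [0.5, 0.5, 0.5, 0.5]
```

So the integer splat here only feeds a 0.5 constant for the reciprocal square root refinement that follows.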
OPCODE_SPLAT: ON / OFF

@SakataGintokiYT Interesting. Appreciate the picture showing the impact!
This is most likely an x86 backend SSE instruction clobbering a constant input somehow.
@SakataGintokiYT @AllanCat Do you both have AVX2 enabled or disabled?
If it's enabled, could you specify --enable_haswell_instructions=false and test it again?
@DrChat The same happens on Ivy Bridge CPUs too.
No effect, either with a plain start or with --enable_haswell_instructions=false.
Resident Evil 5:
OPCODE_SPLAT enabled: character movement impossible, camera movement impossible
OPCODE_SPLAT disabled: character movement correct, camera movement correct
Okay, cool.
@Parovozik, can you verify if 6c97dbaf81010e7345c5cd8119bbc663c274c5d4 affects this bug at all?
@DrChat I tried it (https://github.com/benvanik/xenia/commit/6c97dbaf81010e7345c5cd8119bbc663c274c5d4). No change for me.
Naruto UNS3.log
Sonic.log
bayonetta.log



Okay - are there any games with a quick repro case?
Can you find a game that takes less than 1 minute from boot to see this issue happen? Preferably the shortest you can find.
Sonic 2006 (if you skip the movie and cutscene)
Sonic 06 is a no-go. Blackscreens for me - I'll look into it later but I'm not interested in it now.
Banjo Kazooie Nuts and Bolts appears to be a repro case. Camera angles messed up in intro.
@DrChat You can quickly reproduce this problem in Naruto SUNS: Generations -Demo on the title screen.

@SakataGintokiYT Excellent - thanks.
28 total splats
SPLAT @@ 0x00000000823cc9f0 or 0x000001398dba8ec8
SPLAT @@ 0x00000000822da4a8 or <Unable to read memory>
SPLAT @@ 0x00000000823d7d4c or 0x00000000823d7d48
SPLAT @@ 0x00000000823d7d54 or 0x0000000000000068
SPLAT @@ 0x00000000823d7d58 or 0x00000000823d7d54
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d78
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d80
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d8c
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7d94
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7d9c
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7da4
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7dac
SPLAT @@ 0x000001398c864250 or 0x00000000823d7db4
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dbc
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dc4
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dcc
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dd4
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e04
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e0c
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e10
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e14
!! SPLAT @@ 0x00000000822db544 or 0x0000000000000340
!! SPLAT @@ 0x00000000822db54c or 0x0000000000000330
SPLAT @@ 0x00000000822db6d4 or 0x0000000000000360
SPLAT @@ 0x00000000822da89c or 0x00000000000007f0
SPLAT @@ 0x00000000822da8a4 or 0x0000000000000400
SPLAT @@ 0x00000000822da8fc or 0x000001398dba2fc8
SPLAT @@ 0x00000000822db190 or <Unable to read memory>
@@ 0x00000000822db544 and 0x00000000822db54c:
.text:822DB544 vspltisw v16, 1
.text:822DB54C vspltisw v0, -1
.text:822DB564 vslw **v14**, v16, v0
.text:822DB660 vcmpeqfp v27, v0, **v14**
.text:822DB6C4 vsel v5, v4, **v14**, v27
.text:822DB6D0 vandc128 v20, v72, **v14**
...
.text:822DB6F8 vslw128 v0, v16, v57
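To make it clear what v14 should hold, here is a small Python model of the two instructions that build it. It is a sketch based on the documented AltiVec semantics (vspltisw sign-extends a 5-bit immediate into every word; vslw shifts each word left by the low 5 bits of the matching word in the second operand), not on xenia's implementation:

```python
MASK32 = 0xFFFFFFFF


def vspltisw(simm5):
    """Sign-extend a 5-bit immediate and replicate it into four 32-bit lanes."""
    simm5 &= 0x1F
    value = simm5 - 0x20 if simm5 & 0x10 else simm5
    return [value & MASK32] * 4


def vslw(a, b):
    """Shift each lane of a left by the low 5 bits of the matching lane of b."""
    return [(x << (y & 0x1F)) & MASK32 for x, y in zip(a, b)]


v16 = vspltisw(1)    # vspltisw v16, 1
v0 = vspltisw(-1)    # vspltisw v0, -1 (the same encoding as 0x1F)
v14 = vslw(v16, v0)  # vslw v14, v16, v0
print([hex(lane) for lane in v14])  # ['0x80000000', '0x80000000', '0x80000000', '0x80000000']
```

So v14 is the sign-bit mask 0x80000000 in every lane, and both of its inputs are constants that the propagation pass can fold.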
Offending function in full:
default fn 822DB540-822DB7B8
822DB540 loc_822DB540:
822DB540 165EF6D2 vor128 vr18, vr94, vr94
822DB544 1201038C vspltisw vr16, 0x1 # vr16 = CONSTANT
822DB548 163FFED2 vor128 vr17, vr95, vr95
822DB54C 101F038C vspltisw vr0, 0x1F # vr0 = CONSTANT
822DB550 15FDEED2 vor128 vr15, vr93, vr93
822DB554 167CE6D2 vor128 vr19, vr92, vr92
822DB558 1BA89390 vpermwi128 vr29, vr18, 0xC8
822DB55C 10A0834A vcfsx vr5, vr16, 0x0
822DB560 1B908B50 vpermwi128 vr28, vr17, 0xB0
822DB564 11D00184 vslw vr14, vr16, vr0 # vr14 = CONSTANT
822DB568 1AF49350 vpermwi128 vr23, vr18, 0xB4
822DB56C 1BF88B94 vpermwi128 vr63, vr17, 0xD8
822DB570 1BD092D4 vpermwi128 vr62, vr18, 0x70
822DB574 1BA49314 vpermwi128 vr61, vr18, 0x84
822DB578 149DE090 vmulfp128 vr4, vr29, vr28
822DB57C 1B848B94 vpermwi128 vr60, vr17, 0xC4
822DB580 1B608AD4 vpermwi128 vr59, vr17, 0x60
822DB584 1477F891 vmulfp128 vr3, vr23, vr63
822DB588 1B488B94 vpermwi128 vr58, vr17, 0xC8
822DB58C 1B249394 vpermwi128 vr57, vr18, 0xC4
822DB590 145EE0B1 vmulfp128 vr2, vr62, vr60
822DB594 1B0092D4 vpermwi128 vr56, vr18, 0x60
822DB598 143DD8B1 vmulfp128 vr1, vr61, vr59
822DB59C 1AF08AD4 vpermwi128 vr55, vr17, 0x70
822DB5A0 1AC48B14 vpermwi128 vr54, vr17, 0x84
822DB5A4 15BAD2F1 vor128 vr13, vr58, vr58
822DB5A8 1B309350 vpermwi128 vr25, vr18, 0xB0
822DB5AC 1539CAF1 vor128 vr9, vr57, vr57
822DB5B0 1BD89390 vpermwi128 vr30, vr18, 0xD8
822DB5B4 14F8C2F1 vor128 vr7, vr56, vr56
822DB5B8 1B748B50 vpermwi128 vr27, vr17, 0xB4
822DB5BC 1517BAF1 vor128 vr8, vr55, vr55
822DB5C0 14D6B2F1 vor128 vr6, vr54, vr54
822DB5C4 1AAC7AD4 vpermwi128 vr53, vr15, 0x6C
822DB5C8 1099236F vnmsubfp vr4, vr25, vr13, vr4
822DB5CC 1A8C7A54 vpermwi128 vr52, vr15, 0x2C
822DB5D0 1A7C7A14 vpermwi128 vr51, vr15, 0x1C
822DB5D4 101E1EEF vnmsubfp vr0, vr30, vr27, vr3
822DB5D8 1BF89B90 vpermwi128 vr31, vr19, 0xD8
822DB5DC 16DED890 vmulfp128 vr22, vr30, vr27
822DB5E0 1A549B54 vpermwi128 vr50, vr19, 0xB4
822DB5E4 1060038C vspltisw vr3, 0x0
822DB5E8 11A9122F vnmsubfp vr13, vr9, vr8, vr2
822DB5EC 1A089B94 vpermwi128 vr48, vr19, 0xC8
822DB5F0 118709AF vnmsubfp vr12, vr7, vr6, vr1
822DB5F4 18309B50 vpermwi128 vr1, vr19, 0xB0
822DB5F8 16BFD890 vmulfp128 vr21, vr31, vr27
822DB5FC 10D7BC84 vor vr6, vr23, vr23
822DB600 1712F0B0 vmulfp128 vr24, vr50, vr30
822DB604 1A249B94 vpermwi128 vr49, vr19, 0xC4
822DB608 1690C8B0 vmulfp128 vr20, vr48, vr25
822DB60C 19F09AD4 vpermwi128 vr47, vr19, 0x70
822DB610 19C09AD4 vpermwi128 vr46, vr19, 0x60
822DB614 1741D091 vmulfp128 vr26, vr1, vr58
822DB618 19A49B14 vpermwi128 vr45, vr19, 0x84
822DB61C 1739D091 vmulfp128 vr25, vr25, vr58
822DB620 159421B4 vmsum3fp128 vr44, vr52, vr4
822DB624 145FFAF1 vor128 vr2, vr63, vr63
822DB628 157501B4 vmsum3fp128 vr43, vr53, vr0
822DB62C 155369B4 vmsum3fp128 vr42, vr51, vr13
822DB630 152F6194 vmsum3fp128 vr41, vr15, vr12
822DB634 180B5321 vmrghw128 vr0, vr43, vr42
822DB638 19AC4B21 vmrghw128 vr13, vr44, vr41
822DB63C 1000688C vmrghw vr0, vr0, vr13
822DB640 151301D4 vmsum4fp128 vr40, vr19, vr0
822DB644 18004631 vrefp128 vr0, vr40
822DB648 154842F1 vor128 vr10, vr40, vr40
822DB64C 152842F1 vor128 vr9, vr40, vr40
822DB650 1BC81820 vcmpeqfp128 vr30, vr40, vr3
822DB654 147292F1 vor128 vr3, vr50, vr50
822DB658 11002AAF vnmsubfp vr8, vr0, vr10, vr5
822DB65C 11800484 vor vr12, vr0, vr0
822DB660 136070C6 vcmpeqfp vr27, vr0, vr14
822DB664 115CE484 vor vr10, vr28, vr28
822DB668 1000022E vmaddfp vr0, vr0, vr8, vr0
822DB66C 11002A6F vnmsubfp vr8, vr0, vr9, vr5
822DB670 113FFC84 vor vr9, vr31, vr31
822DB674 14BFFAF1 vor128 vr5, vr63, vr63
822DB678 13FDEC84 vor vr31, vr29, vr29
822DB67C 12E9C5EF vnmsubfp vr23, vr9, vr23, vr24
822DB680 1080022E vmaddfp vr4, vr0, vr8, vr0
822DB684 111DEC84 vor vr8, vr29, vr29
822DB688 12C6B16F vnmsubfp vr22, vr6, vr5, vr22
822DB68C 101CE484 vor vr0, vr28, vr28
822DB690 178FC8B1 vmulfp128 vr28, vr47, vr57
822DB694 15B082F1 vor128 vr13, vr48, vr48
822DB698 17B9B8B1 vmulfp128 vr29, vr57, vr55
822DB69C 12A3A8AF vnmsubfp vr21, vr3, vr2, vr21
822DB6A0 14FEF2F1 vor128 vr7, vr62, vr62
822DB6A4 1048CAAF vnmsubfp vr2, vr8, vr10, vr25
822DB6A8 14D18AF1 vor128 vr6, vr49, vr49
822DB6AC 1301A7EF vnmsubfp vr24, vr1, vr31, vr20
822DB6B0 1BEC9AD4 vpermwi128 vr63, vr19, 0x6C
822DB6B4 106DD02F vnmsubfp vr3, vr13, vr0, vr26
822DB6B8 153CE2F1 vor128 vr9, vr60, vr60
822DB6BC 151EF2F1 vor128 vr8, vr62, vr62
822DB6C0 1B4C9A54 vpermwi128 vr58, vr19, 0x2C
822DB6C4 10A476EA vsel vr5, vr4, vr14, vr27 !! CULPRIT
822DB6C8 1775B9B0 vmsum3fp128 vr27, vr53, vr23
822DB6CC 1B5C9A10 vpermwi128 vr26, vr19, 0x1C
822DB6D0 16887270 vandc128 vr20, vr40, vr14
822DB6D4 1B370774 vspltisw128 vr57, 0x17
822DB6D8 17F1B8B1 vmulfp128 vr31, vr49, vr55
822DB6DC 142DC0B1 vmulfp128 vr1, vr45, vr56
822DB6E0 173FB1B0 vmsum3fp128 vr25, vr63, vr22
822DB6E4 1086E1EF vnmsubfp vr4, vr6, vr7, vr28
822DB6E8 14CE72F1 vor128 vr6, vr46, vr46
822DB6EC 10E8EA6F vnmsubfp vr7, vr8, vr9, vr29
822DB6F0 13A567AA vsel vr29, vr5, vr12, vr30
822DB6F4 16F5A9B0 vmsum3fp128 vr23, vr53, vr21
822DB6F8 1810C8D1 vslw128 vr0, vr16, vr57
822DB6FC 17DA11B0 vmsum3fp128 vr30, vr58, vr2
822DB700 1458B0B1 vmulfp128 vr2, vr56, vr54
822DB704 1594C1B0 vmsum3fp128 vr12, vr52, vr24
822DB708 14BCE2F1 vor128 vr5, vr60, vr60
822DB70C 151419B0 vmsum3fp128 vr8, vr52, vr3
822DB710 146EB0B1 vmulfp128 vr3, vr46, vr54
822DB714 1000A2C6 vcmpgtfp vr0, vr0, vr20
822DB718 179321B0 vmsum3fp128 vr28, vr51, vr4
822DB71C 148F7AF1 vor128 vr4, vr47, vr47
822DB720 1139D88C vmrghw vr9, vr25, vr27
822DB724 177A3990 vmsum3fp128 vr27, vr26, vr7
822DB728 14FDEAF1 vor128 vr7, vr61, vr61
822DB72C 13E4F96F vnmsubfp vr31, vr4, vr5, vr31
822DB730 1B4BBB20 vmrghw128 vr26, vr43, vr23
822DB734 119E608C vmrghw vr12, vr30, vr12
822DB738 10A609EF vnmsubfp vr5, vr6, vr7, vr1
822DB73C 190C4320 vmrghw128 vr8, vr44, vr8
822DB740 113A488C vmrghw vr9, vr26, vr9
822DB744 1188608C vmrghw vr12, vr8, vr12
822DB748 153D4890 vmulfp128 vr9, vr29, vr9
822DB74C 14D3F9B0 vmsum3fp128 vr6, vr51, vr31
822DB750 159D6090 vmulfp128 vr12, vr29, vr12
822DB754 109BE08C vmrghw vr4, vr27, vr28
822DB758 1109982A vsel vr8, vr9, vr19, vr0
822DB75C 153BDAF1 vor128 vr9, vr59, vr59
822DB760 114C782A vsel vr10, vr12, vr15, vr0
822DB764 178842D8 vor128 vr92, vr8, vr8
822DB768 151DEAF1 vor128 vr8, vr61, vr61
822DB76C 159BDAF1 vor128 vr12, vr59, vr59
822DB770 17AA52D8 vor128 vr93, vr10, vr10
822DB774 154D6AF1 vor128 vr10, vr45, vr45
822DB778 10E8126F vnmsubfp vr7, vr8, vr9, vr2
822DB77C 112A1B2F vnmsubfp vr9, vr10, vr12, vr3
822DB780 194A3320 vmrghw128 vr10, vr42, vr6
822DB784 114A208C vmrghw vr10, vr10, vr4
822DB788 155D5090 vmulfp128 vr10, vr29, vr10
822DB78C 15133990 vmsum3fp128 vr8, vr19, vr7
822DB790 158F4990 vmsum3fp128 vr12, vr15, vr9
822DB794 152F2990 vmsum3fp128 vr9, vr15, vr5
822DB798 19896320 vmrghw128 vr12, vr41, vr12
822DB79C 1108488C vmrghw vr8, vr8, vr9
822DB7A0 112A902A vsel vr9, vr10, vr18, vr0
822DB7A4 118C408C vmrghw vr12, vr12, vr8
822DB7A8 17C94AD8 vor128 vr94, vr9, vr9
822DB7AC 159D6090 vmulfp128 vr12, vr29, vr12
822DB7B0 100C882A vsel vr0, vr12, vr17, vr0
822DB7B4 17E002D8 vor128 vr95, vr0, vr0
822DB7B8 4E800020 bclr 20, 0
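For readers following along, this is what the flagged vsel computes. A minimal Python sketch of the documented AltiVec semantics (each result bit comes from the second source where the mask bit is set, otherwise from the first); the vr4 and vr27 values are made up for illustration, and only vr14 = 0x80000000 follows from the listing above:

```python
MASK32 = 0xFFFFFFFF


def vsel(a, b, mask):
    """Bitwise select per lane: take b where mask bits are 1, a where they are 0."""
    return [((x & ~m) | (y & m)) & MASK32 for x, y, m in zip(a, b, mask)]


vr4 = [0x3F800000] * 4                 # illustrative lanes (bit pattern of 1.0f)
vr14 = [0x80000000] * 4                # sign-bit constant built by vspltisw + vslw
vr27 = [0xFFFFFFFF, 0, 0, 0xFFFFFFFF]  # made-up compare mask

vr5 = vsel(vr4, vr14, vr27)            # vsel vr5, vr4, vr14, vr27
print([hex(lane) for lane in vr5])     # ['0x80000000', '0x3f800000', '0x3f800000', '0x80000000']
```

Every lane selected by the mask receives the constant unchanged, so a wrong value for vr14 ends up directly in the result.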
Found the culprit. Going to test, and will release in a bit.
Okay guys, can you confirm that this is fixed as of 3a8f8f2ecb2f28aaa82543ad8cf6ec3ff81bac94?
[3a8f8f2]

It's fine now




Awesome! Appreciate the help guys.