Xenia: Constant Propagation OPCODE_SPLAT Appears to be Broken

Created on 26 Jul 2017 · 26 comments · Source: xenia-project/xenia

Issue

As I mentioned here, the constant-propagation speedup for OPCODE_SPLAT breaks vertices and physics in many games (tested on Sonic 06: wonky camera, death on jump pads; and Bayonetta: enemies spinning like a fidget spinner).

This change was introduced in PR #514; commenting out OPCODE_SPLAT makes many things work again. I haven't figured out why it causes the problem; I only found it by comparing the logged values of Splat in value.cc against the original Splat in hir_builder.cc. The values differ in ordering and count, although most of them are the same.
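For context, folding a splat means replicating a known scalar into every lane of a 128-bit vector at translation time instead of emitting a broadcast in the JIT, so the folded constant has to match the JIT's result bit for bit. A minimal sketch, assuming a hypothetical `Vec128` byte array and a hypothetical `SplatConstant` helper (this is not xenia's `Value::Splat` from value.cc, and it ignores xenia's own lane-ordering and endianness conventions):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit vector constant; xenia's real vector type differs.
struct Vec128 {
  uint8_t bytes[16];
};

// Hypothetical scalar element types a splat can broadcast.
enum class ElemType { kInt8, kInt16, kInt32, kFloat32 };

// Replicates the low bits of `scalar` (a raw bit pattern) into every lane.
// Lanes are written in host byte order; real guest/host layout is not modeled.
inline Vec128 SplatConstant(ElemType type, uint32_t scalar) {
  Vec128 out{};
  switch (type) {
    case ElemType::kInt8:
      std::memset(out.bytes, static_cast<uint8_t>(scalar), sizeof(out.bytes));
      break;
    case ElemType::kInt16: {
      const uint16_t half = static_cast<uint16_t>(scalar);
      for (int i = 0; i < 8; ++i) std::memcpy(out.bytes + 2 * i, &half, 2);
      break;
    }
    case ElemType::kInt32:
    case ElemType::kFloat32:
      for (int i = 0; i < 4; ++i) std::memcpy(out.bytes + 4 * i, &scalar, 4);
      break;
  }
  return out;
}
```

Comparing a folded result like this against the JIT's output lane by lane is one way to spot the ordering differences described above.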

Video

Before (test video by EOGC):
https://www.youtube.com/watch?v=6AaForznZF8
After (with some other fixes on my xenia fork):
https://www.youtube.com/watch?v=Fb0TJT4f73k

Log file

Here's the log file for Sonic 06. I commented out OPCODE_SPLAT and only logged the values.
splat_log.txt
Each entry is logged as "!> 00000028 splat(cpp or hb) type: %d, value: %d".
You can search for the keyword "splat".
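Since the two paths log in different orders, a diff that ignores ordering is easier to read than a side-by-side comparison. A rough sketch, assuming each log line contains "splat", then "cpp" or "hb", then "type: N, value: N" as in the format above; `DiffSplatLog` is a hypothetical helper, not part of xenia:

```cpp
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Key: (type, value) as printed in the log line.
using SplatKey = std::pair<int, long long>;

// Net count per (type, value): +1 for each "cpp" (constant propagation) hit,
// -1 for each "hb" (hir_builder) hit. Line order is ignored entirely.
inline std::map<SplatKey, int> DiffSplatLog(const std::vector<std::string>& lines) {
  std::map<SplatKey, int> net;
  for (const std::string& line : lines) {
    const auto s = line.find("splat");
    const auto t = line.find("type:");
    const auto v = line.find("value:");
    if (s == std::string::npos || t == std::string::npos || v == std::string::npos) continue;
    const std::string tag = line.substr(s, t - s);  // e.g. "splat(cpp) "
    int sign = 0;
    if (tag.find("cpp") != std::string::npos) {
      sign = +1;
    } else if (tag.find("hb") != std::string::npos) {
      sign = -1;
    } else {
      continue;
    }
    int type = 0;
    long long value = 0;
    std::istringstream(line.substr(t + 5)) >> type;   // text after "type:"
    std::istringstream(line.substr(v + 6)) >> value;  // text after "value:"
    net[{type, value}] += sign;
  }
  // Drop keys that balance out so only mismatches remain.
  for (auto it = net.begin(); it != net.end();) {
    it = (it->second == 0) ? net.erase(it) : std::next(it);
  }
  return net;
}
```

Keys with a positive count appear more often on the constant-propagation side; negative ones appear more often in hir_builder.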

Labels: bug, cpu

All 26 comments

This is a very nice catch! Awesome work.

I'm not too sure what would cause the difference between the JIT's splat implementation, and the constant implementation though.

On second thought, I want you to comment out OPCODE_DOT_PRODUCT_3 and OPCODE_DOT_PRODUCT_4 and see what happens.

@DrChat
@DrChat
With the constant check "(i->src1.value->IsConstant() && i->src2.value->IsConstant())", these two opcodes got almost zero hits in cpp: only one OPCODE_DOT_PRODUCT_3 in Bayonetta's opening (and after that it didn't reach hir_builder, which is also weird).

Logging OPCODE_DOT_PRODUCT_3/4 in cpp and hir_builder without the constant check yields some data; with the constant check, there are zero hits.

I tried the following combinations:

  1. Comment out OPCODE_DOT_PRODUCT_3 and OPCODE_DOT_PRODUCT_4 in cpp
    The games are still broken.
  2. Comment out OPCODE_DOT_PRODUCT_3, OPCODE_DOT_PRODUCT_4 and OPCODE_SPLAT in cpp
    No change compared to commenting out only OPCODE_SPLAT.

Oops, my mistake: it's passing to x64_sequences like you mentioned. Now the counts and values are the same (see the log file; only INT32_TYPE is logged). I'll try printing the dest value to see what differs.
xenia_splat2.txt

Yeah - could you do some investigation of the differences in output between the constant dot product and the JIT's dot product?

Splat doesn't seem to be the issue - and the only way to have a float splat is from dot product.

I may have been wrong to point out DP3/DP4.

If you've identified the specific case that causes the issue, set a breakpoint in that OPCODE_SPLAT case in the propagation pass, and walk the i->prev chain backwards until you hit an OPCODE_SOURCE_OFFSET, then report what you see here.
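The backwards walk described here can be sketched as follows. The `Instr` and `Op` types are hypothetical simplifications (xenia's HIR instructions carry much more state); the sketch only assumes each instruction has a `prev` pointer and that the source-offset marker holds the guest address the HIR was translated from:

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for HIR types.
enum class Op { kSourceOffset, kSplat, kLoad, kOther };

struct Instr {
  Op opcode;
  unsigned guest_address;  // only meaningful for kSourceOffset in this sketch
  Instr* prev;
};

// Walks backwards from `i`, collecting every instruction visited, and stops
// at (and includes) the first source-offset marker.
inline std::vector<const Instr*> WalkToSourceOffset(const Instr* i) {
  std::vector<const Instr*> chain;
  for (const Instr* cur = i; cur != nullptr; cur = cur->prev) {
    chain.push_back(cur);
    if (cur->opcode == Op::kSourceOffset) break;
  }
  return chain;
}
```

The last entry of the returned chain then identifies the guest instruction to look up in the disassembly.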

Also, if the values are not flagged constant then the constant data is junk.
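That is why folds are gated on both operands being constant, as in the `IsConstant()` check quoted earlier. A minimal sketch of the guard, using a hypothetical `Value` type and a hypothetical `TryFoldAdd` (an integer add is used only to keep it short; the real pass handles many opcodes):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical value model: the union payload is only meaningful when
// is_constant is set; otherwise it holds leftover (junk) bits.
struct Value {
  bool is_constant;
  union {
    int32_t i32;
    float f32;
  } constant;
};

// Folds a + b only when both operands are known constants. Returns
// std::nullopt otherwise, leaving the instruction for the JIT to emit.
inline std::optional<int32_t> TryFoldAdd(const Value& a, const Value& b) {
  if (!a.is_constant || !b.is_constant) return std::nullopt;
  // Add in unsigned arithmetic to get guest-style wraparound without UB.
  const uint32_t sum = static_cast<uint32_t>(a.constant.i32) +
                       static_cast<uint32_t>(b.constant.i32);
  return static_cast<int32_t>(sum);
}
```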

Okay, here's the test report:
I set a breakpoint on OPCODE_SPLAT's INT32_TYPE case (since INT32_TYPE is what breaks the games).
This is the first hit on Sonic 06, with i->src1.value->constant.i32 = 0x00000001:
[Screenshot: splat]
There are a lot of 1b0 values; I'm not sure what they are.

By the way, I can't go further because Xenia crashes when playing a video file while running under VS2015 (both debug and release builds).

Okay, that means it's a future instruction. Just to verify that this particular splat causes an issue, skip it in the VS debugger (by dragging the execution pointer to the break statement) and see if it fixes anything.

Instructions that use integer OPCODE_SPLAT (but not necessarily the culprit):

vspltb
vsplth
vspltw
vspltisb
vspltish
vspltisw
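The vsplti* forms take a 5-bit signed immediate (-16..15) that the Power ISA sign-extends before replicating it into each byte, halfword, or word, so a constant folder has to sign-extend too (an encoded 0x1F means -1, not 31). A sketch of the word and halfword cases, with hypothetical helper names:

```cpp
#include <array>
#include <cstdint>

// Sign-extends the 5-bit SIMM field of vspltisb/vspltish/vspltisw.
constexpr int32_t SignExtend5(uint32_t simm5) {
  simm5 &= 0x1F;
  return (simm5 & 0x10) ? static_cast<int32_t>(simm5) - 32
                        : static_cast<int32_t>(simm5);
}

// vspltisw: every 32-bit word of the result holds the sign-extended SIMM.
inline std::array<uint32_t, 4> Vspltisw(uint32_t simm5) {
  const uint32_t word = static_cast<uint32_t>(SignExtend5(simm5));
  return {word, word, word, word};
}

// vspltish: every 16-bit halfword holds the sign-extended SIMM.
inline std::array<uint16_t, 8> Vspltish(uint32_t simm5) {
  std::array<uint16_t, 8> out{};
  out.fill(static_cast<uint16_t>(SignExtend5(simm5)));
  return out;
}
```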

The specific sequence you landed on (but almost certainly not responsible) @ 0x825D2264 is:

.text:825D2260                 lvx128    v12, r0, r4
.text:825D2264                 vspltisw  v13, 1
.text:825D2268                 vmsum3fp128 v0, v12, v12
.text:825D226C                 vspltisw  v11, 0
.text:825D2270                 vcfsx     v10, v13, 1
.text:825D2274                 vrsqrtefp v13, v0
.text:825D2278                 vmulfp128 v9, v0, v10
.text:825D227C                 vcmpeqfp  v11, v0, v11
.text:825D2280                 vmulfp128 v8, v13, v13
.text:825D2284                 vnmsubfp  v10, v9, v10, v8
.text:825D2288                 vmaddfp   v13, v13, v13, v10
.text:825D228C                 vmulfp128 v13, v12, v13
.text:825D2290                 vsel      v11, v13, v0, v11
.text:825D2294                 stvx      v11, r0, r3
.text:825D2298                 blr
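In this sequence the splatted integer only feeds `vcfsx v10, v13, 1`, which converts each signed word to float and scales it by 2^-1, giving 0.5 in every lane. A sketch of that conversion (hypothetical helper name) can help check a folded value by hand:

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// vcfsx: convert each signed 32-bit word to float, then divide by 2^uimm
// (uimm is a 5-bit unsigned field, 0..31).
inline std::array<float, 4> Vcfsx(const std::array<int32_t, 4>& src, unsigned uimm) {
  std::array<float, 4> out{};
  for (int i = 0; i < 4; ++i) {
    out[i] = std::ldexp(static_cast<float>(src[i]), -static_cast<int>(uimm & 0x1F));
  }
  return out;
}
```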

[Screenshot comparison: OPCODE_SPLAT ON / OFF]

@SakataGintokiYT Interesting. Appreciate the picture showing the impact!
This is most likely an x86 backend SSE instruction clobbering a constant input somehow.

@SakataGintokiYT @AllanCat Do you both have AVX2 enabled or disabled?
If it's enabled, could you specify --enable_haswell_instructions=false and test it again?

@DrChat This happens on Ivy Bridge CPUs too.
No effect whether starting clean or with --enable_haswell_instructions=false.

Resident Evil 5:
OPCODE_SPLAT enabled: the character and camera cannot move.
OPCODE_SPLAT disabled: the character and camera move correctly.

Okay, cool.
@Parovozik, can you verify if 6c97dbaf81010e7345c5cd8119bbc663c274c5d4 affects this bug at all?

@DrChat I tried https://github.com/benvanik/xenia/commit/6c97dbaf81010e7345c5cd8119bbc663c274c5d4. No change for me.

Logs:

Naruto UNS3.log
Sonic.log
bayonetta.log

Screenshots: [3 images]

Okay - are there any games with a quick repro case?
Can you find a game that takes less than 1 minute from boot to see this issue happen? Preferably the shortest you can find.

Sonic 2006 (if skip movie and cut-scene)

Sonic 06 is a no-go. It black-screens for me; I'll look into it later, but I'm not interested in it now.

Banjo Kazooie Nuts and Bolts appears to be a repro case. Camera angles messed up in intro.

@DrChat You can quickly reproduce this problem on the title screen of the Naruto SUNS: Generations demo.

@SakataGintokiYT Excellent - thanks.

Occurrences of splats

Naruto Storm: Generations

  • The offending splat occurs during the loading screen, before anything is drawn.
28 total splats

SPLAT @@ 0x00000000823cc9f0 or 0x000001398dba8ec8
SPLAT @@ 0x00000000822da4a8 or <Unable to read memory>
SPLAT @@ 0x00000000823d7d4c or 0x00000000823d7d48
SPLAT @@ 0x00000000823d7d54 or 0x0000000000000068
SPLAT @@ 0x00000000823d7d58 or 0x00000000823d7d54
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d78
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d80
SPLAT @@ 0x000001398c864250 or 0x00000000823d7d8c
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7d94
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7d9c
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7da4
SPLAT @@ 0x000001398c865ee8 or 0x00000000823d7dac
SPLAT @@ 0x000001398c864250 or 0x00000000823d7db4
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dbc
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dc4
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dcc
SPLAT @@ 0x000001398c868f60 or 0x00000000823d7dd4
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e04
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e0c
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e10
SPLAT @@ 0x000001398c86daf0 or 0x00000000823d7e14
!! SPLAT @@ 0x00000000822db544 or 0x0000000000000340
!! SPLAT @@ 0x00000000822db54c or 0x0000000000000330
SPLAT @@ 0x00000000822db6d4 or 0x0000000000000360
SPLAT @@ 0x00000000822da89c or 0x00000000000007f0
SPLAT @@ 0x00000000822da8a4 or 0x0000000000000400
SPLAT @@ 0x00000000822da8fc or 0x000001398dba2fc8
SPLAT @@ 0x00000000822db190 or <Unable to read memory>

@@ 0x00000000822db544 and 0x00000000822db54c:

  • Negative shift
.text:822DB544                 vspltisw  v16, 1
.text:822DB54C                 vspltisw  v0, -1
.text:822DB564                 vslw      v14, v16, v0
.text:822DB660                 vcmpeqfp  v27, v0, v14
.text:822DB6C4                 vsel      v5, v4, v14, v27
.text:822DB6D0                 vandc128  v20, v72, v14

...
.text:822DB6F8                 vslw128   v0, v16, v57
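The "negative shift" is the key detail: `vspltisw v0, -1` fills each word with 0xFFFFFFFF, and vslw uses only the low 5 bits of each shift-count word, so the hardware shifts by 31 and v14 ends up as 0x80000000 in every lane. In C++, shifting by a negative count or by 32 or more is undefined behavior, so a constant folder has to mask the count explicitly. A reference sketch of the per-lane semantics (hypothetical helper name; not necessarily how the eventual fix was written):

```cpp
#include <array>
#include <cstdint>

// vslw: shift each 32-bit word of `a` left by the low 5 bits of the
// corresponding word of `b`. A splatted -1 (0xFFFFFFFF) means "shift by 31".
inline std::array<uint32_t, 4> Vslw(const std::array<uint32_t, 4>& a,
                                    const std::array<uint32_t, 4>& b) {
  std::array<uint32_t, 4> out{};
  for (int i = 0; i < 4; ++i) {
    // Masking matches the hardware and keeps the C++ shift count in 0..31.
    out[i] = a[i] << (b[i] & 0x1F);
  }
  return out;
}
```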

Offending function in full:

default fn 822DB540-822DB7B8 
822DB540          loc_822DB540:
822DB540 165EF6D2   vor128        vr18, vr94, vr94
822DB544 1201038C   vspltisw      vr16, 0x1            # vr16 = CONSTANT
822DB548 163FFED2   vor128        vr17, vr95, vr95
822DB54C 101F038C   vspltisw      vr0, 0x1F            # vr0  = CONSTANT
822DB550 15FDEED2   vor128        vr15, vr93, vr93
822DB554 167CE6D2   vor128        vr19, vr92, vr92
822DB558 1BA89390   vpermwi128    vr29, vr18, 0xC8
822DB55C 10A0834A   vcfsx         vr5, vr16, 0x0
822DB560 1B908B50   vpermwi128    vr28, vr17, 0xB0
822DB564 11D00184   vslw          vr14, vr16, vr0      # vr14 = CONSTANT
822DB568 1AF49350   vpermwi128    vr23, vr18, 0xB4
822DB56C 1BF88B94   vpermwi128    vr63, vr17, 0xD8
822DB570 1BD092D4   vpermwi128    vr62, vr18, 0x70
822DB574 1BA49314   vpermwi128    vr61, vr18, 0x84
822DB578 149DE090   vmulfp128     vr4, vr29, vr28
822DB57C 1B848B94   vpermwi128    vr60, vr17, 0xC4
822DB580 1B608AD4   vpermwi128    vr59, vr17, 0x60
822DB584 1477F891   vmulfp128     vr3, vr23, vr63
822DB588 1B488B94   vpermwi128    vr58, vr17, 0xC8
822DB58C 1B249394   vpermwi128    vr57, vr18, 0xC4
822DB590 145EE0B1   vmulfp128     vr2, vr62, vr60
822DB594 1B0092D4   vpermwi128    vr56, vr18, 0x60
822DB598 143DD8B1   vmulfp128     vr1, vr61, vr59
822DB59C 1AF08AD4   vpermwi128    vr55, vr17, 0x70
822DB5A0 1AC48B14   vpermwi128    vr54, vr17, 0x84
822DB5A4 15BAD2F1   vor128        vr13, vr58, vr58
822DB5A8 1B309350   vpermwi128    vr25, vr18, 0xB0
822DB5AC 1539CAF1   vor128        vr9, vr57, vr57
822DB5B0 1BD89390   vpermwi128    vr30, vr18, 0xD8
822DB5B4 14F8C2F1   vor128        vr7, vr56, vr56
822DB5B8 1B748B50   vpermwi128    vr27, vr17, 0xB4
822DB5BC 1517BAF1   vor128        vr8, vr55, vr55
822DB5C0 14D6B2F1   vor128        vr6, vr54, vr54
822DB5C4 1AAC7AD4   vpermwi128    vr53, vr15, 0x6C
822DB5C8 1099236F   vnmsubfp      vr4, vr25, vr13, vr4
822DB5CC 1A8C7A54   vpermwi128    vr52, vr15, 0x2C
822DB5D0 1A7C7A14   vpermwi128    vr51, vr15, 0x1C
822DB5D4 101E1EEF   vnmsubfp      vr0, vr30, vr27, vr3
822DB5D8 1BF89B90   vpermwi128    vr31, vr19, 0xD8
822DB5DC 16DED890   vmulfp128     vr22, vr30, vr27
822DB5E0 1A549B54   vpermwi128    vr50, vr19, 0xB4
822DB5E4 1060038C   vspltisw      vr3, 0x0
822DB5E8 11A9122F   vnmsubfp      vr13, vr9, vr8, vr2
822DB5EC 1A089B94   vpermwi128    vr48, vr19, 0xC8
822DB5F0 118709AF   vnmsubfp      vr12, vr7, vr6, vr1
822DB5F4 18309B50   vpermwi128    vr1, vr19, 0xB0
822DB5F8 16BFD890   vmulfp128     vr21, vr31, vr27
822DB5FC 10D7BC84   vor           vr6, vr23, vr23
822DB600 1712F0B0   vmulfp128     vr24, vr50, vr30
822DB604 1A249B94   vpermwi128    vr49, vr19, 0xC4
822DB608 1690C8B0   vmulfp128     vr20, vr48, vr25
822DB60C 19F09AD4   vpermwi128    vr47, vr19, 0x70
822DB610 19C09AD4   vpermwi128    vr46, vr19, 0x60
822DB614 1741D091   vmulfp128     vr26, vr1, vr58
822DB618 19A49B14   vpermwi128    vr45, vr19, 0x84
822DB61C 1739D091   vmulfp128     vr25, vr25, vr58
822DB620 159421B4   vmsum3fp128   vr44, vr52, vr4
822DB624 145FFAF1   vor128        vr2, vr63, vr63
822DB628 157501B4   vmsum3fp128   vr43, vr53, vr0
822DB62C 155369B4   vmsum3fp128   vr42, vr51, vr13
822DB630 152F6194   vmsum3fp128   vr41, vr15, vr12
822DB634 180B5321   vmrghw128     vr0, vr43, vr42
822DB638 19AC4B21   vmrghw128     vr13, vr44, vr41
822DB63C 1000688C   vmrghw        vr0, vr0, vr13
822DB640 151301D4   vmsum4fp128   vr40, vr19, vr0
822DB644 18004631   vrefp128      vr0, vr40
822DB648 154842F1   vor128        vr10, vr40, vr40
822DB64C 152842F1   vor128        vr9, vr40, vr40
822DB650 1BC81820   vcmpeqfp128   vr30, vr40, vr3
822DB654 147292F1   vor128        vr3, vr50, vr50
822DB658 11002AAF   vnmsubfp      vr8, vr0, vr10, vr5
822DB65C 11800484   vor           vr12, vr0, vr0
822DB660 136070C6   vcmpeqfp      vr27, vr0, vr14
822DB664 115CE484   vor           vr10, vr28, vr28
822DB668 1000022E   vmaddfp       vr0, vr0, vr8, vr0
822DB66C 11002A6F   vnmsubfp      vr8, vr0, vr9, vr5
822DB670 113FFC84   vor           vr9, vr31, vr31
822DB674 14BFFAF1   vor128        vr5, vr63, vr63
822DB678 13FDEC84   vor           vr31, vr29, vr29
822DB67C 12E9C5EF   vnmsubfp      vr23, vr9, vr23, vr24
822DB680 1080022E   vmaddfp       vr4, vr0, vr8, vr0
822DB684 111DEC84   vor           vr8, vr29, vr29
822DB688 12C6B16F   vnmsubfp      vr22, vr6, vr5, vr22
822DB68C 101CE484   vor           vr0, vr28, vr28
822DB690 178FC8B1   vmulfp128     vr28, vr47, vr57
822DB694 15B082F1   vor128        vr13, vr48, vr48
822DB698 17B9B8B1   vmulfp128     vr29, vr57, vr55
822DB69C 12A3A8AF   vnmsubfp      vr21, vr3, vr2, vr21
822DB6A0 14FEF2F1   vor128        vr7, vr62, vr62
822DB6A4 1048CAAF   vnmsubfp      vr2, vr8, vr10, vr25
822DB6A8 14D18AF1   vor128        vr6, vr49, vr49
822DB6AC 1301A7EF   vnmsubfp      vr24, vr1, vr31, vr20
822DB6B0 1BEC9AD4   vpermwi128    vr63, vr19, 0x6C
822DB6B4 106DD02F   vnmsubfp      vr3, vr13, vr0, vr26
822DB6B8 153CE2F1   vor128        vr9, vr60, vr60
822DB6BC 151EF2F1   vor128        vr8, vr62, vr62
822DB6C0 1B4C9A54   vpermwi128    vr58, vr19, 0x2C
822DB6C4 10A476EA   vsel          vr5, vr4, vr14, vr27           !! CULPRIT
822DB6C8 1775B9B0   vmsum3fp128   vr27, vr53, vr23
822DB6CC 1B5C9A10   vpermwi128    vr26, vr19, 0x1C
822DB6D0 16887270   vandc128      vr20, vr40, vr14
822DB6D4 1B370774   vspltisw128   vr57, 0x17
822DB6D8 17F1B8B1   vmulfp128     vr31, vr49, vr55
822DB6DC 142DC0B1   vmulfp128     vr1, vr45, vr56
822DB6E0 173FB1B0   vmsum3fp128   vr25, vr63, vr22
822DB6E4 1086E1EF   vnmsubfp      vr4, vr6, vr7, vr28
822DB6E8 14CE72F1   vor128        vr6, vr46, vr46
822DB6EC 10E8EA6F   vnmsubfp      vr7, vr8, vr9, vr29
822DB6F0 13A567AA   vsel          vr29, vr5, vr12, vr30
822DB6F4 16F5A9B0   vmsum3fp128   vr23, vr53, vr21
822DB6F8 1810C8D1   vslw128       vr0, vr16, vr57
822DB6FC 17DA11B0   vmsum3fp128   vr30, vr58, vr2
822DB700 1458B0B1   vmulfp128     vr2, vr56, vr54
822DB704 1594C1B0   vmsum3fp128   vr12, vr52, vr24
822DB708 14BCE2F1   vor128        vr5, vr60, vr60
822DB70C 151419B0   vmsum3fp128   vr8, vr52, vr3
822DB710 146EB0B1   vmulfp128     vr3, vr46, vr54
822DB714 1000A2C6   vcmpgtfp      vr0, vr0, vr20
822DB718 179321B0   vmsum3fp128   vr28, vr51, vr4
822DB71C 148F7AF1   vor128        vr4, vr47, vr47
822DB720 1139D88C   vmrghw        vr9, vr25, vr27
822DB724 177A3990   vmsum3fp128   vr27, vr26, vr7
822DB728 14FDEAF1   vor128        vr7, vr61, vr61
822DB72C 13E4F96F   vnmsubfp      vr31, vr4, vr5, vr31
822DB730 1B4BBB20   vmrghw128     vr26, vr43, vr23
822DB734 119E608C   vmrghw        vr12, vr30, vr12
822DB738 10A609EF   vnmsubfp      vr5, vr6, vr7, vr1
822DB73C 190C4320   vmrghw128     vr8, vr44, vr8
822DB740 113A488C   vmrghw        vr9, vr26, vr9
822DB744 1188608C   vmrghw        vr12, vr8, vr12
822DB748 153D4890   vmulfp128     vr9, vr29, vr9
822DB74C 14D3F9B0   vmsum3fp128   vr6, vr51, vr31
822DB750 159D6090   vmulfp128     vr12, vr29, vr12
822DB754 109BE08C   vmrghw        vr4, vr27, vr28
822DB758 1109982A   vsel          vr8, vr9, vr19, vr0
822DB75C 153BDAF1   vor128        vr9, vr59, vr59
822DB760 114C782A   vsel          vr10, vr12, vr15, vr0
822DB764 178842D8   vor128        vr92, vr8, vr8
822DB768 151DEAF1   vor128        vr8, vr61, vr61
822DB76C 159BDAF1   vor128        vr12, vr59, vr59
822DB770 17AA52D8   vor128        vr93, vr10, vr10
822DB774 154D6AF1   vor128        vr10, vr45, vr45
822DB778 10E8126F   vnmsubfp      vr7, vr8, vr9, vr2
822DB77C 112A1B2F   vnmsubfp      vr9, vr10, vr12, vr3
822DB780 194A3320   vmrghw128     vr10, vr42, vr6
822DB784 114A208C   vmrghw        vr10, vr10, vr4
822DB788 155D5090   vmulfp128     vr10, vr29, vr10
822DB78C 15133990   vmsum3fp128   vr8, vr19, vr7
822DB790 158F4990   vmsum3fp128   vr12, vr15, vr9
822DB794 152F2990   vmsum3fp128   vr9, vr15, vr5
822DB798 19896320   vmrghw128     vr12, vr41, vr12
822DB79C 1108488C   vmrghw        vr8, vr8, vr9
822DB7A0 112A902A   vsel          vr9, vr10, vr18, vr0
822DB7A4 118C408C   vmrghw        vr12, vr12, vr8
822DB7A8 17C94AD8   vor128        vr94, vr9, vr9
822DB7AC 159D6090   vmulfp128     vr12, vr29, vr12
822DB7B0 100C882A   vsel          vr0, vr12, vr17, vr0
822DB7B4 17E002D8   vor128        vr95, vr0, vr0
822DB7B8 4E800020   bclr          20, 0 
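Why this one constant matters: 0x80000000 in every lane is the IEEE-754 bit pattern of -0.0f, i.e. a float sign-bit mask. The function compares against it (vcmpeqfp vr27, vr0, vr14), clears sign bits with it (vandc128 vr20, vr40, vr14), and selects it at the culprit vsel, so a wrong folded vr14 propagates into all of those results. A small sketch of the bit-level operations involved, with hypothetical helper names:

```cpp
#include <cstdint>
#include <cstring>

// Bit reinterpretation via memcpy (avoids strict-aliasing problems).
inline uint32_t Bits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}
inline float FromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// The per-lane value vslw leaves in vr14: the float sign bit (-0.0f).
constexpr uint32_t kSignMask = 0x80000000u;

// vandc computes a & ~b; with the sign mask as b, this yields |x|.
inline float AndcSignMask(float x) { return FromBits(Bits(x) & ~kSignMask); }

// vsel d = (a & ~c) | (b & c): per bit, take b where the mask c is set.
inline uint32_t Vsel(uint32_t a, uint32_t b, uint32_t c) { return (a & ~c) | (b & c); }
```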

Found the culprit. Going to test, and will release in a bit.

Okay guys, can you confirm that this is fixed as of 3a8f8f2ecb2f28aaa82543ad8cf6ec3ff81bac94?

[3a8f8f2]

It's fine now.

Awesome! Appreciate the help guys.
