This is achieved by commenting out "// govsim.SimulateMsgDeposit(app.govKeeper, app.stakeKeeper),", and then running:
go test ./cmd/gaia/app -run TestFullGaiaSimulation \
-SimulationSeed=1 -SimulationEnabled=true \
-SimulationNumBlocks=400 -SimulationBlockSize=400 \
-v -timeout 24h
(This was ran on #2285), the error occurs on block 203. The logs for this one are 21mb, so that will be a pain to parse through (They're now saved at least :D)
This one can also be replicated on Seed=4 in case that helps debugging
/cc @cwgoes @rigelrozanski @alexanderbez
Replicable with go test ./cmd/gaia/app -run TestFullGaiaSimulation -SimulationSeed=3 -SimulationEnabled=true -SimulationNumBlocks=400 -SimulationBlockSize=400 -v -timeout 24h now on develop.
This one also fails fairly early on (may reduce amount of logs that have to be dug through):
go test ./cmd/gaia/app -run TestFullGaiaSimulation -SimulationSeed=9573 -SimulationEnabled=true -SimulationNumBlocks=150 -SimulationBlockSize=200 -v -timeout 24h
(Fails on block 42)
Will take a look at this now while I wait for updates on #2305.
I believe I've discovered the problem that is causing this panic.
Jotting down initial brain dump here...
In a single block context and given the lowest three validators by power in the set: C (104), B (105), and A (106), a series of txs occur:
UpdateBondedValidators finds the correct validator to bond (A), but it incorrectly thinks B is the lowest one -- in other words A and B should be swapped in the iteration. A should be bonded AND A should the last validator in the iteration.This is causing the following to execute.
if newValidatorBonded && bytes.Equal(oldCliffValidatorAddr, validator.OperatorAddr) { // ...}
Again @cwgoes, setting the bondHeight in Keeper#UpdateValidator strikes again!
Once I resolve this and v0.25 is released, I'd like to focus on #2312.
fascinating - makes sense
Update: Looks like the issue I described above is not the culprit.
In a given block context, a series of txs occur that cause the bottom two validators in the set to have the same power (B & C). B gets bonded/enters set and A gets removed. However, A should not actually get removed because it's still in the top maxValidators somehow (after inspecting the power store). It getting removed actually causes the panic on the next tx. A change I made in #2332 addresses this, but it's very specific case and I don't feel comfortable with it.
Pastebin: https://pastebin.com/64hMim3b
I've been trying to debug this for quite some time now, so it's very possible I'm missing something super obvious. In any case, I can't see a clear fix here. Maybe we should punt this in favor of #2312?
@ValarDragon @cwgoes @rigelrozanski
100% agree, I vote to punt and do #2312
Closing in favor of #2312.
Most helpful comment
This one also fails fairly early on (may reduce amount of logs that have to be dug through):
(Fails on block 42)