Cosmos-sdk: Randomized Fuzzer: cliff validator has not been changed, yet we bonded a new validator

Created on 10 Sep 2018 · 8 comments · Source: cosmos/cosmos-sdk

Summary

This is triggered by commenting out the line "govsim.SimulateMsgDeposit(app.govKeeper, app.stakeKeeper)," and then running:

go test ./cmd/gaia/app -run TestFullGaiaSimulation \
  -SimulationSeed=1 -SimulationEnabled=true \
  -SimulationNumBlocks=400 -SimulationBlockSize=400 \
  -v -timeout 24h

(This was run on #2285.) The error occurs on block 203. The logs for this run are 21 MB, so they will be a pain to parse through (they're now saved at least :D).

This one can also be replicated with Seed=4, in case that helps debugging.

/cc @cwgoes @rigelrozanski @alexanderbez

Labels: bug, staking


All 8 comments

Replicable now on develop with:

go test ./cmd/gaia/app -run TestFullGaiaSimulation \
  -SimulationSeed=3 -SimulationEnabled=true \
  -SimulationNumBlocks=400 -SimulationBlockSize=400 \
  -v -timeout 24h

This one also fails fairly early on, which may reduce the amount of logs that have to be dug through:

go test ./cmd/gaia/app -run TestFullGaiaSimulation \
  -SimulationSeed=9573 -SimulationEnabled=true \
  -SimulationNumBlocks=150 -SimulationBlockSize=200 \
  -v -timeout 24h

(Fails on block 42.)

Will take a look at this now while I wait for updates on #2305.

I believe I've discovered the problem that is causing this panic.

Jotting down initial brain dump here...

In a single-block context, given the three lowest validators by power in the set: C (104), B (105), and A (106), a series of txs occurs:

  • C (104) gets kicked out by another validator gaining power (111).
  • B (105) gets kicked out by another validator gaining power (135).
  • A (106) gets kicked out by another validator gaining power; in this case it is B, whose power rises to 106.

    • So now B is the new cliff validator with 106, and A is kicked out, also with 106.

  • Finally, some other non-bonded validator gets updated (irrelevant, as it does not enter the set). UpdateBondedValidators finds the correct validator to bond (A), but it incorrectly thinks B is the lowest one -- in other words, A and B should be swapped in the iteration. A should be bonded AND A should be the last validator in the iteration.

This causes the following to execute:

if newValidatorBonded && bytes.Equal(oldCliffValidatorAddr, validator.OperatorAddr) {
    // ...
}
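For context, a minimal sketch of how that branch produces the panic in the issue title, assuming the check sits at the end of the power-store iteration in UpdateBondedValidators (simplified; this is not the verbatim keeper code):

// Simplified sketch: `validator` is the last (lowest-power) bonded
// validator seen while iterating the power store.
if newValidatorBonded && bytes.Equal(oldCliffValidatorAddr, validator.OperatorAddr) {
    // A new validator entered the bonded set, yet the validator at the
    // cliff position is unchanged. Bonding a new validator should have
    // displaced the old cliff, so this state is treated as invalid.
    panic("cliff validator has not been changed, yet we bonded a new validator")
}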

@cwgoes, setting the bondHeight in Keeper#UpdateValidator strikes again!

Once I resolve this and v0.25 is released, I'd like to focus on #2312.

fascinating - makes sense

Update: Looks like the issue I described above is not the culprit.

In a given block context, a series of txs occurs that causes the bottom two validators in the set to have the same power (B & C). B gets bonded/enters the set and A gets removed. However, A should not actually get removed, because it's still in the top maxValidators somehow (after inspecting the power store). Its removal actually causes the panic on the next tx. A change I made in #2332 addresses this, but it's a very specific case and I don't feel comfortable with it.
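To make the tie concrete, here is a small self-contained sketch of why equal powers are dangerous in the power index. The key layout below is hypothetical (big-endian power followed by a tie-break component such as a bond height), not the actual cosmos-sdk store encoding; it only shows that when powers are equal, the tie-break component alone decides which validator iterates last and thus sits at the cliff:

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "sort"
)

// Hypothetical power-index key: 8 bytes of big-endian power followed by
// 8 bytes of a tie-break component (e.g. a height or counter). This is
// NOT the real store encoding -- just enough to show the ordering issue.
func powerKey(power, tieBreak uint64) []byte {
    key := make([]byte, 16)
    binary.BigEndian.PutUint64(key[0:8], power)
    binary.BigEndian.PutUint64(key[8:16], tieBreak)
    return key
}

func main() {
    // A and B end the block with equal power (106); only the tie-break
    // component distinguishes their keys.
    a := powerKey(106, 3) // validator A
    b := powerKey(106, 7) // validator B

    keys := [][]byte{b, a}
    sort.Slice(keys, func(i, j int) bool {
        return bytes.Compare(keys[i], keys[j]) < 0
    })

    // If the tie-break component is mutated mid-block (e.g. bondHeight
    // being reset in Keeper#UpdateValidator), the relative order of A and
    // B flips, and the keeper can pick the wrong validator as the cliff /
    // the one to kick out of the set.
    for _, k := range keys {
        fmt.Printf("%x\n", k)
    }
}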

Pastebin: https://pastebin.com/64hMim3b

I've been trying to debug this for quite some time now, so it's very possible I'm missing something super obvious. In any case, I can't see a clear fix here. Maybe we should punt this in favor of #2312?

@ValarDragon @cwgoes @rigelrozanski

100% agree, I vote to punt and do #2312

Closing in favor of #2312.

