Tendermint: replay doesn't work for fast sync

Created on 8 Feb 2016 · 3 Comments · Source: tendermint/tendermint

sync bug


Hi, since there is no description I can't be sure, but I guess I came across this issue myself while doing a little test as follows:

I started 4 nodes. Three ran the counter app with --serial, while one ran counter without the flag.
I wanted to see consensus at work, and I did: TX 0x00 went through fine. Following it with TX 0x03 crashed the fourth node, since its decision was overruled.
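The divergence in this experiment comes from the serial check. Below is a minimal Go model, assuming the counter app in serial mode accepts a transaction only if it encodes the current transaction count as a big-endian integer. The `counterApp` type and `appendTx` method are hypothetical, not the real app's API:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// counterApp is a hypothetical, simplified model of the counter example app.
// It is not the real tendermint/tmsp code.
type counterApp struct {
	serial  bool
	txCount uint64
}

// appendTx rejects a tx in serial mode unless it encodes the next expected
// nonce (assumed here to be the current tx count). Without --serial,
// every tx is accepted.
func (app *counterApp) appendTx(tx []byte) error {
	if app.serial {
		if len(tx) > 8 {
			return fmt.Errorf("max tx size is 8 bytes, got %d", len(tx))
		}
		// Left-pad to 8 bytes and decode as a big-endian uint64.
		padded := make([]byte, 8)
		copy(padded[8-len(tx):], tx)
		nonce := binary.BigEndian.Uint64(padded)
		if nonce != app.txCount {
			return fmt.Errorf("invalid nonce: expected %d, got %d", app.txCount, nonce)
		}
	}
	app.txCount++
	return nil
}

func main() {
	serial := &counterApp{serial: true}
	plain := &counterApp{serial: false}
	// Replay the two transactions from the experiment against both modes.
	for _, tx := range [][]byte{{0x00}, {0x03}} {
		fmt.Printf("tx %x: serial=%v plain=%v\n", tx, serial.appendTx(tx), plain.appendTx(tx))
	}
}
```

Under this model, 0x00 is accepted by both modes, while 0x03 is rejected only by the serial nodes (they expect nonce 1), so the non-serial node ends up disagreeing with the other three.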

After restarting the fourth node with counter --serial as well, I had assumed that it would get back on track with the other nodes, since it would now produce consistent results.
But since the counter example does not persist its state, this didn't work.

So I reset the tendermint node (deleted $TMROOT/data and reset height, round and state in priv_validator) and restarted both the counter and tendermint.
But then it could not catch up (fast sync was enabled by default).
It crashed with the error message:

panic: Paniced questionably: Failed to process committed block: Wrong Block.Header.AppHash.  Expected , got 0000000000000001

_First question_: Is this what this ticket is about? If not, then either I stumbled upon something else, or I have simply misunderstood something.

Anyway, I traced where this error message comes from in the source and eventually arrived at this line.
The comment says

// Execute transactions and get hash

but the code only seems to call AppendTx. It never calls Commit, so the AppHash is never updated, and validation of subsequent blocks then fails.
I tried simply calling Commit and updating the AppHash at the end of execBlockOnProxyApp, and the node successfully fast-synced the state and switched to consensus mode afterwards.

Could this be the problem with fast sync?

Thanks for doing this experiment, and for the detailed report!

Indeed, Commit was removed experimentally many months ago and, somewhat by accident, was never added back, so fast sync no longer works properly. There have been some other reports, but we just haven't had a chance to dive into it. So thanks for clearly identifying what may be the culprit behind all these issues!

We will work on a fix and some tests in the next couple of days, including running your experiment as a test. Thanks!

This should be addressed by https://github.com/tendermint/tendermint/pull/267 and https://github.com/tendermint/tendermint/pull/296, but it probably needs more thorough testing.
