Cosmos-sdk: Logical Commit(s) are not Atomic

Created on 8 Jun 2020  Â·  11Comments  Â·  Source: cosmos/cosmos-sdk

Summary of Bug

When syncing a node using --pruning=everything, it is possible to gracefully stop a node and it fail to restart.

Version

cosmos-sdk 0.38.4 (via both Kava v0.8.1 and Gaia)
tendermint 0.33.3 and 0.33.4

Steps to Reproduce

  1. A running validator exists. Pruning mode does not appear to be relevant here. Tried with nothing, everything and syncable.
  2. Start a fullnode, with the validator as it's persistent peer; set pruning=everything (both commandline and app.toml behave the same - as expected)
  3. Allow the node to sync briefly (best results > 2.5k block)
  4. ctrl+c to gracefully halt the syncing fullnode
  5. start the node again.
  6. node fails to start.

Expected behaviour: Node starts and begin syncing again

Actual behaviour: (not 100% of the time, but c 75% of tests)

I[2020-06-08|21:22:56.843] starting ABCI with Tendermint                module=main 
failed to load Store: wanted to load target 3171 but only found up to 0

Anecdotally, if you stop shortly after restarting, this does not happen - however, a few thousand blocks in, it is reasonably consistent behaviour that the node fails in the way described above.


For Admin Use

  • [ ] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [ ] Contributor assigned/self-assigned
bug ice-box kvstore

Most helpful comment

I've identified the problem: Cosmos SDK commits are not atomic (as @alexanderbez also suspected).

The SDK uses multiple logically-independent IAVL stores with a single backing LevelDB database via rootmulti.Store. When the root store is committed, it commits each of the backing IAVL stores sequentially, which both commits the new version and deletes the old. Once all stores have committed, the root store records the latest version. Of course, if the commit is aborted midway, some IAVL stores may have committed the new version (and deleted the previous) while others haven't - and the root still points to the previous version. On restart, it tries to load the previous version, but some IAVL stores have already deleted it.

The problem is further exacerbated by Tendermint not waiting for in-flight blocks to be applied before exiting.

The following are LevelDB entries from the broken node, showing the IAVL root nodes of the acc logical store, where /r signifies root followed by the big-endian int64 version:

application.db $ leveldbutil dump *.ldb | grep /r | sort
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298335 : val => '\x0c\xe8\xa8...
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298646 : del => ''
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1e' @ 298619 : val => '\xee\x8cj...

0x031e is 798 (exists), 0x031d is 797 (deleted). The following is the root store version key:

application.db $ leveldbutil dump *.ldb | grep s/latest | sort | tail -n 3
's/latest' @ 297796 : val => '\x9b\x06'
's/latest' @ 298184 : val => '\x9c\x06'
's/latest' @ 298572 : val => '\x9d\x06'

0x9d06 is the Amino-encoded (i.e. varint-encoded) integer version, which decodes to 797:

i, _ := binary.Uvarint([]byte{0x9d, 0x06})
fmt.Printf("%v", i) // Output: 797

Thus, on startup the root store attempts to load version 797, which has already been deleted by the acc IAVL store.

I believe this is a different problem from https://github.com/cosmos/iavl/issues/261, so I will continue debugging that separately.

How to fix

I will leave it for the SDK team to fix the commit issue. Ideally, a single ACID database transaction should be used to store all data for a given version, but this probably requires a major architecture change. Barring that, I'd recommend first committing all sub-stores, then updating the root commit data, and only then deleting the old versions - and also to consider the effects of having some IAVL stores with "future" versions already committed, which must be ignored/deleted/replaced.

Tendermint should also wait for commits to complete before shutting down, see https://github.com/tendermint/tendermint/issues/5002.

All 11 comments

Please confirm that you're starting the node with a pruning strategy, say nothing, and then later changing it, say to everything. If so, this behavior is not supported by IAVL.

You must keep in mind IAVL now keeps versions in-memory only and then periodically flushes to disk.

Note, I cannot reproduce this on Gaia.

Have confirmed this behavior as well on kava v0.8.1. Had to shutdown and restart the process ~20 times to get a failure.

Please confirm that you're starting the node with a pruning strategy, say nothing, and then later changing it, say to everything. If so, this behavior is not supported by IAVL.

You must keep in mind IAVL now keeps versions in-memory only and then periodically flushes to disk.

I can confirm this is not the case. This is using pruning=everything from height 0.

Able to replicate for the scenario of pruning=everything for heights above 2500.

Test Heights: 2500 ++ to 3300 ++
Failure rate: 1 in 5
Status: Unrecoverable after failure.
Sample Error Message:
I[2020-06-09|08:09:43.216] starting ABCI with Tendermint module=main
failed to load Store: wanted to load target 2842 but only found up to 0

Remedy: unsafe-reset-all and sync from scratch

@AdityaSripal @zmanian more pruning related issues. Seems like when pruning=everything from the onset causes an issue after you stop a node after a snapshot height.

I've been able to reproduce this with Gaia master, running 4 txs per 1-second block and restarting a node every 10 seconds:

gaiadnode3    | failed to load latest version: failed to load store: wanted to load target 797 but only found up to 0
gaiadnode3 exited with code 0

This happened when restarting just as block 798 was in the process of being committed. I have some ideas about what might be causing this, will dig further.

I've identified the problem: Cosmos SDK commits are not atomic (as @alexanderbez also suspected).

The SDK uses multiple logically-independent IAVL stores with a single backing LevelDB database via rootmulti.Store. When the root store is committed, it commits each of the backing IAVL stores sequentially, which both commits the new version and deletes the old. Once all stores have committed, the root store records the latest version. Of course, if the commit is aborted midway, some IAVL stores may have committed the new version (and deleted the previous) while others haven't - and the root still points to the previous version. On restart, it tries to load the previous version, but some IAVL stores have already deleted it.

The problem is further exacerbated by Tendermint not waiting for in-flight blocks to be applied before exiting.

The following are LevelDB entries from the broken node, showing the IAVL root nodes of the acc logical store, where /r signifies root followed by the big-endian int64 version:

application.db $ leveldbutil dump *.ldb | grep /r | sort
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298335 : val => '\x0c\xe8\xa8...
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1d' @ 298646 : del => ''
's/k:acc/r\x00\x00\x00\x00\x00\x00\x03\x1e' @ 298619 : val => '\xee\x8cj...

0x031e is 798 (exists), 0x031d is 797 (deleted). The following is the root store version key:

application.db $ leveldbutil dump *.ldb | grep s/latest | sort | tail -n 3
's/latest' @ 297796 : val => '\x9b\x06'
's/latest' @ 298184 : val => '\x9c\x06'
's/latest' @ 298572 : val => '\x9d\x06'

0x9d06 is the Amino-encoded (i.e. varint-encoded) integer version, which decodes to 797:

i, _ := binary.Uvarint([]byte{0x9d, 0x06})
fmt.Printf("%v", i) // Output: 797

Thus, on startup the root store attempts to load version 797, which has already been deleted by the acc IAVL store.

I believe this is a different problem from https://github.com/cosmos/iavl/issues/261, so I will continue debugging that separately.

How to fix

I will leave it for the SDK team to fix the commit issue. Ideally, a single ACID database transaction should be used to store all data for a given version, but this probably requires a major architecture change. Barring that, I'd recommend first committing all sub-stores, then updating the root commit data, and only then deleting the old versions - and also to consider the effects of having some IAVL stores with "future" versions already committed, which must be ignored/deleted/replaced.

Tendermint should also wait for commits to complete before shutting down, see https://github.com/tendermint/tendermint/issues/5002.

Excellent @erikgrinaker, this confirms my suspicions -- updates to the logical stores and the root logical store (along with pruning) is NOT atomic.

Being that they all use the same underlying physical store, we need to devise a way for them all to use the same batch object. So as you've pointed, some relatively major changes to the SDK and IAVL.

This is certainly blocking 0.39. I'll take this on.

We'll be tackling this when evaluating the future of IAVL / commitment proofs, and its currently blocked on that larger conversation. Handover of IAVL is starting from Interchain GmbH this week.

Was this page helpful?
0 / 5 - 0 ratings