Cosmos-sdk: ClevelDB memory leak

Created on 9 Mar 2019 · 11 comments · Source: cosmos/cosmos-sdk

Summary of Bug


When built and run with ClevelDB, each node consumes about 26 GB of memory after 20 hours of uptime. It may be related to https://github.com/tendermint/tendermint/pull/3397.

We are testing with the following change:

diff --git a/store/rootmulti/store.go b/store/rootmulti/store.go
index 85d9b5b20..9c55fed2c 100644
--- a/store/rootmulti/store.go
+++ b/store/rootmulti/store.go
@@ -198,6 +198,7 @@ func (rs *Store) Commit() types.CommitID {

    // Need to update atomically.
    batch := rs.db.NewBatch()
+   defer batch.Close()
    setCommitInfo(batch, version, commitInfo)
    setLatestVersion(batch, version)
    batch.Write()
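A minimal sketch of why the deferred `Close()` matters, assuming a batch that owns a native (C-allocated) buffer the Go garbage collector cannot see, as a cgo-backed LevelDB write batch does. The `Batch` interface, `nativeBatch` type, `liveHandles` counter and the `latestVersion` key are hypothetical stand-ins for illustration, not tm-db's or the SDK's actual types; the point is only that every `NewBatch()` must be paired with a `Close()` on every return path, which `defer` guarantees.

```go
package main

import "fmt"

// Batch is a hypothetical stand-in for a DB write batch whose buffer lives
// outside the Go heap (as with a cgo-backed LevelDB write batch).
type Batch interface {
	Set(key, value []byte)
	Write() error
	Close()
}

// liveHandles counts simulated native batch handles not yet released.
// With a real cgo backend, each one is memory the Go GC never reclaims.
var liveHandles int

// nativeBatch is a hypothetical in-memory fake used only for illustration.
type nativeBatch struct {
	ops    map[string][]byte
	closed bool
}

func newBatch() Batch {
	liveHandles++ // simulate allocating a C-side write batch
	return &nativeBatch{ops: make(map[string][]byte)}
}

func (b *nativeBatch) Set(key, value []byte) { b.ops[string(key)] = value }

func (b *nativeBatch) Write() error {
	if b.closed {
		return fmt.Errorf("write on closed batch")
	}
	return nil // a real backend would apply b.ops atomically here
}

// Close is idempotent: only the first call releases the handle.
func (b *nativeBatch) Close() {
	if !b.closed {
		b.closed = true
		liveHandles-- // simulate freeing the C-side write batch
	}
}

// commitLeaky mirrors Commit() before the patch: the batch is never closed.
func commitLeaky(version int64) {
	batch := newBatch()
	batch.Set([]byte("latestVersion"), []byte(fmt.Sprint(version)))
	_ = batch.Write()
}

// commitFixed mirrors Commit() after the patch: the deferred Close runs on
// every return path, including a panic inside Set or Write.
func commitFixed(version int64) {
	batch := newBatch()
	defer batch.Close()
	batch.Set([]byte("latestVersion"), []byte(fmt.Sprint(version)))
	_ = batch.Write()
}

func main() {
	for v := int64(1); v <= 1000; v++ {
		commitLeaky(v)
	}
	fmt.Println("unreleased handles after 1000 leaky commits:", liveHandles)

	liveHandles = 0
	for v := int64(1); v <= 1000; v++ {
		commitFixed(v)
	}
	fmt.Println("unreleased handles after 1000 fixed commits:", liveHandles)
}
```

Run as-is, it reports 1000 unreleased handles for the leaky loop and 0 for the fixed one; with a real cgo backend each of those 1000 would be native memory that only grows with block height.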

Will update once we make progress.

Steps to Reproduce


Run a full node or validator following: https://github.com/lino-network/testnets


For Admin Use

  • [ ] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [ ] Contributor assigned/self-assigned
bug

All 11 comments

cc @alessio

We should try to reproduce it @cosmos/cosmossdk

IMHO we should close the batch, as is recommended.

Update: we have run nodes using the patched version for 2 days. However, we are still seeing the memory leak: those nodes take 22 GB of memory after 34 hours. Looking into it.

Thanks for the status update @Stumble!

Update: https://github.com/tendermint/iavl/pull/130. We will run with the fixed version and see how it works (I would guess this is the major cause of the leak). Stay tuned.

CC @mossid

@Stumble the patch:

diff --git a/store/rootmulti/store.go b/store/rootmulti/store.go
index 85d9b5b20..9c55fed2c 100644
--- a/store/rootmulti/store.go
+++ b/store/rootmulti/store.go
@@ -198,6 +198,7 @@ func (rs *Store) Commit() types.CommitID {

    // Need to update atomically.
    batch := rs.db.NewBatch()
+   defer batch.Close()
    setCommitInfo(batch, version, commitInfo)
    setLatestVersion(batch, version)
    batch.Write()

seems relevant to me regardless of other changes. Am I right in my judgement that we should apply it anyway?

@alessio Yes, as long as the tendermint version is newer than https://github.com/tendermint/tendermint/commit/f996b10f479d7c9a6d81cca5a02c47b29a52b3f3 and https://github.com/tendermint/iavl/pull/130 is applied.

Update: with the above fixes, the node takes < 2 GB of memory (we lowered the cleveldb cache size to 500 MB) and is stable. The memory leak problem seems to be fixed.
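The cache passed to `levigo.NewLRUCache` is bounded by a byte capacity, so lowering it from the `1 << 30` (1 GiB) that tendermint's c_level_db.go passes in to roughly 500 MB caps how much memory the block cache itself can hold. The Go sketch below illustrates only that eviction rule, using a hypothetical `byteLRU` type built on `container/list`; it is not LevelDB's cache (which is sharded and charges per block), and other LevelDB memory such as memtables is not covered by this limit.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached block: its key and its size in bytes (its "charge").
type entry struct {
	key  string
	size int
}

// byteLRU is a hypothetical LRU cache bounded by total bytes, not item count.
type byteLRU struct {
	capacity int                      // maximum total bytes held
	used     int                      // bytes currently held
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newByteLRU(capacity int) *byteLRU {
	return &byteLRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get reports whether key is cached and marks it as most recently used.
func (c *byteLRU) Get(key string) bool {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return true
	}
	return false
}

// Put inserts or refreshes key, then evicts least recently used entries
// until the total size fits within capacity.
func (c *byteLRU) Put(key string, size int) {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		c.used += size - e.size
		e.size = size
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, size: size})
		c.used += size
	}
	for c.used > c.capacity && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.used -= e.size
	}
}

func main() {
	const blockSize = 4 << 10  // 4 KiB, roughly LevelDB's default block size
	const capacity = 500 << 20 // ~500 MB, as in the thread
	c := newByteLRU(capacity)
	for i := 0; i < 200000; i++ { // ~781 MiB worth of blocks
		c.Put(fmt.Sprintf("block-%d", i), blockSize)
	}
	fmt.Println("bytes held:", c.used, "within capacity:", c.used <= capacity)
}
```

However many blocks are read, the cache never holds more than its capacity; older blocks are simply evicted and re-read from disk when needed, trading some read latency for a bounded footprint.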

@Stumble Could you please explain how you set the cleveldb cache size? Did you modify opts.SetCache(levigo.NewLRUCache(1 << 30)) in c_level_db.go in the tendermint repo, since NewDB() doesn't accept any DB options? Thanks.

@re7eal Yes, that's how we did it.
https://github.com/lino-network/lino/blob/9c36fcd7ae7ad99aaf9f9882f6f9c8144ba86f32/patches/fixes/tendermint-cleveldb-close-batch.patch#L24
We also applied a bloom filter, since in theory it should drastically improve performance, as we have more than 4000 files in our application db.
