Cosmos-sdk: Simulator: (intermittent) race detected during execution of test

Created on 13 Jun 2019 · 6 comments · Source: cosmos/cosmos-sdk

Summary of Bug

Some simulation runs from specific random seeds produce the following error:
testing.go:809: race detected during execution of test

Seeds that can produce this error: 699694472, 165338863, 416539974

Version

$ git log -1
commit 8c89023e9f7ce67492142c92acc9ba0d9f876c0e (HEAD, origin/master, origin/HEAD)
Author: frog power 4000 <[email protected]>
Date:   Wed Jun 12 17:57:47 2019 +0200

    Merge PR #4524: Staking ValidatorPowerRank to use "Potential Consensus Power"

Steps to Reproduce

Using exp-699694472-race.sh (see below), run 10 trials with:

for i in `seq 1 10`; do bash ./exp-699694472-race.sh | tee 699694472.race.$i.log &  done

Some runs will fail:

$ grep -e FAIL -e 'race detected' -e PASS 699694472.race.*.log
699694472.race.10.log:--- FAIL: TestFullAppSimulation (478.62s)
699694472.race.10.log:    testing.go:809: race detected during execution of test
699694472.race.10.log:FAIL
699694472.race.10.log:FAIL      github.com/cosmos/cosmos-sdk/simapp     479.898s
699694472.race.1.log:--- PASS: TestFullAppSimulation (8230.30s)
699694472.race.1.log:PASS
699694472.race.2.log:--- PASS: TestFullAppSimulation (8237.57s)
699694472.race.2.log:PASS
699694472.race.3.log:--- PASS: TestFullAppSimulation (8248.72s)
699694472.race.3.log:PASS
699694472.race.4.log:--- PASS: TestFullAppSimulation (8268.98s)
699694472.race.4.log:PASS
699694472.race.5.log:--- FAIL: TestFullAppSimulation (473.46s)
699694472.race.5.log:    testing.go:809: race detected during execution of test
699694472.race.5.log:FAIL
699694472.race.5.log:FAIL       github.com/cosmos/cosmos-sdk/simapp     474.693s
699694472.race.6.log:--- FAIL: TestFullAppSimulation (478.91s)
699694472.race.6.log:    testing.go:809: race detected during execution of test
699694472.race.6.log:FAIL
699694472.race.6.log:FAIL       github.com/cosmos/cosmos-sdk/simapp     479.962s
699694472.race.7.log:--- FAIL: TestFullAppSimulation (340.88s)
699694472.race.7.log:    testing.go:809: race detected during execution of test
699694472.race.7.log:FAIL
699694472.race.7.log:FAIL       github.com/cosmos/cosmos-sdk/simapp     342.039s
699694472.race.8.log:--- PASS: TestFullAppSimulation (334.21s)
699694472.race.8.log:PASS
699694472.race.9.log:--- FAIL: TestFullAppSimulation (477.33s)
699694472.race.9.log:    testing.go:809: race detected during execution of test
699694472.race.9.log:FAIL
699694472.race.9.log:FAIL      github.com/cosmos/cosmos-sdk/simapp     478.660s

Full logs:
699694472.race.1.log
699694472.race.2.log
699694472.race.3.log
699694472.race.4.log
699694472.race.5.log
699694472.race.6.log
699694472.race.7.log
699694472.race.8.log
699694472.race.9.log

--

$ cat ./exp-699694472-race.sh
#!/bin/bash

set -e
set -x

seed=699694472

blocks=400
period=5
timeout=6h
blockSize=200
verbose=false

git log -1

git diff


go test -race -mod=readonly github.com/cosmos/cosmos-sdk/simapp -run TestFullAppSimulation -SimulationEnabled=true -SimulationNumBlocks=${blocks} -SimulationBlockSize=${blockSize} -SimulationGenesis= -SimulationVerbose=${verbose} -SimulationCommit=true -SimulationSeed=${seed} -SimulationPeriod=${period} -v -timeout ${timeout}

Logs from other seeds:
165338863.1.log
416539974.1.log


For Admin Use

  • [ ] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [ ] Contributor assigned/self-assigned
Labels: bug, good first issue, simulation, stale

All 6 comments

/cc a good one @tnachen

Looked into this race. It occurs because of concurrent access to leveldb's buffer pool state. That state is updated in the BufferPool.Put/Get methods using atomic.Store/Add. Part of the race condition trace:

Previous write at 0x00c000c180c0 by goroutine 27:
  sync/atomic.AddInt32()
      /snap/go/3739/src/runtime/race_amd64.s:269 +0xb
  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get()

Tendermint periodically collects stats and reads the buffer pool state with a call to DB.GetProperty("leveldb.blockpool"), which in turn calls BufferPool.String. First part of the race condition trace:

WARNING: DATA RACE
Read at 0x00c000c180c0 by goroutine 23:
  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).String()
      /home/cosmos/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:199 +0xa0
  fmt.(*pp).handleMethods()
      /snap/go/3739/src/fmt/print.go:616 +0x433
  fmt.(*pp).printArg()
      /snap/go/3739/src/fmt/print.go:699 +0x24f
  fmt.(*pp).doPrintf()
      /snap/go/3739/src/fmt/print.go:1016 +0x31c
  fmt.Sprintf()
      /snap/go/3739/src/fmt/print.go:214 +0x73
  github.com/syndtr/goleveldb/leveldb.(*DB).GetProperty()
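
For reference, a minimal sketch of the access pattern behind this report (illustrative names only, not the SDK or goleveldb code): one goroutine updates a counter through sync/atomic while another reads the same field as a plain value when formatting, which is the same mix as BufferPool.Get versus BufferPool.String and is what the race detector flags.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool stands in for goleveldb's BufferPool in this sketch.
type pool struct {
	miss int32 // written atomically in Get, read non-atomically in String
}

func (p *pool) Get() {
	atomic.AddInt32(&p.miss, 1) // atomic write, as in BufferPool.Get
}

func (p *pool) String() string {
	return fmt.Sprintf("misses=%d", p.miss) // plain read: races with Get
}

func main() {
	p := &pool{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // stands in for the simulation writing through the DB
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			p.Get()
		}
	}()
	go func() { // stands in for Tendermint's periodic GetProperty call
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = p.String()
		}
	}()
	wg.Wait()
}

Running this with go run -race reports a data race on the miss field, matching the read/write pair in the trace above.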

I proposed an obvious fix here: https://github.com/syndtr/goleveldb/pull/312. However, I am not sure how valuable the information about the leveldb block buffer pool is; maybe it is enough to collect leveldb memory usage.
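
For illustration, the fix direction amounts to making the read side atomic as well. A minimal sketch against the hypothetical pool type above (not necessarily the exact change in the goleveldb PR):

// Replacement for the String method in the sketch above: read the counter
// with sync/atomic so the read pairs correctly with the atomic writes in Get,
// removing the mixed plain-read/atomic-write access.
func (p *pool) String() string {
	return fmt.Sprintf("misses=%d", atomic.LoadInt32(&p.miss))
}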

Wow, thanks for looking into this @dshulyak

Shouldn't TempDir already create a unique folder? That name is just a prefix,
right?

On Thu, Dec 12, 2019 at 12:32 Arda Güçlü notifications@github.com wrote:

According to the attached stack trace, the data race is happening at this code
portion:

dir, _ := ioutil.TempDir("", "goleveldb-app-sim")
db, _ = sdk.NewLevelDB("Simulation", dir)

I guess that if we use some random/UUID name for the database directory
instead of the static "goleveldb-app-sim", the problem will be solved (and
this could be applied to other test cases too), unless the problem lies in the
db creation part of the dependent library, i.e. a lack of synchronization.
@alexanderbez https://github.com/alexanderbez

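A quick way to check that point, assuming Go's standard ioutil.TempDir behavior (the printed paths are only illustrative): each call already appends a random suffix to the prefix, so concurrent test runs never share an on-disk directory, which is why a random name would not change anything about this race.

package main

import (
	"fmt"
	"io/ioutil"
)

func main() {
	// The second argument is only a prefix; TempDir appends a random suffix.
	a, _ := ioutil.TempDir("", "goleveldb-app-sim")
	b, _ := ioutil.TempDir("", "goleveldb-app-sim")
	fmt.Println(a) // e.g. /tmp/goleveldb-app-sim123456789
	fmt.Println(b) // a different directory with the same prefix
}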

I tried to investigate the issue a little bit, and according to @dshulyak's analysis the problem is missing synchronization inside leveldb, so my solution might not be enough for that case. @tnachen

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
