Some simulation runs with specific random seeds produce the following error:
testing.go:809: race detected during execution of test
Seeds that can produce this error: 699694472, 165338863, 416539974
$ git log -1
commit 8c89023e9f7ce67492142c92acc9ba0d9f876c0e (HEAD, origin/master, origin/HEAD)
Author: frog power 4000 <[email protected]>
Date: Wed Jun 12 17:57:47 2019 +0200
Merge PR #4524: Staking ValidatorPowerRank to use "Potential Consensus Power"
Using exp-699694472-race.sh (see below), run 10 trials with:
for i in `seq 1 10`; do bash ./exp-699694472-race.sh | tee 699694472.race.$i.log & done
Some runs will fail:
$ grep -e FAIL -e 'race detected' -e PASS 699694472.race.*.log
699694472.race.10.log:--- FAIL: TestFullAppSimulation (478.62s)
699694472.race.10.log: testing.go:809: race detected during execution of test
699694472.race.10.log:FAIL
699694472.race.10.log:FAIL github.com/cosmos/cosmos-sdk/simapp 479.898s
699694472.race.1.log:--- PASS: TestFullAppSimulation (8230.30s)
699694472.race.1.log:PASS
699694472.race.2.log:--- PASS: TestFullAppSimulation (8237.57s)
699694472.race.2.log:PASS
699694472.race.3.log:--- PASS: TestFullAppSimulation (8248.72s)
699694472.race.3.log:PASS
699694472.race.4.log:--- PASS: TestFullAppSimulation (8268.98s)
699694472.race.4.log:PASS
699694472.race.5.log:--- FAIL: TestFullAppSimulation (473.46s)
699694472.race.5.log: testing.go:809: race detected during execution of test
699694472.race.5.log:FAIL
699694472.race.5.log:FAIL github.com/cosmos/cosmos-sdk/simapp 474.693s
699694472.race.6.log:--- FAIL: TestFullAppSimulation (478.91s)
699694472.race.6.log: testing.go:809: race detected during execution of test
699694472.race.6.log:FAIL
699694472.race.6.log:FAIL github.com/cosmos/cosmos-sdk/simapp 479.962s
699694472.race.7.log:--- FAIL: TestFullAppSimulation (340.88s)
699694472.race.7.log: testing.go:809: race detected during execution of test
699694472.race.7.log:FAIL
699694472.race.7.log:FAIL github.com/cosmos/cosmos-sdk/simapp 342.039s
699694472.race.8.log:--- PASS: TestFullAppSimulation (334.21s)
699694472.race.8.log:PASS
699694472.race.9.log:--- FAIL: TestFullAppSimulation (477.33s)
699694472.race.9.log: testing.go:809: race detected during execution of test
699694472.race.9.log:FAIL
699694472.race.9.log:FAIL github.com/cosmos/cosmos-sdk/simapp 478.660s
Full logs:
699694472.race.1.log
699694472.race.2.log
699694472.race.3.log
699694472.race.4.log
699694472.race.5.log
699694472.race.6.log
699694472.race.7.log
699694472.race.8.log
699694472.race.9.log
--
$ cat ./exp-699694472-race.sh
#!/bin/bash
set -e
set -x
seed=699694472
blocks=400
period=5
timeout=6h
blockSize=200
verbose=false
git log -1
git diff
go test -race -mod=readonly github.com/cosmos/cosmos-sdk/simapp \
  -run TestFullAppSimulation \
  -SimulationEnabled=true \
  -SimulationNumBlocks=${blocks} \
  -SimulationBlockSize=${blockSize} \
  -SimulationGenesis= \
  -SimulationVerbose=${verbose} \
  -SimulationCommit=true \
  -SimulationSeed=${seed} \
  -SimulationPeriod=${period} \
  -v -timeout ${timeout}
Logs from other seeds:
165338863.1.log
416539974.1.log
/cc a good one @tnachen
Looked into this race. It occurs because of concurrent access to leveldb's buffer pool state. That state is updated in the BufferPool.Put/Get methods using atomic.Store/Add; part of the race condition trace:
Previous write at 0x00c000c180c0 by goroutine 27:
sync/atomic.AddInt32()
/snap/go/3739/src/runtime/race_amd64.s:269 +0xb
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get()
Tendermint collects stats periodically and reads the buffer pool state with a call to DB.GetProperty("leveldb.blockpool"), which in turn calls BufferPool.String. First part of the race condition trace:
WARNING: DATA RACE
Read at 0x00c000c180c0 by goroutine 23:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).String()
/home/cosmos/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:199 +0xa0
fmt.(*pp).handleMethods()
/snap/go/3739/src/fmt/print.go:616 +0x433
fmt.(*pp).printArg()
/snap/go/3739/src/fmt/print.go:699 +0x24f
fmt.(*pp).doPrintf()
/snap/go/3739/src/fmt/print.go:1016 +0x31c
fmt.Sprintf()
/snap/go/3739/src/fmt/print.go:214 +0x73
github.com/syndtr/goleveldb/leveldb.(*DB).GetProperty()
I proposed an obvious fix here: https://github.com/syndtr/goleveldb/pull/312. However, I am not sure how valuable the information about the leveldb block buffer pool really is; maybe it is enough to collect leveldb memory usage.
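To make that pattern concrete, here is a minimal, self-contained sketch (hypothetical names, not goleveldb's actual code): one goroutine updates counters with sync/atomic while another formats them through a plain read in String(). Running it with go run -race reports the same kind of read/write race as the trace above.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool mimics the shape of a buffer-pool stats struct (hypothetical).
type pool struct {
	getNum int32
	putNum int32
}

// Get bumps a counter atomically, like util.(*BufferPool).Get in the trace.
func (p *pool) Get() { atomic.AddInt32(&p.getNum, 1) }

// Put bumps a counter atomically, like util.(*BufferPool).Put in the trace.
func (p *pool) Put() { atomic.AddInt32(&p.putNum, 1) }

// String reads the counters with plain loads; this is the racy read that
// DB.GetProperty("leveldb.blockpool") reaches via fmt.Sprintf. A fix along
// the lines of the PR above would presumably use atomic.LoadInt32 here.
func (p *pool) String() string {
	return fmt.Sprintf("gets=%d puts=%d", p.getNum, p.putNum)
}

func main() {
	p := &pool{}
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: stands in for the simulation workload hitting the buffer pool.
	go func() {
		defer wg.Done()
		for i := 0; i < 10000; i++ {
			p.Get()
			p.Put()
		}
	}()

	// Reader: stands in for Tendermint's periodic stats collection.
	go func() {
		defer wg.Done()
		for i := 0; i < 10000; i++ {
			_ = fmt.Sprintf("%v", p) // invokes p.String() via fmt's handleMethods
		}
	}()

	wg.Wait()
}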
Wow, thanks for looking into this @dshulyak
Shouldn't tempdir already create a unique folder? That name is just a prefix, right?
On Thu, Dec 12, 2019 at 12:32 Arda Güçlü notifications@github.com wrote:
According to the attached stack trace, the data race is happening at this code portion:

dir, _ := ioutil.TempDir("", "goleveldb-app-sim")
db, _ = sdk.NewLevelDB("Simulation", dir)

I guess that if we give some random/UUID for the database directory name instead of the static "goleveldb-app-sim", the problem will be solved (and this could be applied to other test cases too). Unless the problem lies in the db creation part of the dependent library, due to a lack of sync. @alexanderbez
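For reference, ioutil.TempDir does treat "goleveldb-app-sim" as a prefix and appends a random suffix, so each call already yields a distinct directory. A quick sketch to confirm:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	// Two calls with the same prefix; TempDir appends a random suffix,
	// so the returned paths are always distinct.
	dir1, err := ioutil.TempDir("", "goleveldb-app-sim")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir1)

	dir2, err := ioutil.TempDir("", "goleveldb-app-sim")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir2)

	fmt.Println(dir1)         // e.g. /tmp/goleveldb-app-sim123456789
	fmt.Println(dir2)         // e.g. /tmp/goleveldb-app-sim987654321
	fmt.Println(dir1 != dir2) // true
}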
I tried to investigate the issue a little bit, and according to @dshulyak's analysis the problem is missing synchronization inside leveldb itself, so my solution might not be enough for that case. @tnachen
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.