Go: Benchmark results vastly different when Tests are skipped with -run=XXX

Created on 18 Nov 2020  ·  8 Comments  ·  Source: golang/go

What version of Go are you using (go version)?

$ go version
go version go1.15.5 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/acotten/.cache/go-build"
GOENV="/home/acotten/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/acotten/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/acotten/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/19fda1e474bcc545cffe48ac07f9889e/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build782682920=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran a benchmark that is part of the following program's tests: https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e, once without the -run flag, and once with -run=XXX to skip the tests (no test name matches that pattern).

What did you expect to see?

Similar results in both cases.

What did you see instead?

Vastly different results:

$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         391664              3062 ns/op
BenchmarkGenerate         380072              3045 ns/op
BenchmarkGenerate         388015              3081 ns/op
BenchmarkGenerate         392066              3041 ns/op
BenchmarkGenerate         381025              3093 ns/op
BenchmarkGenerate         397995              3041 ns/op
BenchmarkGenerate         385315              3089 ns/op
BenchmarkGenerate         398268              3041 ns/op
BenchmarkGenerate         389184              3092 ns/op
BenchmarkGenerate         395845              3056 ns/op
BenchmarkGenerate         382033              3090 ns/op
BenchmarkGenerate         394696              3028 ns/op
BenchmarkGenerate         391120              3089 ns/op
BenchmarkGenerate         379633              3033 ns/op
BenchmarkGenerate         337450              3103 ns/op
BenchmarkGenerate         392859              3038 ns/op
BenchmarkGenerate         383972              3086 ns/op
BenchmarkGenerate         392360              3041 ns/op
BenchmarkGenerate         382046              3098 ns/op
BenchmarkGenerate         381825              3042 ns/op
PASS
ok      cegen   25.296s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         166654              6381 ns/op
BenchmarkGenerate         180555              6240 ns/op
BenchmarkGenerate         181734              6258 ns/op
BenchmarkGenerate         183918              6260 ns/op
BenchmarkGenerate         181689              6275 ns/op
BenchmarkGenerate         181914              6310 ns/op
BenchmarkGenerate         181581              6284 ns/op
BenchmarkGenerate         182289              6284 ns/op
BenchmarkGenerate         181348              6267 ns/op
BenchmarkGenerate         180328              6275 ns/op
BenchmarkGenerate         179772              6259 ns/op
BenchmarkGenerate         183290              6221 ns/op
BenchmarkGenerate         179972              6282 ns/op
BenchmarkGenerate         182958              6257 ns/op
BenchmarkGenerate         182750              6302 ns/op
BenchmarkGenerate         181096              6261 ns/op
BenchmarkGenerate         180844              6294 ns/op
BenchmarkGenerate         181142              6287 ns/op
BenchmarkGenerate         180400              6275 ns/op
BenchmarkGenerate         180426              6266 ns/op
PASS
ok      cegen   24.072s
NeedsInvestigation


All 8 comments

Can you run both with -count=20 -cpu=1 and see if there is high inter-run variance? Also make sure to run on a quiet computer, ideally with other applications and browsers closed.

If this benchmark does any disk or network I/O, that might also introduce latencies, and one run might have hit a cached result that the other run didn't.

@martisch here's a run with those flags:

$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         166654              6381 ns/op
BenchmarkGenerate         180555              6240 ns/op
BenchmarkGenerate         181734              6258 ns/op
BenchmarkGenerate         183918              6260 ns/op
BenchmarkGenerate         181689              6275 ns/op
BenchmarkGenerate         181914              6310 ns/op
BenchmarkGenerate         181581              6284 ns/op
BenchmarkGenerate         182289              6284 ns/op
BenchmarkGenerate         181348              6267 ns/op
BenchmarkGenerate         180328              6275 ns/op
BenchmarkGenerate         179772              6259 ns/op
BenchmarkGenerate         183290              6221 ns/op
BenchmarkGenerate         179972              6282 ns/op
BenchmarkGenerate         182958              6257 ns/op
BenchmarkGenerate         182750              6302 ns/op
BenchmarkGenerate         181096              6261 ns/op
BenchmarkGenerate         180844              6294 ns/op
BenchmarkGenerate         181142              6287 ns/op
BenchmarkGenerate         180400              6275 ns/op
BenchmarkGenerate         180426              6266 ns/op
PASS
ok      cegen   24.072s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         391664              3062 ns/op
BenchmarkGenerate         380072              3045 ns/op
BenchmarkGenerate         388015              3081 ns/op
BenchmarkGenerate         392066              3041 ns/op
BenchmarkGenerate         381025              3093 ns/op
BenchmarkGenerate         397995              3041 ns/op
BenchmarkGenerate         385315              3089 ns/op
BenchmarkGenerate         398268              3041 ns/op
BenchmarkGenerate         389184              3092 ns/op
BenchmarkGenerate         395845              3056 ns/op
BenchmarkGenerate         382033              3090 ns/op
BenchmarkGenerate         394696              3028 ns/op
BenchmarkGenerate         391120              3089 ns/op
BenchmarkGenerate         379633              3033 ns/op
BenchmarkGenerate         337450              3103 ns/op
BenchmarkGenerate         392859              3038 ns/op
BenchmarkGenerate         383972              3086 ns/op
BenchmarkGenerate         392360              3041 ns/op
BenchmarkGenerate         382046              3098 ns/op
BenchmarkGenerate         381825              3042 ns/op
PASS
ok      cegen   25.296s

My laptop isn't busy right now, I closed other applications prior to running the benchmarks, and the application isn't interacting with the disk or network (code posted in the description).

I'm curious what _you_ get when you execute those commands?

(edit: updating issue description with those results)

Thanks. My only hunch now is that the test might be setting or caching something that helps the benchmarks. That is quite a bit of code and dependencies which I have not had time to understand, and I am currently reluctant to just run it. If somebody can reduce the complexity to a more minimal reproducer, that could help find why there is an effect.

~Sure, let me isolate the function in the Benchmark.~

Hm actually no, I shouldn't do that. If I _remove_ all Test* funcs inside main_test.go without changing anything else, the Benchmark results are now consistent.

$ go test ./ -bench=BenchmarkGenerate -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         390487              3062 ns/op
PASS
ok      cegen   1.232s
$ go test ./ -bench=BenchmarkGenerate -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         393402              3028 ns/op
PASS
ok      cegen   2.215s

Maybe we can remove tests one by one to see if any test in particular turns out to improve the benchmark results?

Maybe then we can see if that test has side effects, like registering a callback or changing some global state without resetting it back to defaults. For example, I can see there is a sync.Pool with global state involved. It might just be that the tests cause the sync.Pool to be filled, resulting in fewer allocations in the later benchmark. You can add b.ReportAllocs() before the for loop in the benchmark to see allocations.

Oh, right, global vars are _shared between all tests_. I didn't even consider that angle. Thanks for bringing it up 🤦

TestRun indeed causes the sync.Pool to be initialized with much smaller buffers (2 bytes vs. 2048), making the pool useless in the benchmark when the tests are not skipped:

https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e#file-main_test-go-L63-L65

$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         164232              6758 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         177374              6586 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         180433              6594 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         175017              6696 ns/op           13945 B/op         22 allocs/op
BenchmarkGenerate         174841              6692 ns/op           13945 B/op         22 allocs/op
PASS
ok      cegen   7.175s
$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate         375070              3244 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         387146              3217 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         381927              3178 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         389636              3191 ns/op             192 B/op          8 allocs/op
BenchmarkGenerate         336218              3163 ns/op             192 B/op          8 allocs/op
PASS
ok      cegen   6.157s

Thanks a lot for investigating. 🙌 Mystery solved!


(edit)

In case someone reads this and the cause of the issue isn't obvious 👇

diff --git a/perf/tools/cegen/main.go b/perf/tools/cegen/main.go
index cf585da..1662866 100644
--- a/perf/tools/cegen/main.go
+++ b/perf/tools/cegen/main.go
@@ -139,6 +139,9 @@ type CloudEventTargetsGenerator struct {
        typeAttr   string
        sourceAttr string
        data       []byte
+
+       // Once used to initialize the buffer pool on the first call to Generate.
+       bufOnce sync.Once
 }

 // NewCloudEventTargetsGenerator returns a generator that yields vegeta JSON
@@ -157,9 +160,6 @@ func NewCloudEventTargetsGenerator(url, typeAttr, sourceAttr string, data []byte
 // Buffer pool for jwriter.Writer's underlying Buffer and output.
 var writerBufPool *sync.Pool

-// Once used to initialize the buffer pool on the first call to Generate.
-var bufOnce sync.Once
-
 // Generate returns a target serialized as JSON.
 func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
        var t jsonTarget
@@ -181,7 +181,7 @@ func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
        t.Body = g.data

        // encode inside the Once fn to determine the size of buffers in sync pools
-       bufOnce.Do(func() {
+       g.bufOnce.Do(func() {
                var jw jwriter.Writer
                t.encode(&jw)
                dataBytes, _ := jw.BuildBytes()

@antoineco, FWIW, putting differently-sized buffers in a sync.Pool is almost never appropriate, and our sync.Pool examples have historically not been very clear about that. (That's #23199.)
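One common way to avoid the variable-size problem (a sketch of the general pattern, not a prescription from #23199; the names and the size threshold are made up) is to pool a resettable wrapper such as bytes.Buffer, reset it on return, and drop entries that have grown past a cap so the pool's contents stay roughly uniform:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap bounds how large a buffer we are willing to keep around;
// the threshold here is arbitrary, chosen for illustration.
const maxPooledCap = 64 << 10 // 64 KiB

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func getBuf() *bytes.Buffer { return bufPool.Get().(*bytes.Buffer) }

// putBuf returns a buffer to the pool, discarding unusually large ones
// so a single oversized use doesn't pin memory for the pool's lifetime.
func putBuf(b *bytes.Buffer) {
	if b.Cap() > maxPooledCap {
		return // let the GC reclaim it
	}
	b.Reset()
	bufPool.Put(b)
}

func main() {
	b := getBuf()
	b.WriteString("hello")
	fmt.Println(b.String())
	putBuf(b)
}
```

Because every pooled entry is reset and capacity-bounded, callers always get an empty buffer of reasonable size, regardless of which test or benchmark touched the pool first.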

@bcmills that's another interesting point 👍 Although in my case the issue was purely a misuse, as a side effect of me expecting the global var to be "fresh" with every test.
