What version of Go are you using (go version)?
$ go version
go version go1.15.5 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/acotten/.cache/go-build"
GOENV="/home/acotten/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/acotten/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/acotten/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/19fda1e474bcc545cffe48ac07f9889e/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build782682920=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Ran a benchmark that is part of the following program's tests: https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e, once without the -run flag, once with -run=XXX to skip tests.
What did you expect to see?
Similar results in both cases.
What did you see instead?
Vastly different results:
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 391664 3062 ns/op
BenchmarkGenerate 380072 3045 ns/op
BenchmarkGenerate 388015 3081 ns/op
BenchmarkGenerate 392066 3041 ns/op
BenchmarkGenerate 381025 3093 ns/op
BenchmarkGenerate 397995 3041 ns/op
BenchmarkGenerate 385315 3089 ns/op
BenchmarkGenerate 398268 3041 ns/op
BenchmarkGenerate 389184 3092 ns/op
BenchmarkGenerate 395845 3056 ns/op
BenchmarkGenerate 382033 3090 ns/op
BenchmarkGenerate 394696 3028 ns/op
BenchmarkGenerate 391120 3089 ns/op
BenchmarkGenerate 379633 3033 ns/op
BenchmarkGenerate 337450 3103 ns/op
BenchmarkGenerate 392859 3038 ns/op
BenchmarkGenerate 383972 3086 ns/op
BenchmarkGenerate 392360 3041 ns/op
BenchmarkGenerate 382046 3098 ns/op
BenchmarkGenerate 381825 3042 ns/op
PASS
ok cegen 25.296s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 166654 6381 ns/op
BenchmarkGenerate 180555 6240 ns/op
BenchmarkGenerate 181734 6258 ns/op
BenchmarkGenerate 183918 6260 ns/op
BenchmarkGenerate 181689 6275 ns/op
BenchmarkGenerate 181914 6310 ns/op
BenchmarkGenerate 181581 6284 ns/op
BenchmarkGenerate 182289 6284 ns/op
BenchmarkGenerate 181348 6267 ns/op
BenchmarkGenerate 180328 6275 ns/op
BenchmarkGenerate 179772 6259 ns/op
BenchmarkGenerate 183290 6221 ns/op
BenchmarkGenerate 179972 6282 ns/op
BenchmarkGenerate 182958 6257 ns/op
BenchmarkGenerate 182750 6302 ns/op
BenchmarkGenerate 181096 6261 ns/op
BenchmarkGenerate 180844 6294 ns/op
BenchmarkGenerate 181142 6287 ns/op
BenchmarkGenerate 180400 6275 ns/op
BenchmarkGenerate 180426 6266 ns/op
PASS
ok cegen 24.072s
Can you run both with -count=20 -cpu=1 and see if there is a high inter-run variance? Also make sure to run on a quiet computer, ideally with other applications and browsers closed.
If this benchmark does any disk or network I/O, that might also introduce latencies; and if each was run only once, one run might have received a cached result while the other didn't.
@martisch here's a run with those flags:
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 166654 6381 ns/op
BenchmarkGenerate 180555 6240 ns/op
BenchmarkGenerate 181734 6258 ns/op
BenchmarkGenerate 183918 6260 ns/op
BenchmarkGenerate 181689 6275 ns/op
BenchmarkGenerate 181914 6310 ns/op
BenchmarkGenerate 181581 6284 ns/op
BenchmarkGenerate 182289 6284 ns/op
BenchmarkGenerate 181348 6267 ns/op
BenchmarkGenerate 180328 6275 ns/op
BenchmarkGenerate 179772 6259 ns/op
BenchmarkGenerate 183290 6221 ns/op
BenchmarkGenerate 179972 6282 ns/op
BenchmarkGenerate 182958 6257 ns/op
BenchmarkGenerate 182750 6302 ns/op
BenchmarkGenerate 181096 6261 ns/op
BenchmarkGenerate 180844 6294 ns/op
BenchmarkGenerate 181142 6287 ns/op
BenchmarkGenerate 180400 6275 ns/op
BenchmarkGenerate 180426 6266 ns/op
PASS
ok cegen 24.072s
$ go test ./ -bench=BenchmarkGenerate -count=20 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 391664 3062 ns/op
BenchmarkGenerate 380072 3045 ns/op
BenchmarkGenerate 388015 3081 ns/op
BenchmarkGenerate 392066 3041 ns/op
BenchmarkGenerate 381025 3093 ns/op
BenchmarkGenerate 397995 3041 ns/op
BenchmarkGenerate 385315 3089 ns/op
BenchmarkGenerate 398268 3041 ns/op
BenchmarkGenerate 389184 3092 ns/op
BenchmarkGenerate 395845 3056 ns/op
BenchmarkGenerate 382033 3090 ns/op
BenchmarkGenerate 394696 3028 ns/op
BenchmarkGenerate 391120 3089 ns/op
BenchmarkGenerate 379633 3033 ns/op
BenchmarkGenerate 337450 3103 ns/op
BenchmarkGenerate 392859 3038 ns/op
BenchmarkGenerate 383972 3086 ns/op
BenchmarkGenerate 392360 3041 ns/op
BenchmarkGenerate 382046 3098 ns/op
BenchmarkGenerate 381825 3042 ns/op
PASS
ok cegen 25.296s
My laptop isn't busy right now, I closed other applications before running the benchmarks, and the application doesn't interact with the disk or network (the code is posted in the description).
I'm curious what _you_ get when you execute those commands?
(edit: updating issue description with those results)
Thanks. My only hunch now is that the tests might be setting or caching something that helps the benchmarks. That is quite a bit of code and dependencies, which I have not had time to understand, and I am currently reluctant to just run it. If somebody can reduce this to a more minimal reproducer, that could help find out why there is an effect.
~Sure, let me isolate the function in the Benchmark.~
Hm, actually no, I shouldn't do that. If I _remove_ all Test* funcs inside main_test.go without changing anything else, the benchmark results become consistent:
$ go test ./ -bench=BenchmarkGenerate -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 390487 3062 ns/op
PASS
ok cegen 1.232s
$ go test ./ -bench=BenchmarkGenerate -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 393402 3028 ns/op
PASS
ok cegen 2.215s
Maybe we can remove tests one by one to see if any test in particular turns out to improve the benchmark results?
Maybe then we can see whether that test has side effects, like registering a callback or changing some global state without resetting it back to defaults. For example, I can see there is a sync.Pool with global state involved. It might just be that the tests cause the sync.Pool to be filled, resulting in fewer allocations in the later benchmark. You can add b.ReportAllocs() before the for loop in the benchmark to see the allocations.
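To make that failure mode concrete, here is a minimal, self-contained sketch (all names invented; this is not the gist's code) of a test winning a one-time initialization that a later benchmark in the same test binary depends on:

```go
package poison_test

import (
	"sync"
	"testing"
)

// Package-level state shared by every test and benchmark compiled into
// the same test binary. Whoever calls setChunkSize first wins, for good.
var (
	sizeOnce  sync.Once
	chunkSize int
)

func setChunkSize(n int) {
	sizeOnce.Do(func() { chunkSize = n })
}

// Tests run before benchmarks, so without -run=XXX this test wins the
// sync.Once with a uselessly small size.
func TestTiny(t *testing.T) {
	setChunkSize(2)
}

func BenchmarkAppend(b *testing.B) {
	setChunkSize(2048) // a no-op whenever TestTiny already ran
	payload := make([]byte, 1024)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		buf := make([]byte, 0, chunkSize) // 2-byte cap with tests, 2048 without
		buf = append(buf, payload...)     // costs an extra allocation when the cap is tiny
		_ = buf
	}
}
```

Running this file with and without -run=XXX should show the same kind of allocs/op gap, for the same reason.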
Oh, right, global vars are _shared between all tests_. I didn't even consider that angle. Thanks for bringing it up 🤦
TestRun indeed causes the sync.Pool to be initialized with much smaller buffers (2 bytes vs. 2048), making the pool useless in the benchmark when the test isn't skipped:
https://gist.github.com/antoineco/19fda1e474bcc545cffe48ac07f9889e#file-main_test-go-L63-L65
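Reduced to its essence, the pattern looks roughly like this (a sketch: the real Generate encodes a CloudEvent with jwriter, elided here; only the names and the sizing-inside-Once idea are taken from the gist):

```go
package cegen

import "sync"

type CloudEventTargetsGenerator struct {
	data []byte // payload of the CloudEvent to generate
}

// Buffer pool for the writer's output, shared by the entire process.
var writerBufPool *sync.Pool

// Once used to initialize the buffer pool on the first call to Generate.
var bufOnce sync.Once

func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
	bufOnce.Do(func() {
		// The payload of whichever caller gets here first dictates the
		// pool's buffer capacity forever: TestRun reaches this with a
		// 2-byte body, so New only ever produces 2-byte buffers for the
		// benchmark that follows.
		size := len(g.data)
		writerBufPool = &sync.Pool{
			New: func() interface{} { return make([]byte, 0, size) },
		}
	})
	buf := writerBufPool.Get().([]byte)
	tmp := append(buf, g.data...) // reallocates whenever cap(buf) < len(g.data)
	out := make([]byte, len(tmp))
	copy(out, tmp)
	writerBufPool.Put(buf[:0]) // only the original (possibly tiny) buffer returns to the pool
	return out, nil
}
```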
$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 164232 6758 ns/op 13945 B/op 22 allocs/op
BenchmarkGenerate 177374 6586 ns/op 13945 B/op 22 allocs/op
BenchmarkGenerate 180433 6594 ns/op 13945 B/op 22 allocs/op
BenchmarkGenerate 175017 6696 ns/op 13945 B/op 22 allocs/op
BenchmarkGenerate 174841 6692 ns/op 13945 B/op 22 allocs/op
PASS
ok cegen 7.175s
$ go test ./ -bench=BenchmarkGenerate -count=5 -cpu=1 -run=XXX
goos: linux
goarch: amd64
pkg: cegen
BenchmarkGenerate 375070 3244 ns/op 192 B/op 8 allocs/op
BenchmarkGenerate 387146 3217 ns/op 192 B/op 8 allocs/op
BenchmarkGenerate 381927 3178 ns/op 192 B/op 8 allocs/op
BenchmarkGenerate 389636 3191 ns/op 192 B/op 8 allocs/op
BenchmarkGenerate 336218 3163 ns/op 192 B/op 8 allocs/op
PASS
ok cegen 6.157s
Thanks a lot for investigating. 🙌 Mystery solved!
(edit)
In case someone reads this and the cause of the issue isn't obvious 👇
diff --git a/perf/tools/cegen/main.go b/perf/tools/cegen/main.go
index cf585da..1662866 100644
--- a/perf/tools/cegen/main.go
+++ b/perf/tools/cegen/main.go
@@ -139,6 +139,9 @@ type CloudEventTargetsGenerator struct {
typeAttr string
sourceAttr string
data []byte
+
+ // Once used to initialize the buffer pool on the first call to Generate.
+ bufOnce sync.Once
}
// NewCloudEventTargetsGenerator returns a generator that yields vegeta JSON
@@ -157,9 +160,6 @@ func NewCloudEventTargetsGenerator(url, typeAttr, sourceAttr string, data []byte
// Buffer pool for jwriter.Writer's underlying Buffer and output.
var writerBufPool *sync.Pool
-// Once used to initialize the buffer pool on the first call to Generate.
-var bufOnce sync.Once
-
// Generate returns a target serialized as JSON.
func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
var t jsonTarget
@@ -181,7 +181,7 @@ func (g *CloudEventTargetsGenerator) Generate() ([]byte, error) {
t.Body = g.data
// encode inside the Once fn to determine the size of buffers in sync pools
- bufOnce.Do(func() {
+ g.bufOnce.Do(func() {
var jw jwriter.Writer
t.encode(&jw)
dataBytes, _ := jw.BuildBytes()
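(Why this works: with bufOnce a field rather than a package-level var, every new generator re-runs the initialization, so the benchmark's generator sizes writerBufPool for its own payload instead of inheriting whatever size the first test's call to Generate happened to pick.)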
@antoineco, FWIW, putting differently-sized buffers in a sync.Pool is almost never appropriate, and our sync.Pool examples have historically not been very clear about that. (That's #23199.)
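For anyone landing here later, a common way to keep a sync.Pool well-behaved (a sketch of the general advice, not the fix applied in this issue; the package name and the maxPooledCap threshold are invented) is to pool one fixed shape and refuse to recycle outliers:

```go
package bufpool

import (
	"bytes"
	"sync"
)

// maxPooledCap bounds what is allowed back into the pool, so a single
// unusually large buffer cannot linger and skew later callers.
const maxPooledCap = 64 << 10 // 64 KiB

var pool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// Get returns an empty buffer, recycled if one is available.
func Get() *bytes.Buffer {
	return pool.Get().(*bytes.Buffer)
}

// Put resets b and recycles it, dropping oversized buffers on the floor
// for the GC to reclaim.
func Put(b *bytes.Buffer) {
	if b.Cap() > maxPooledCap {
		return
	}
	b.Reset()
	pool.Put(b)
}
```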
@bcmills that's another interesting point 👍 Although in my case the issue was purely a misuse on my part, a side effect of expecting the global var to be "fresh" in every test.