go version)?Master:
go version devel +5bd88b0 Thu May 10 18:02:07 2018 +0000 linux/amd64
Running encoding/binary benchmarks I see significant slowdown (compared to 1.10):
WriteSlice1000Int32s-4 9.51µs ± 2% 12.83µs ± 2% +34.92% (p=0.008 n=5+5)
ReadSlice1000Int32s-4 9.34µs ± 2% 12.60µs ± 2% +34.90% (p=0.008 n=5+5)
This is caused by compiler not inlining autogenerated binary.(*bigEndian).PutUint32, which does nothing but shuffles arguments and calls binary.bigEndian.PutUint32.
Bisect points to ca2f85fd3f32b2a4c863a2de602876bd31e9d956, which is IMHO strange, because it doesn't enable new export format by default.
@mdempsky any ideas?
In genwrapper (cmd/compile/internal/gc/subr.go) I see:
// TODO(mdempsky): Investigate why this doesn't work with
// indexed export. For now, we disable even in non-indexed
// mode to ensure fair benchmark comparisons and to track down
// unintended compilation differences.
if false {
inlcalls(fn)
}
Should we at least change false to flagiexport, since benchmarks are done?
@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!
Additional regressions caused by that CL:
StableInt1K-4 148µs ± 0% 237µs ± 0% +59.40% (p=0.008 n=5+5)
StableInt64K-4 13.2ms ± 1% 21.0ms ± 1% +59.11% (p=0.008 n=5+5)
SortInt1K-4 134µs ± 0% 201µs ± 1% +50.75% (p=0.008 n=5+5)
SortInt64K-4 13.0ms ± 1% 19.3ms ± 1% +48.86% (p=0.008 n=5+5)
Stable1e6-4 9.76s ± 0% 13.39s ± 0% +37.26% (p=0.008 n=5+5)
Sort1e6-4 2.46s ± 0% 3.37s ± 0% +37.08% (p=0.008 n=5+5)
Sort1e4-4 16.2ms ± 0% 22.2ms ± 0% +36.50% (p=0.016 n=5+4)
Stable1e2-4 155µs ± 0% 211µs ± 0% +35.67% (p=0.008 n=5+5)
bufio
WriterFlush-4 26.8ns ± 0% 31.6ns ± 0% +18.09% (p=0.016 n=5+4)
There are more regressions, but these are the most significant ones (highest delta).
I see you are systematically checking benchmarks and running down significant regressions.
Probably we should automate this through a nightly job so that offfending CLs are caught quicker.
Change https://golang.org/cl/135697 mentions this issue: cmd/compile/internal/gc: inline autogenerated (*T).M wrappers
I believe this is fixed now. The CL above, and possibly the slightly more aggressive mid-stack inlining makes the wrapper inline successfully.
Tip is still slower than 1.10, but only by a few percent.
Most helpful comment
@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!