Two little highly-optimised functions suffer from serious performance when compared to Go version 1.9.
name old time/op new time/op delta
PutUint64-12 4.09ns 卤 0% 5.79ns 卤 0% +41.62% (p=0.000 n=20+19)
Uint64-12 4.24ns 卤 0% 6.54ns 卤 0% +54.35% (p=0.000 n=20+19)
The assembler output from go build -gcflags=-S is exactly the same so I don't know where to look for. If anybody can give me some pointers I'd be happy to build a clean test case.
Here follows a snippet from https://github.com/pascaldekloe/flit for context.
// PutUint64 encodes an integer into buf and returns the serial size.
// If the buffer is smaller than 9 bytes, PutUint64 may panic.
func PutUint64(buf []byte, v uint64) (n int) {
if v < 128 {
buf[0] = uint8(v)<<1 | 1
return 1
}
if v >= uint64(1) << 56 {
buf[0] = 0
binary.LittleEndian.PutUint64(buf[1:], v)
return 9
}
lz := bits.LeadingZeros64(v)
// extra bytes = (bits - 1) / 7 = (63 - lz) / 7
e := ((63 - lz) * 2454267027) >> 34
v <<= 1
v |= 1
v <<= uint(e) & 63
binary.LittleEndian.PutUint64(buf, v)
return e + 1
}
The assembler output from
go build -gcflags=-Sis exactly the same so I don't know where to look for.
Last time I saw a performance issue like that, it turned out that the performance of the CPU was sensitive to the alignment of the function in memory.
That said, this function seems small enough that it is likely to be inlined. Do you have a larger benchmark that shows the same regressions, or do they only occur in microbenchmarks? (That is, what is the impact of the regression on the overall program, rather than just the benchmark?)
git diff go1.9..go1.10 -- src/encoding/binary shows no changes to the implementation, so this is likely a compiler issue if there is an issue at all.
Perhaps the function used to be inlined, but isn't being inlined anymore?
The assembler output from go build -gcflags=-S is exactly the same
This is not completely true. The 1.10 function is slightly larger.
1.9:
"".PutUint64 STEXT nosplit size=269 args=0x28 locals=0x8
1.10 (note size=):
"".PutUint64 STEXT nosplit size=276 args=0x28 locals=0x8
the reason the 1.10 is larger is that it has one more panicindex
0x010b 00267 (prova.go:14) UNDEF
0x010d 00269 (prova.go:10) PCDATA $0, $1
0x010d 00269 (prova.go:10) CALL runtime.panicindex(SB)
at the bottom. 1.9 has three, 1.10 has four.
Spot on @mvdan! Go 1.10 does no longer inline these functions into the Benchmark functions.
I was just about to apply this function and for as far as I know there are no users yet @bcmills. I care not so much for this specific case yet this could have been something not visible for most cases.
The panic case is not benchmarked @ALTree.
The help is much appreciated. Thanks a bunch.
I'm confused - are you closing this because Go 1.10 no longer inlines calls made in benchmark functions by design?
The panic case is not benchmarked
I know that. But the function size sometime matters in microbenchmarks.
Also why did you close this? Is the fact that the function is no longer inlined expected?
Weather the compiler inlines a function or not can hardly be a bug right? And this might be triggered by the slightly larger size indeed. I was worried that some essential was seriously off.
I would expect that, given that PutUint64 hasn't changed between 1.9 and 1.10, the compiled code should be the same or better - not worse. Unless I'm missing something obvious?
One way or another, no harm in leaving this open in case there is actually a compiler regression somewhere.
Weather the compiler inlines a function or not can hardly be a bug right?
It absolutely can. We are definitely interested in understanding why your function is no longer inlined in 1.10.
1.10 "-m -m" says
cannot inline PutUint64: function too complex: cost 216 exceeds budget 80
while 1.9 inlines it without complaining.
/cc @mdempsky @josharian @TocarIP
I believe this is caused by the fix for #19261.
1.9 inlines the function because it incorrectly computed a cost of 0 for cross-package calls like binary.LittleEndian.PutUint64. This was fixed for 1.10 in CL 70151, and since the cost of the binary methods calls is now computed correctly, OP's function is much more heavier and it's no longer marked as inlineable.
So I believe this is WAI.
I think https://go-review.googlesource.com/c/go/+/95475 is relevant here
Edit:
It fixes PutUint32/16 degradation and not 64, so it isn't relevant here.
Closing this, since it is expected that the function is no longer inlined (i linked the fix that caused this above).