fmt.Println("abc") allocates when it converts "abc" into an interface. It should not.
When the compiler sees a constant string being converted into an interface (usually at a call site?), it could create a static interface value to use, rather than allocating it at runtime.
That is, do the following conversion automatically:
var abc interface{} = "ABC"
func BenchmarkPrintln(b *testing.B) {
b.Run("regular", func(b *testing.B) {
for i := 0; i < b.N; i++ {
fmt.Fprintln(ioutil.Discard, "ABC")
}
})
b.Run("prealloc", func(b *testing.B) {
for i := 0; i < b.N; i++ {
fmt.Fprintln(ioutil.Discard, abc)
}
})
}
This might not be as bad for binary size as it initially appears; it might even be an improvement. The cost of the runtime conversion call is probably about as large as the two additional words of static data plus symbol overhead. (And that symbol overhead could be removed if necessary, as we did for strings.)
See also #17725 and the golang-dev thread about logging that insired this.
cc @randall77 @mdempsky @bradfitz
Oh, and for those that are curious:
BenchmarkPrintln/regular-8 20000000 86.6 ns/op 16 B/op 1 allocs/op
BenchmarkPrintln/prealloc-8 30000000 51.1 ns/op 0 B/op 0 allocs/op
The regular function is 235 bytes of code. Prealloc is 188 bytes of code.
So prealloc is probably a net binary size improvement.
Might even be worth doing this for all static data, including ints, floats, etc.
Threw together a prototype for strings only. Not ready to be mailed; lots of sinit.go needs to be updated for better handling of interface values for a complete CL, and it's a fair amount of careful work. But it was enough to get some numbers to convince myself that this will help. For the go-kit logging benchmarks:
name old time/op new time/op delta
JSONLoggerSimple-8 1.94µs ± 2% 1.91µs ± 2% -1.64% (p=0.026 n=9+9)
JSONLoggerContextual-8 2.60µs ± 1% 2.49µs ± 3% -4.07% (p=0.000 n=10+10)
Discard-8 96.8ns ± 1% 32.9ns ± 1% -65.99% (p=0.000 n=9+9)
OneWith-8 157ns ± 2% 89ns ± 2% -43.26% (p=0.000 n=9+9)
TwoWith-8 172ns ± 1% 102ns ± 2% -40.39% (p=0.000 n=9+9)
TenWith-8 288ns ± 1% 229ns ± 8% -20.36% (p=0.000 n=9+9)
LogfmtLoggerSimple-8 699ns ± 1% 630ns ± 2% -9.86% (p=0.000 n=9+9)
LogfmtLoggerContextual-8 958ns ± 1% 818ns ± 2% -14.57% (p=0.000 n=10+8)
NopLoggerSimple-8 188ns ± 1% 119ns ± 3% -36.87% (p=0.000 n=10+10)
NopLoggerContextual-8 379ns ± 1% 238ns ± 1% -37.23% (p=0.000 n=9+9)
ValueBindingTimestamp-8 575ns ± 1% 505ns ± 9% -12.13% (p=0.000 n=10+10)
ValueBindingCaller-8 897ns ± 2% 754ns ± 3% -15.86% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
JSONLoggerSimple-8 904B ± 0% 872B ± 0% -3.54% (p=0.000 n=10+10)
JSONLoggerContextual-8 1.20kB ± 0% 1.14kB ± 0% -5.33% (p=0.000 n=10+10)
Discard-8 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=10+10)
OneWith-8 96.0B ± 0% 64.0B ± 0% -33.33% (p=0.000 n=10+10)
TwoWith-8 160B ± 0% 128B ± 0% -20.00% (p=0.000 n=10+10)
TenWith-8 672B ± 0% 640B ± 0% -4.76% (p=0.000 n=10+10)
LogfmtLoggerSimple-8 128B ± 0% 96B ± 0% -25.00% (p=0.000 n=10+10)
LogfmtLoggerContextual-8 304B ± 0% 240B ± 0% -21.05% (p=0.000 n=10+10)
NopLoggerSimple-8 128B ± 0% 96B ± 0% -25.00% (p=0.000 n=10+10)
NopLoggerContextual-8 304B ± 0% 240B ± 0% -21.05% (p=0.000 n=10+10)
ValueBindingTimestamp-8 159B ± 0% 127B ± 0% -20.13% (p=0.000 n=10+10)
ValueBindingCaller-8 112B ± 0% 80B ± 0% -28.57% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
JSONLoggerSimple-8 19.0 ± 0% 17.0 ± 0% -10.53% (p=0.000 n=10+10)
JSONLoggerContextual-8 25.0 ± 0% 21.0 ± 0% -16.00% (p=0.000 n=10+10)
Discard-8 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
OneWith-8 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
TwoWith-8 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
TenWith-8 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
LogfmtLoggerSimple-8 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.000 n=10+10)
LogfmtLoggerContextual-8 7.00 ± 0% 3.00 ± 0% -57.14% (p=0.000 n=10+10)
NopLoggerSimple-8 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.000 n=10+10)
NopLoggerContextual-8 7.00 ± 0% 3.00 ± 0% -57.14% (p=0.000 n=10+10)
ValueBindingTimestamp-8 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10)
ValueBindingCaller-8 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.000 n=10+10)
fmt benchmarks also show some improvements. CPU benchmarks omitted because they are too noisy--the fmt CPU benchmarks are super sensitive to alignment, etc.
name old alloc/op new alloc/op delta
SprintfPadding-8 24.0B ± 0% 24.0B ± 0% ~ (all samples are equal)
SprintfEmpty-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
SprintfString-8 21.0B ± 0% 5.0B ± 0% -76.19% (p=0.000 n=10+10)
SprintfTruncateString-8 32.0B ± 0% 16.0B ± 0% -50.00% (p=0.000 n=10+10)
SprintfQuoteString-8 48.0B ± 0% 32.0B ± 0% -33.33% (p=0.000 n=10+10)
SprintfInt-8 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal)
SprintfIntInt-8 24.0B ± 0% 24.0B ± 0% ~ (all samples are equal)
SprintfPrefixedInt-8 72.0B ± 0% 72.0B ± 0% ~ (all samples are equal)
SprintfFloat-8 16.0B ± 0% 16.0B ± 0% ~ (all samples are equal)
SprintfComplex-8 48.0B ± 0% 48.0B ± 0% ~ (all samples are equal)
SprintfBoolean-8 8.00B ± 0% 8.00B ± 0% ~ (all samples are equal)
SprintfHexString-8 96.0B ± 0% 80.0B ± 0% -16.67% (p=0.000 n=10+10)
SprintfHexBytes-8 112B ± 0% 112B ± 0% ~ (all samples are equal)
SprintfBytes-8 96.0B ± 0% 96.0B ± 0% ~ (all samples are equal)
SprintfStringer-8 32.0B ± 0% 32.0B ± 0% ~ (all samples are equal)
SprintfStructure-8 256B ± 0% 256B ± 0% ~ (all samples are equal)
ManyArgs-8 80.0B ± 0% 48.0B ± 0% -40.00% (p=0.000 n=10+10)
FprintInt-8 8.00B ± 0% 8.00B ± 0% ~ (all samples are equal)
FprintfBytes-8 32.0B ± 0% 32.0B ± 0% ~ (all samples are equal)
FprintIntNoAlloc-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
ScanInts-8 15.2kB ± 0% 15.2kB ± 0% ~ (p=0.179 n=10+10)
ScanRecursiveInt-8 21.6kB ± 0% 21.6kB ± 0% ~ (all samples are equal)
ScanRecursiveIntReaderWrapper-8 21.7kB ± 0% 21.7kB ± 0% ~ (all samples are equal)
name old allocs/op new allocs/op delta
SprintfPadding-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfEmpty-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
SprintfString-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
SprintfTruncateString-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
SprintfQuoteString-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
SprintfInt-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfIntInt-8 3.00 ± 0% 3.00 ± 0% ~ (all samples are equal)
SprintfPrefixedInt-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfFloat-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfComplex-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfBoolean-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfHexString-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
SprintfHexBytes-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfBytes-8 2.00 ± 0% 2.00 ± 0% ~ (all samples are equal)
SprintfStringer-8 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
SprintfStructure-8 7.00 ± 0% 7.00 ± 0% ~ (all samples are equal)
ManyArgs-8 8.00 ± 0% 6.00 ± 0% -25.00% (p=0.000 n=10+10)
FprintInt-8 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
FprintfBytes-8 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
FprintIntNoAlloc-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
ScanInts-8 1.60k ± 0% 1.60k ± 0% ~ (all samples are equal)
ScanRecursiveInt-8 1.71k ± 0% 1.71k ± 0% ~ (all samples are equal)
ScanRecursiveIntReaderWrapper-8 1.71k ± 0% 1.71k ± 0% ~ (all samples are equal)
Found a simpler and broader (and thus higher risk) implementation. CL 35554
CL https://golang.org/cl/35554 mentions this issue.
Most helpful comment
Threw together a prototype for strings only. Not ready to be mailed; lots of sinit.go needs to be updated for better handling of interface values for a complete CL, and it's a fair amount of careful work. But it was enough to get some numbers to convince myself that this will help. For the go-kit logging benchmarks:
fmt benchmarks also show some improvements. CPU benchmarks omitted because they are too noisy--the fmt CPU benchmarks are super sensitive to alignment, etc.