Go: cmd/compile: don't allocate when putting constant strings in an interface

Created on 18 Jan 2017 · 6 Comments · Source: golang/go

fmt.Println("abc") allocates when it converts "abc" into an interface. It should not.

When the compiler sees a constant string being converted into an interface (usually at a call site?), it could create a static interface value to use, rather than allocating it at runtime.

That is, do the following conversion automatically:

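package main

import (
    "fmt"
    "io/ioutil"
    "testing"
)

// abc is converted to an interface value once, so using it below
// does not repeat the conversion on every call.
var abc interface{} = "ABC"

func BenchmarkPrintln(b *testing.B) {
    // regular: the constant string is converted to an interface at the call site.
    b.Run("regular", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            fmt.Fprintln(ioutil.Discard, "ABC")
        }
    })
    // prealloc: the interface value already exists, so there is nothing to convert.
    b.Run("prealloc", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            fmt.Fprintln(ioutil.Discard, abc)
        }
    })
}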

This might not be as bad for binary size as it initially appears; it might even be an improvement. The cost of the runtime conversion call is probably about as large as the two additional words of static data plus symbol overhead. (And that symbol overhead could be removed if necessary, as we did for strings.)
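The allocation difference between the two forms benchmarked above can also be checked directly, without the benchmark harness. The following is only a rough sketch, assuming testing.AllocsPerRun from the standard library; the names prealloc, allocsRegular and allocsPrealloc are illustrative and do not come from the compiler or from this issue. On a toolchain that already performs the optimization proposed here, both numbers would be expected to be the same.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"testing"
)

// prealloc is converted to an interface value once, outside the hot path.
var prealloc interface{} = "ABC"

// allocsRegular reports the average number of allocations per call when the
// constant string is converted to an interface at the call site.
// The result depends on the toolchain version.
func allocsRegular() float64 {
	return testing.AllocsPerRun(1000, func() {
		fmt.Fprintln(ioutil.Discard, "ABC")
	})
}

// allocsPrealloc reports the average number of allocations per call when an
// existing interface value is passed instead.
func allocsPrealloc() float64 {
	return testing.AllocsPerRun(1000, func() {
		fmt.Fprintln(ioutil.Discard, prealloc)
	})
}

func main() {
	fmt.Printf("regular:  %v allocs/op\n", allocsRegular())
	fmt.Printf("prealloc: %v allocs/op\n", allocsPrealloc())
}
```

AllocsPerRun does one warm-up call before measuring, so the one-time cost of fmt's internal buffer pool should not show up in either number.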

See also #17725 and the golang-dev thread about logging that inspired this.

cc @randall77 @mdempsky @bradfitz

FrozenDueToAge Performance

josharian

Most helpful comment

Threw together a prototype for strings only. Not ready to be mailed; lots of sinit.go needs to be updated for better handling of interface values for a complete CL, and it's a fair amount of careful work. But it was enough to get some numbers to convince myself that this will help. For the go-kit logging benchmarks:

name                      old time/op    new time/op    delta
JSONLoggerSimple-8          1.94µs ± 2%    1.91µs ± 2%   -1.64%    (p=0.026 n=9+9)
JSONLoggerContextual-8      2.60µs ± 1%    2.49µs ± 3%   -4.07%  (p=0.000 n=10+10)
Discard-8                   96.8ns ± 1%    32.9ns ± 1%  -65.99%    (p=0.000 n=9+9)
OneWith-8                    157ns ± 2%      89ns ± 2%  -43.26%    (p=0.000 n=9+9)
TwoWith-8                    172ns ± 1%     102ns ± 2%  -40.39%    (p=0.000 n=9+9)
TenWith-8                    288ns ± 1%     229ns ± 8%  -20.36%    (p=0.000 n=9+9)
LogfmtLoggerSimple-8         699ns ± 1%     630ns ± 2%   -9.86%    (p=0.000 n=9+9)
LogfmtLoggerContextual-8     958ns ± 1%     818ns ± 2%  -14.57%   (p=0.000 n=10+8)
NopLoggerSimple-8            188ns ± 1%     119ns ± 3%  -36.87%  (p=0.000 n=10+10)
NopLoggerContextual-8        379ns ± 1%     238ns ± 1%  -37.23%    (p=0.000 n=9+9)
ValueBindingTimestamp-8      575ns ± 1%     505ns ± 9%  -12.13%  (p=0.000 n=10+10)
ValueBindingCaller-8         897ns ± 2%     754ns ± 3%  -15.86%   (p=0.000 n=10+9)

name                      old alloc/op   new alloc/op   delta
JSONLoggerSimple-8            904B ± 0%      872B ± 0%   -3.54%  (p=0.000 n=10+10)
JSONLoggerContextual-8      1.20kB ± 0%    1.14kB ± 0%   -5.33%  (p=0.000 n=10+10)
Discard-8                    64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
OneWith-8                    96.0B ± 0%     64.0B ± 0%  -33.33%  (p=0.000 n=10+10)
TwoWith-8                     160B ± 0%      128B ± 0%  -20.00%  (p=0.000 n=10+10)
TenWith-8                     672B ± 0%      640B ± 0%   -4.76%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
NopLoggerSimple-8             128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       159B ± 0%      127B ± 0%  -20.13%  (p=0.000 n=10+10)
ValueBindingCaller-8          112B ± 0%       80B ± 0%  -28.57%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
JSONLoggerSimple-8            19.0 ± 0%      17.0 ± 0%  -10.53%  (p=0.000 n=10+10)
JSONLoggerContextual-8        25.0 ± 0%      21.0 ± 0%  -16.00%  (p=0.000 n=10+10)
Discard-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
OneWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TwoWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TenWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
NopLoggerSimple-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       5.00 ± 0%      3.00 ± 0%  -40.00%  (p=0.000 n=10+10)
ValueBindingCaller-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)

fmt benchmarks also show some improvements. CPU benchmarks are omitted because they are too noisy; the fmt CPU benchmarks are super sensitive to alignment, etc.

name                             old alloc/op   new alloc/op   delta
SprintfPadding-8                    24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfEmpty-8                     0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
SprintfString-8                     21.0B ± 0%      5.0B ± 0%  -76.19%        (p=0.000 n=10+10)
SprintfTruncateString-8             32.0B ± 0%     16.0B ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                48.0B ± 0%     32.0B ± 0%  -33.33%        (p=0.000 n=10+10)
SprintfInt-8                        16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfIntInt-8                     24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                72.0B ± 0%     72.0B ± 0%     ~     (all samples are equal)
SprintfFloat-8                      16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfComplex-8                    48.0B ± 0%     48.0B ± 0%     ~     (all samples are equal)
SprintfBoolean-8                    8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
SprintfHexString-8                  96.0B ± 0%     80.0B ± 0%  -16.67%        (p=0.000 n=10+10)
SprintfHexBytes-8                    112B ± 0%      112B ± 0%     ~     (all samples are equal)
SprintfBytes-8                      96.0B ± 0%     96.0B ± 0%     ~     (all samples are equal)
SprintfStringer-8                   32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
SprintfStructure-8                   256B ± 0%      256B ± 0%     ~     (all samples are equal)
ManyArgs-8                          80.0B ± 0%     48.0B ± 0%  -40.00%        (p=0.000 n=10+10)
FprintInt-8                         8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
FprintfBytes-8                      32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                 0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
ScanInts-8                         15.2kB ± 0%    15.2kB ± 0%     ~           (p=0.179 n=10+10)
ScanRecursiveInt-8                 21.6kB ± 0%    21.6kB ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8    21.7kB ± 0%    21.7kB ± 0%     ~     (all samples are equal)

name                             old allocs/op  new allocs/op  delta
SprintfPadding-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfEmpty-8                      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
SprintfString-8                      2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfTruncateString-8              2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                 2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfInt-8                         2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfIntInt-8                      3.00 ± 0%      3.00 ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                 2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfFloat-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfComplex-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBoolean-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfHexString-8                   2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfHexBytes-8                    2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBytes-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfStringer-8                    4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
SprintfStructure-8                   7.00 ± 0%      7.00 ± 0%     ~     (all samples are equal)
ManyArgs-8                           8.00 ± 0%      6.00 ± 0%  -25.00%        (p=0.000 n=10+10)
FprintInt-8                          1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintfBytes-8                       1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                  0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
ScanInts-8                          1.60k ± 0%     1.60k ± 0%     ~     (all samples are equal)
ScanRecursiveInt-8                  1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8     1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)

josharian on 21 Jan 2017

All 6 comments

Oh, and for those that are curious:

BenchmarkPrintln/regular-8          20000000            86.6 ns/op        16 B/op          1 allocs/op
BenchmarkPrintln/prealloc-8         30000000            51.1 ns/op         0 B/op          0 allocs/op

josharian on 18 Jan 2017

The regular function is 235 bytes of code. Prealloc is 188 bytes of code.
So prealloc is probably a net binary size improvement.

josharian on 18 Jan 2017

Might even be worth doing this for all static data, including ints, floats, etc.
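To make the idea concrete, here is what the manual equivalent might look like for a few kinds of static data. This is a hedged sketch only: staticInt, staticFloat, staticStr, sink, allocsStatic and allocsDynamic are hypothetical names chosen for illustration, and the count reported for the call-site conversions depends on the toolchain version and on how much the compiler can prove about the values.

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical package-level interface values. Each is converted to an
// interface once, not on every use.
var (
	staticInt   interface{} = 1 << 20
	staticFloat interface{} = 2.5
	staticStr   interface{} = "ABC"
)

// sink is a global, so values stored in it escape and the stores
// cannot simply be dropped by the compiler.
var sink interface{}

// allocsStatic reports allocations per run when only the pre-built
// interface values are copied.
func allocsStatic() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = staticInt
		sink = staticFloat
		sink = staticStr
	})
}

// allocsDynamic reports allocations per run when the values are converted
// to interfaces at the point of use. No particular count is assumed here.
func allocsDynamic(i int, f float64, s string) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = i
		sink = f
		sink = s
	})
}

func main() {
	fmt.Println("static: ", allocsStatic())
	fmt.Println("dynamic:", allocsDynamic(1<<20, 2.5, "ABC"))
}
```

Copying an existing interface value is just a two-word store, which is why the static variant is the baseline that a compiler-side version of this change would aim to match.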

josharian on 18 Jan 2017

Found a simpler and broader (and thus higher risk) implementation. CL 35554

josharian on 22 Jan 2017

CL https://golang.org/cl/35554 mentions this issue.