What steps reproduce the problem? 1. Download playground.zip 2. Run go build -gcflags=-m bytes.go 3. Run go build -gcflags=-m bytes2.go playground/bytes: Resembles a simplified version of Go's bytes.Buffer, with the exception that NewBuffer returns Buffer{} instead of &Buffer{}. playground/bytes2: Different implementation that avoids reslicing What happened? $ go build -gcflags=-m bytes.go # playground/bytes bytes/buffer.go:12: can inline NewBuffer bytes/buffer.go:57: can inline (*Buffer).Read bytes/buffer.go:69: can inline (*Buffer).Bytes bytes/buffer.go:12: leaking param: b to result ~anon1 bytes/buffer.go:16: leaking param: b bytes/buffer.go:16: leaking param: b bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap bytes/buffer.go:16: leaking param: b bytes/buffer.go:44: leaking param: b bytes/buffer.go:44: leaking param: b bytes/buffer.go:52: leaking param: b bytes/buffer.go:52: (*Buffer).Write p does not escape bytes/buffer.go:57: (*Buffer).Read b does not escape bytes/buffer.go:57: (*Buffer).Read p does not escape bytes/buffer.go:69: (*Buffer).Bytes b does not escape # command-line-arguments ./bytes.go:9: inlining call to bytes.NewBuffer ./bytes.go:13: inlining call to Read ./bytes.go:18: inlining call to Bytes ./bytes.go:9: moved to heap: myBuf ./bytes.go:11: myBuf escapes to heap ./bytes.go:8: make([]byte, 4) escapes to heap ./bytes.go:14: myBuf escapes to heap ./bytes.go:8: make([]byte, 4) escapes to heap ./bytes.go:11: main []byte literal does not escape ./bytes.go:12: main make([]byte, 2) does not escape ./bytes.go:13: main myBuf does not escape ./bytes.go:14: main []byte literal does not escape ./bytes.go:18: main myBuf does not escape $ go build -gcflags=-m bytes2.go # playground/bytes2 bytes2/buffer.go:13: can inline NewBuffer bytes2/buffer.go:61: can inline (*Buffer).Read bytes2/buffer.go:73: can inline (*Buffer).Bytes bytes2/buffer.go:13: leaking param: b to result ~anon1 bytes2/buffer.go:38: make([]byte, size) escapes to heap bytes2/buffer.go:17: (*Buffer).grow b does not escape bytes2/buffer.go:47: (*Buffer).Grow b does not escape bytes2/buffer.go:54: (*Buffer).Write b does not escape bytes2/buffer.go:54: (*Buffer).Write p does not escape bytes2/buffer.go:61: (*Buffer).Read b does not escape bytes2/buffer.go:61: (*Buffer).Read p does not escape bytes2/buffer.go:73: (*Buffer).Bytes b does not escape # command-line-arguments ./bytes2.go:9: inlining call to bytes2.NewBuffer ./bytes2.go:13: inlining call to Read ./bytes2.go:18: inlining call to Bytes ./bytes2.go:8: main make([]byte, 4) does not escape ./bytes2.go:11: main myBuf does not escape ./bytes2.go:11: main []byte literal does not escape ./bytes2.go:12: main make([]byte, 2) does not escape ./bytes2.go:13: main myBuf does not escape ./bytes2.go:14: main myBuf does not escape ./bytes2.go:14: main []byte literal does not escape ./bytes2.go:18: main myBuf does not escape # command-line-arguments What should have happened instead? This shouldn't be happening with playground/bytes: ./bytes.go:9: moved to heap: myBuf ./bytes.go:11: myBuf escapes to heap ./bytes.go:8: make([]byte, 4) escapes to heap ./bytes.go:14: myBuf escapes to heap ./bytes.go:8: make([]byte, 4) escapes to heap A re-slicing shouldn't cause playground/bytes's Buffer to always be heap allocated. It should be like playground/bytes2, which avoids heap allocation until a resize happens: bytes2/buffer.go:38: make([]byte, size) escapes to heap Please provide any additional information below. If this is working as intended, the implementation of bytes.Buffer should change to allow it to be completely stack allocated. The playground/bytes2 implementation is completely stack allocated until the initial buffer needs to be resized. This allows it to operate on the stack, avoiding a heap allocation if possible.
Attachments:
Another thing to note: If I change "NewBuffer" in playground/bytes2 to return &Buffer{} then buf (in main()) escapes to the heap even though NewBuffer is inlined: ./bytes2.go:9: inlining call to bytes2.NewBuffer ./bytes2.go:13: inlining call to Read ./bytes2.go:18: inlining call to Bytes ./bytes2.go:8: make([]byte, 4) escapes to heap ./bytes2.go:9: <S> &bytes2.Buffer literal does not escape ./bytes2.go:11: main []byte literal does not escape ./bytes2.go:12: main make([]byte, 2) does not escape ./bytes2.go:14: main []byte literal does not escape Otherwise if I return Buffer{} directly nothing gets allocated to the heap, unless a resize happens.
With Go 1.8, the provided sample code (the playground/bytes package) no longer escapes:
GOPATH=/tmp/gopath go build -gcflags -m bytes.go
# playground/bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: can inline NewBuffer
/tmp/gopath/src/playground/bytes/buffer.go:57: can inline (*Buffer).Read
/tmp/gopath/src/playground/bytes/buffer.go:69: can inline (*Buffer).Bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: leaking param: b to result ~r1 level=0
/tmp/gopath/src/playground/bytes/buffer.go:21: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:40: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:16: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap
/tmp/gopath/src/playground/bytes/buffer.go:49: (*Buffer).Grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:44: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: p
/tmp/gopath/src/playground/bytes/buffer.go:57: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:57: (*Buffer).Read p does not escape
/tmp/gopath/src/playground/bytes/buffer.go:69: leaking param: b to result ~r0 level=1
# command-line-arguments
./bytes.go:9: inlining call to bytes.NewBuffer
./bytes.go:13: inlining call to (*bytes.Buffer).Read
./bytes.go:18: inlining call to (*bytes.Buffer).Bytes
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:11: main myBuf does not escape
./bytes.go:11: main ([]byte)("test") does not escape
./bytes.go:12: main make([]byte, 2) does not escape
./bytes.go:13: main myBuf does not escape
./bytes.go:14: main myBuf does not escape
./bytes.go:14: main ([]byte)("00") does not escape
./bytes.go:16: main string(buf) does not escape
./bytes.go:18: main myBuf does not escape
./bytes.go:18: main string(([]byte)(~r0)) does not escape
However, a bytes.Buffer still seems to always escape if you call its WriteXxx methods, so the root issue persists.
It's the use of the bootstrap array that's forcing bytes.Buffer to always escape now, from what I can tell. Here's a very simple repro:
package main
type B struct {
buf []byte
storage [64]byte
}
func (b *B) X() {
b.buf = b.storage[:10]
}
func main() {
var b B
b.X()
}
./test.go:8: can inline (*B).X
./test.go:12: can inline main
./test.go:14: inlining call to (*B).X
./test.go:9: b.storage escapes to heap
./test.go:8: leaking param: b
./test.go:14: b.storage escapes to heap
./test.go:14: b escapes to heap
./test.go:13: moved to heap: b
Is it the case that any self-referencing pointers foil escape analysis? For instance, it also escapes if you do this:
type B struct {
p *int
n int
}
func (b *B) X() {
b.p = &b.n
}
cc @randall77 for escape analysis thoughts
@lukescott I took the liberty of retitling the issue given that the re-slicing thing was fixed but the underlying problem of bytes.Buffer always being heap-allocated was not (you also discussed this in the now-closed #7661).
I don't know why this would cause an escape. Probably a tricky corner case? Cycles in the escapes-to graph?
@drchase might know.
I don't "know", but I suspect. Pretty sure that forms a cycle in the graph, which can look like an escape for various reasons.
Change https://golang.org/cl/86976 mentions this issue: strings: prevent copyCheck from forcing Builder to escape and allocate
Change https://golang.org/cl/133375 mentions this issue: cmd/compile/internal/gc: handle array slice self-assign in esc.go
It's hard to solve this issue without any API changes just with improvements to the escape analysis.
Although https://golang.org/cl/133375 does make it possible to implement "proper" bytes.Buffer
outside of golang stdlib.
See https://github.com/intel-go/bytebuf.
The only change to buffer.go
:
type Buffer struct {
buf []byte // contents are the bytes buf[off : len(buf)]
off int // read at &buf[off], write at &buf[len(buf)]
- bootstrap [64]byte // memory to hold first slice; helps small buffers avoid allocation.
+ bootstrap *[64]byte // memory to hold first slice; helps small buffers avoid allocation.
lastRead readOp // last read operation, so that Unread* can work correctly.
}
And we get these numbers:
name old time/op new time/op delta
String/empty-8 138ns ±13% 24ns ± 0% -82.94% (p=0.000 n=10+8)
String/5-8 186ns ±11% 60ns ± 1% -67.82% (p=0.000 n=10+10)
String/64-8 225ns ±10% 108ns ± 6% -52.26% (p=0.000 n=10+10)
String/128-8 474ns ±17% 338ns ±13% -28.57% (p=0.000 n=10+10)
String/1024-8 889ns ± 0% 740ns ± 1% -16.78% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
String/empty-8 112B ± 0% 0B -100.00% (p=0.000 n=10+10)
String/5-8 117B ± 0% 5B ± 0% -95.73% (p=0.000 n=10+10)
String/64-8 176B ± 0% 64B ± 0% -63.64% (p=0.000 n=10+10)
String/128-8 368B ± 0% 256B ± 0% -30.43% (p=0.000 n=10+10)
String/1024-8 2.16kB ± 0% 2.05kB ± 0% -5.19% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
String/empty-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
String/5-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
String/64-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
String/128-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
String/1024-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
@Quasilyte there's something I don't understand. The theory behind the bootstrap array was to avoid an allocation, by having the initial slice point to it. Unfortunately, our escape analysis is unable to prove that the buffer does not escape (would you be able to explain why, what is the limit?), so we always got 1 allocation (the bytes.Buffer
instance).
Your change makes the buffer to be a pointer to an array, so that (bytes.Buffer
) does not escape, but it forces an allocation for the bootstrap buffer.
So shouldn't it be the same?
@rasky trick is that bootstrap
array allocated with new
may be proven as non-escaping with the current escape analysis. So, it does achieve the initial goal of avoiding heap allocations for local buffers that may benefit from small buffer optimization.
Without applying CL133375 and with this patch to buffer.go
(keeping zero Buffer value usable), I get these numbers in the bytebuf
benchmarks:
name old time/op new time/op delta
String/bytes.Buffer/empty-4 64.1ns ± 0% 10.9ns ± 1% -83.07% (p=0.000 n=8+9)
String/bytes.Buffer/5-4 89.3ns ± 2% 66.5ns ± 1% -25.52% (p=0.000 n=10+10)
String/bytes.Buffer/64-4 103ns ± 0% 83ns ± 1% -19.78% (p=0.000 n=10+10)
String/bytes.Buffer/128-4 235ns ± 1% 175ns ± 2% -25.57% (p=0.000 n=9+10)
String/bytes.Buffer/1024-4 591ns ± 1% 547ns ± 2% -7.56% (p=0.000 n=8+9)
name old alloc/op new alloc/op delta
String/bytes.Buffer/empty-4 112B ± 0% 0B -100.00% (p=0.000 n=10+10)
String/bytes.Buffer/5-4 117B ± 0% 69B ± 0% -41.03% (p=0.000 n=10+10)
String/bytes.Buffer/64-4 176B ± 0% 128B ± 0% -27.27% (p=0.000 n=10+10)
String/bytes.Buffer/128-4 368B ± 0% 256B ± 0% -30.43% (p=0.000 n=10+10)
String/bytes.Buffer/1024-4 2.16kB ± 0% 2.05kB ± 0% -5.19% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
String/bytes.Buffer/empty-4 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
String/bytes.Buffer/5-4 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/bytes.Buffer/64-4 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/bytes.Buffer/128-4 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
String/bytes.Buffer/1024-4 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
The bytes package benchmarks aren't so rosy, though.
Since this seems to be a big improvement for certain scenarios, but requires an API change, maybe this change would be a candidate for Go 2.
@reezer, please, have some more patience.
What @opennota described is almost an optimal thing for go1.
I'll be back soon with explanations and, probably, a CL.
Change https://golang.org/cl/133715 mentions this issue: bytes: remove bootstrap array from Buffer
@opennota what you're doing is essentially a make([]byte)
, but in a slightly more convoluted way.
We can remove bootstrap
field completely and replace it with make
.
The main benefit from not having a bootstrap
(or having it as a pointer) is that b
receiver
stops leaking from Buffer.grow
method. Since Buffer.grow
can't be inlined, it's important
to not have receiver leaking in order to make stack allocated buffer possible.
It also opens some new optimization possibilities described below.
Results for benchmarks from bytebuf
package (comparing only old and new bytes.Buffer
):
name old time/op new time/op delta
String/empty-8 132ns ±14% 21ns ± 0% -84.30% (p=0.000 n=10+10)
String/5-8 190ns ±12% 130ns ± 5% -31.62% (p=0.000 n=10+9)
String/64-8 205ns ±15% 166ns ±13% -18.79% (p=0.000 n=10+10)
String/1024-8 831ns ± 1% 780ns ± 0% -6.09% (p=0.000 n=8+7)
name old alloc/op new alloc/op delta
String/empty-8 112B ± 0% 0B -100.00% (p=0.000 n=10+10)
String/5-8 117B ± 0% 69B ± 0% -41.03% (p=0.000 n=10+10)
String/64-8 176B ± 0% 128B ± 0% -27.27% (p=0.000 n=10+10)
String/1024-8 2.16kB ± 0% 2.05kB ± 0% -5.19% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
String/empty-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
String/5-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/64-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/1024-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
With array self-assignment removed, we don't have a leaking param in bytes.Buffer.grow
.
This makes it possible to do a fancy pre-allocation which will reside on stack for buffers that do not escape:
- var buf bytes.Buffer
+ buf := bytes.NewBuffer(make([]byte, 0, 64))
Note that one can use different capacity, so we can effectively have bootstrap "array" of more than 64 bytes.
And given that we're not leaking anymore, it really does improve things:
name old time/op new time/op delta
String/empty-8 132ns ±14% 26ns ± 0% -80.47% (p=0.000 n=10+8)
String/5-8 190ns ±12% 53ns ± 1% -71.95% (p=0.000 n=10+10)
String/64-8 205ns ±15% 99ns ± 9% -51.62% (p=0.000 n=10+10)
String/1024-8 831ns ± 1% 931ns ±20% ~ (p=0.500 n=8+10)
name old alloc/op new alloc/op delta
String/empty-8 112B ± 0% 0B -100.00% (p=0.000 n=10+10)
String/5-8 117B ± 0% 5B ± 0% -95.73% (p=0.000 n=10+10)
String/64-8 176B ± 0% 64B ± 0% -63.64% (p=0.000 n=10+10)
String/1024-8 2.16kB ± 0% 2.18kB ± 0% +0.74% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
String/empty-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
String/5-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
String/64-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
String/1024-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
Note that we do less allocations for workloads that fit the slice capacity.
Key points of the new implementation:
bytes.Buffer
performs better than beforeunsafe.Sizeof(bytes.Buffer{})
is reduced significantlyThere were no big reasons to use bytes.NewBuffer
before. Now there are.
bytes.Buffer
benchmark results (go test -v -count=10 -bench='ReadString|WriteByte|WriteRune|Buffer' -benchmem bytes
):
name old time/op new time/op delta
ReadString-8 9.20µs ± 1% 9.22µs ± 1% ~ (p=0.148 n=10+10)
WriteByte-8 28.1µs ± 0% 26.2µs ± 0% -6.78% (p=0.000 n=10+10)
WriteRune-8 64.9µs ± 0% 65.0µs ± 0% +0.16% (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8 469µs ± 0% 461µs ± 0% -1.76% (p=0.000 n=9+10)
BufferFullSmallReads-8 108µs ± 0% 108µs ± 0% -0.21% (p=0.000 n=10+10)
name old speed new speed delta
ReadString-8 3.56GB/s ± 1% 3.55GB/s ± 1% ~ (p=0.165 n=10+10)
WriteByte-8 146MB/s ± 0% 156MB/s ± 0% +7.26% (p=0.000 n=9+10)
WriteRune-8 189MB/s ± 0% 189MB/s ± 0% -0.16% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
ReadString-8 32.8kB ± 0% 32.8kB ± 0% ~ (all equal)
WriteByte-8 0.00B 0.00B ~ (all equal)
WriteRune-8 0.00B 0.00B ~ (all equal)
BufferNotEmptyWriteRead-8 4.72kB ± 0% 4.67kB ± 0% -1.02% (p=0.000 n=10+10)
BufferFullSmallReads-8 3.44kB ± 0% 3.33kB ± 0% -3.26% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
ReadString-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
WriteByte-8 0.00 0.00 ~ (all equal)
WriteRune-8 0.00 0.00 ~ (all equal)
BufferNotEmptyWriteRead-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
BufferFullSmallReads-8 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
I had retitled this bug to be about bytes.Buffer's bootstrap array, so now that we got rid of that I suppose we can close this bug.
The underlying issue of self-referential structures tricking escape analysis (as demonstrated in https://github.com/golang/go/issues/7921#issuecomment-285855447) remains, however.
There's a lot of history on the thread, but skimming briefly I understand the issue boils down to @cespare's example:
type B struct {
p *int
n int
}
func (b *B) X() {
b.p = &b.n
}
This function is analyzed as "leaking param: b", which means calling b.X()
causes b
to always be heap allocated.
The issue here is that we're assigning &b.n through a (implicit) pointer dereference, so escape analysis pessimistically assumes the pointer might point to the heap.
There's an optimization esc.go:isSelfAssign that tries to look for things like p.f = p.g
to eliminate unnecessary leaks, but it doesn't recognize p.f = &p.g
. It probably could though.
(The above explanation applies to both esc.go and escape.go; they use the same approach here and even both use isSelfAssign for this optimization.)
Edit: This is wrong. Simply recognizing p.f = &p.g
in isSelfAssign is insufficient, because we need to still record that this assignment makes p
self-referential.
@mdempsky If we eliminate unnecessary leaks in p.f = &p.g
, then this code:
package escape
type S struct {
i int
pi *int
}
var sink S
func f(p *S) {
p.pi = &p.i
sink = *p
}
func g(p *S) {
p.pi = &p.i
}
func h() {
var s S
g(&s)
sink = s
}
will only has leaking param content in func f(p *S) {
right? Will var s S
moves to heap?
@cuonglm Unfortunately, my earlier suggestion to ignore p.f = &p.g
like how we ignore p.f = p.g
is wrong.
In the example you pasted, it's important that var s S
be heap allocated: after the assignment sink = s
, we have sink.pi
pointing to s.i
. If we naively treated g
as a non-escaping function though, s
would be stack allocated.
To fix this, we need a way to tag functions with something like "*p
becomes self-referential". The current scheme used by esc.go (and inherited by escape.go) doesn't support this though.
Once we remove esc.go, I expect we'll be able to cleanup/simplify the tagging scheme, and there will probably be opportunities for representing more fine-grained semantics like this.
I think I may case the same problem:
https://github.com/golang/go/issues/34463#issuecomment-535749969
https://github.com/golang/go/issues/34463#issuecomment-536232425
The memory is leaking
@mdempsky now that esc.go is gone, is this ripe for revisiting?
Most helpful comment
It's hard to solve this issue without any API changes just with improvements to the escape analysis.
Although https://golang.org/cl/133375 does make it possible to implement "proper"
bytes.Buffer
outside of golang stdlib.See https://github.com/intel-go/bytebuf.
The only change to
buffer.go
:And we get these numbers: