Go tip doesn't know that string / []byte arguments passed to functions like strings.IndexByte don't escape. This leads to unnecessary allocation and copy in the following code:
package stringsIndex_test
import (
"strings"
"testing"
)
func BenchmarkStringsIndexByte(b *testing.B) {
bb := []byte("foobar baz")
b.ReportAllocs()
for i := 0; i < b.N; i++ {
// string(bb) unnecessarily allocates new string and copies bb contents to it.
n := strings.IndexByte(string(bb), ' ')
Sink += n
}
}
var Sink int
Benchmark results indicate an unnecessary allocation:
BenchmarkStringsIndexByte 50000000 30.5 ns/op 16 B/op 1 allocs/op
Mark strings and bytes functions accepting string / []byte as go:noescape in order to eliminate unnecessary allocations in the code above.
CC @dr2chase
I don't think we want to start using go:noescape in places like this. But perhaps the compiler can figure it out.
I don't think we want to start using go:noescape in places like this. But perhaps the compiler can figure it out.
I think it will be too difficult figuring out this for functions written in assembly. Probably, it would be OK adding go:noescape for assembly-written functions?
This change may have significant positive impact on standard packages, since then escape analysis may prove that arguments to many functions written in Go that call strings.Index* and co don't escape.
Yes, sorry, it's fine to use go:noescape for functions written in assembly.
IndexByte is already marked with go:noescape:
I suspect this is just a case where the string cast could be cleverer.
I guess we could fix this, but why are you using strings.IndexByte(string(bb), ' ') instead of bytes.IndexByte(bb, ' ')?
IndexByte is already marked with go:noescape
Then the []byte -> string conversion with a memory allocation and a copy for non-escaped argument looks like a bug.
why are you using strings.IndexByte(string(bb), ' ') instead of bytes.IndexByte(bb, ' ')
This is just an artificial example :)
The strings.Index(string(b), stringPrefix) is more real-life. It may be used interchangeably with bytes.Index(b, []byte(stringPrefix)), since there is no a preferred code here.
The non-escaping string(b) still can allocate if it doesn't fit the small tmp buffer.
I think https://github.com/golang/go/issues/27148#issuecomment-422724170 can be relevant here.
bytes.Index(b, []byte(stringPrefix))
This is a preferred way for now, since prefix is usually smaller than haystack itself, and it can probably fit fixes 32-byte stack allocated buffer.
Here is a simple benchmarking code:
package foo
import (
"bytes"
"strings"
"testing"
)
var haystack = bytes.Repeat([]byte("a"), 100)
var needle = "aaa"
func BenchmarkStringsIndex(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = strings.Index(string(haystack), needle)
}
}
func BenchmarkBytesIndex(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = bytes.Index(haystack, []byte(needle))
}
}
The results:
BenchmarkStringsIndex-8 20000000 73.3 ns/op 112 B/op 1 allocs/op
BenchmarkBytesIndex-8 100000000 15.1 ns/op 0 B/op 0 allocs/op
I've written a simple check that detects such calls that can be replaced with bytes.Index to aid in optimization sessions of big projects (https://go-critic.github.io/overview#indexAlloc-ref).
Change https://golang.org/cl/146018 mentions this issue: strings: declare Index as noescape
The CL I just mailed fixes strings.Index. I think all the other methods mentioned in the issue title are already ok. Let me know if that doesn't mesh with your understanding.
Most helpful comment
I guess we could fix this, but why are you using
strings.IndexByte(string(bb), ' ')instead ofbytes.IndexByte(bb, ' ')?