Go: cmd/compile: teach the compiler that arguments passed to {strings|bytes}.{Index*|Count|Has*} don't escape

Created on 13 Jun 2018  ·  11Comments  ·  Source: golang/go

The issue

Go tip doesn't know that string / []byte arguments passed to functions like strings.IndexByte don't escape. This leads to unnecessary allocation and copy in the following code:

package stringsIndex_test

import (
        "strings"
        "testing"
)


func BenchmarkStringsIndexByte(b *testing.B) {
        bb := []byte("foobar baz")
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                // string(bb) unnecessarily allocates new string and copies bb contents to it.
                n := strings.IndexByte(string(bb), ' ')
                Sink += n
        }
}

var Sink int

Benchmark results indicate an unnecessary allocation:

BenchmarkStringsIndexByte   50000000            30.5 ns/op        16 B/op          1 allocs/op

The solution

Mark strings and bytes functions accepting string / []byte as go:noescape in order to eliminate unnecessary allocations in the code above.

FrozenDueToAge NeedsInvestigation

Most helpful comment

I guess we could fix this, but why are you using strings.IndexByte(string(bb), ' ') instead of bytes.IndexByte(bb, ' ')?

All 11 comments

CC @dr2chase

I don't think we want to start using go:noescape in places like this. But perhaps the compiler can figure it out.

I don't think we want to start using go:noescape in places like this. But perhaps the compiler can figure it out.

I think it will be too difficult figuring out this for functions written in assembly. Probably, it would be OK adding go:noescape for assembly-written functions?

This change may have significant positive impact on standard packages, since then escape analysis may prove that arguments to many functions written in Go that call strings.Index* and co don't escape.

Yes, sorry, it's fine to use go:noescape for functions written in assembly.

IndexByte is already marked with go:noescape:

https://github.com/golang/go/blob/a2f72cc80d3c8cbda793959c3186c76e004532af/src/internal/bytealg/indexbyte_native.go#L9-L10

I suspect this is just a case where the string cast could be cleverer.

I guess we could fix this, but why are you using strings.IndexByte(string(bb), ' ') instead of bytes.IndexByte(bb, ' ')?

IndexByte is already marked with go:noescape

Then the []byte -> string conversion with a memory allocation and a copy for non-escaped argument looks like a bug.

why are you using strings.IndexByte(string(bb), ' ') instead of bytes.IndexByte(bb, ' ')

This is just an artificial example :)
The strings.Index(string(b), stringPrefix) is more real-life. It may be used interchangeably with bytes.Index(b, []byte(stringPrefix)), since there is no a preferred code here.

The non-escaping string(b) still can allocate if it doesn't fit the small tmp buffer.
I think https://github.com/golang/go/issues/27148#issuecomment-422724170 can be relevant here.

bytes.Index(b, []byte(stringPrefix))

This is a preferred way for now, since prefix is usually smaller than haystack itself, and it can probably fit fixes 32-byte stack allocated buffer.

Here is a simple benchmarking code:

package foo

import (
    "bytes"
    "strings"
    "testing"
)

var haystack = bytes.Repeat([]byte("a"), 100)
var needle = "aaa"

func BenchmarkStringsIndex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = strings.Index(string(haystack), needle)
    }
}

func BenchmarkBytesIndex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = bytes.Index(haystack, []byte(needle))
    }
}

The results:

BenchmarkStringsIndex-8 20000000  73.3 ns/op 112 B/op 1 allocs/op
BenchmarkBytesIndex-8   100000000 15.1 ns/op   0 B/op 0 allocs/op

I've written a simple check that detects such calls that can be replaced with bytes.Index to aid in optimization sessions of big projects (https://go-critic.github.io/overview#indexAlloc-ref).

Change https://golang.org/cl/146018 mentions this issue: strings: declare Index as noescape

The CL I just mailed fixes strings.Index. I think all the other methods mentioned in the issue title are already ok. Let me know if that doesn't mesh with your understanding.

Was this page helpful?
0 / 5 - 0 ratings