Go: cmd/compile: read-only escape analysis and avoiding string <-> []byte copies

Created on 30 Aug 2011  ·  25Comments  ·  Source: golang/go

Many functions take a []byte but only read from it.

If the escape analysis code could flag parameters of a function as read-only, then code
which passes in a []byte(string) conversion could be cheaper.

bytes.IndexByte is one example.

For example, http/sniff.go does:

// -------------------
  // Index of the first non-whitespace byte in data.                                                                                               
  firstNonWS := 0
  for ; firstNonWS < len(data) && isWS(data[firstNonWS]); firstNonWS++ {
  }

func isWS(b byte) bool {
        return bytes.IndexByte([]byte("\t\n\x0C\n "), b) != -1
}

// -------------------

But it's faster to avoid re-creating the []byte:

func BenchmarkIndexByteLiteral(b *testing.B) {
        for i := 0; i < b.N; i++ {
        IndexByte([]byte("\t\n\x0C\n "), 'x')
        }
}

func BenchmarkIndexByteVariable(b *testing.B) {
        var whitespace = []byte("\t\n\x0C\n ")
        for i := 0; i < b.N; i++ {
                IndexByte(whitespace, 'x')
        }
}

bytes_test.BenchmarkIndexByteLiteral    20000000           125 ns/op
bytes_test.BenchmarkIndexByteVariable   100000000           25.4 ns/op

Related is issue #2204.
GarbageCollector NeedsFix Performance

Most helpful comment

What I think might be useful:

  • For every function/method that has a parameter of type []byte, record whether the slice is modified.

    • Record this in the export data so that the information is correct interprocedurally.

  • When calling a function with an argument of the form []byte(s) for some s of type string, if the function does not modify the []byte, don't copy the string.
  • For an interface with a method M that has a parameter of []byte, add an additional hidden method MString which has a parameter of type string.

    • Perhaps only for methods with a single []byte parameter.

    • Perhaps wait until we have profile guided optimization and use that to see whether this is useful.

  • When converting a type to this interface, if the type's M method does not modify the []byte, have MString forward to M.
  • If the type's M method does modify the []byte, have MString copy to []byte and call M.
  • Given an value of the interface type, when calling M with []byte(s), call MString instead.

All 25 comments

Comment 1 by [email protected]:

It seem to be much easier to add the possibility of zerocopish taking string slices from
byte slices/arrays.
As far I understand, string slices are actually readonly and cap()'less byte slices:
http://research.swtch.com/2009/11/go-data-structures.html

Comment 2:

zerocopish?

Comment 3:

Can you be more specific?
Examples of programs that you think should be
handled specially would be a good way to do that.

Comment 4:

_Owner changed to @rsc._

_Status changed to Accepted._

Comment 5:

_Labels changed: added compilerbug, performance._

Comment 6:

_Labels changed: added priority-later._

Comment 8:

_Labels changed: added go1.1maybe._

Comment 9:

I run into this often, but I should start listing examples.
In goprotobuf, text.go calls writeString with both a string and a []byte converted to a
string:
// writeAny writes an arbitrary field.
func writeAny(w *textWriter, v reflect.Value, props *Properties) {
        v = reflect.Indirect(v)
        // We don't attempt to serialise every possible value type; only those
        // that can occur in protocol buffers, plus a few extra that were easy.
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, string(v.Interface().([]byte)))
        case reflect.String:
                writeString(w, v.String())
Note that the function writeString reallly just wants a read-only slice of bytes:
func writeString(w *textWriter, s string) {
        w.WriteByte('"')
        // Loop over the bytes, not the runes.
        for i := 0; i < len(s); i++ {
                // Divergence from C++: we don't escape apostrophes.
                // There's no need to escape them, and the C++ parser
                // copes with a naked apostrophe.
                switch c := s[i]; c {
                case '\n':
                        w.Write([]byte{'\\', 'n'})
                case '\r':
                        w.Write([]byte{'\\', 'r'})
                case '\t':
                        w.Write([]byte{'\\', 't'})
                case '"':
                        w.Write([]byte{'\\', '"'})
                case '\\':
                        w.Write([]byte{'\\', '\\'})
                default:
                        if isprint(c) {
                                w.WriteByte(c)
                        } else {
                                fmt.Fprintf(w, "\\%03o", c)
                        }
                }
        }
        w.WriteByte('"')
}
It doesn't matter that it's frozen (like a string), nor writable (like a []byte).  But
Go lacks that type, so if instead it'd be nice to write writeAny with a []byte parameter
and invert the switch above to be like:
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, v.Interface().([]byte))
        case reflect.String:
                writeString(w, []byte(v.String())) // no copy!
Where the []byte(v.String()) just makes a slice header pointing in to the string's
memory, since the compiler can verify that writeAny never mutates its slice.

Comment 10:

See patch and CL description at http://golang.org/cl/6850067 for the opposite
but very related case: strconv.ParseUint, ParseBool, etc take a string but calling code
has a []byte.

Comment 11:

_Labels changed: removed go1.1maybe._

Comment 12:

[The time for maybe has passed.]

Comment 13:

People who like this bug also like issue #3512 (cmd/gc: optimized map[string] lookup from
[]byte key)

Comment 14:

FWIW
var m map[string]int
var b []byte
_ = m[string(b)]
case must be radically simpler to implement than general read-only analysis.
It's peephole optimization, when the compiler sees such code it can generate hacky
string object using the []byte pointer.
Another example would be len(string(b)), but this seems useless.

Comment 15:

Yes. That's why they're separate bugs.
I want issue #3512 first, because it's easy.

Comment 16:

_Labels changed: added garbage._

Comment 18:

_Labels changed: added priority-someday, removed priority-later._

Comment 19:

_Labels changed: added repo-main._

Comment 20:

Adding Release=None to all Priority=Someday bugs.

_Labels changed: added release-none._

Change https://golang.org/cl/146018 mentions this issue: strings: declare Index as noescape

I run into this often, but I should start listing examples.

Good idea. From #29802/#29810: encoding/hex.Decode.

This comes up with hash functions (example: sha256.Sum256([]byte(s))).

For github.com/cespare/xxhash in addition to Sum64 I added Sum64String which does the unsafe string-to-slice conversion:

https://github.com/cespare/xxhash/blob/3767db7a7e183c0ad3395680d460d0b61534ae7b/xxhash_unsafe.go#L28-L35

but I'd rather the caller just be able to use Sum64([]byte(s)) and have the compiler optimize it.

If read-only byte slice is supported, we can set the cap of read-only byte slices as a negative value to indicate the underlying bytes are immutable. []byte(aString) results a read-only byte slice without duplicating the underlying bytes of aString. Convertingthe result byte slice, if its cap is a negative value, back again to a string also needs not to duplicate the underlying bytes.

Without the read-only slice feature, is it possible to lazy duplicate the underlying bytes for coversion []byte(aString) on demand?

What I think might be useful:

  • For every function/method that has a parameter of type []byte, record whether the slice is modified.

    • Record this in the export data so that the information is correct interprocedurally.

  • When calling a function with an argument of the form []byte(s) for some s of type string, if the function does not modify the []byte, don't copy the string.
  • For an interface with a method M that has a parameter of []byte, add an additional hidden method MString which has a parameter of type string.

    • Perhaps only for methods with a single []byte parameter.

    • Perhaps wait until we have profile guided optimization and use that to see whether this is useful.

  • When converting a type to this interface, if the type's M method does not modify the []byte, have MString forward to M.
  • If the type's M method does modify the []byte, have MString copy to []byte and call M.
  • Given an value of the interface type, when calling M with []byte(s), call MString instead.

For every function/method that has a parameter of type []byte, record whether the slice is modified.

This might be hard to implement overall, mainly caused by the cases in which the []byte parameter is passed to an interface method within the function body. We could think such functions will modify the slice parameters, but this will discount the usefulness of the idea much.

When converting a type to this interface, if the type's M method does not modify the []byte, have MString forward to M.

Many io.Writer implementations wrap one or more other io.Writer implementations invoked somewhere in the method body. How would this forwarding logic work for those implementations?

Was this page helpful?
0 / 5 - 0 ratings