Many functions take a []byte but only read from it.
If the escape analysis code could flag parameters of a function as read-only, then code
which passes in a []byte(string) conversion could be cheaper.
bytes.IndexByte is one example.
For example, http/sniff.go does:
// -------------------
// Index of the first non-whitespace byte in data.
firstNonWS := 0
for ; firstNonWS < len(data) && isWS(data[firstNonWS]); firstNonWS++ {
}
func isWS(b byte) bool {
return bytes.IndexByte([]byte("\t\n\x0C\n "), b) != -1
}
// -------------------
But it's faster to avoid re-creating the []byte:
func BenchmarkIndexByteLiteral(b *testing.B) {
for i := 0; i < b.N; i++ {
IndexByte([]byte("\t\n\x0C\n "), 'x')
}
}
func BenchmarkIndexByteVariable(b *testing.B) {
var whitespace = []byte("\t\n\x0C\n ")
for i := 0; i < b.N; i++ {
IndexByte(whitespace, 'x')
}
}
bytes_test.BenchmarkIndexByteLiteral 20000000 125 ns/op
bytes_test.BenchmarkIndexByteVariable 100000000 25.4 ns/op
Related is issue #2204.
Comment 1 by [email protected]:
It seem to be much easier to add the possibility of zerocopish taking string slices from byte slices/arrays. As far I understand, string slices are actually readonly and cap()'less byte slices: http://research.swtch.com/2009/11/go-data-structures.html
I run into this often, but I should start listing examples.
In goprotobuf, text.go calls writeString with both a string and a []byte converted to a
string:
// writeAny writes an arbitrary field.
func writeAny(w *textWriter, v reflect.Value, props *Properties) {
v = reflect.Indirect(v)
// We don't attempt to serialise every possible value type; only those
// that can occur in protocol buffers, plus a few extra that were easy.
switch v.Kind() {
case reflect.Slice:
// Should only be a []byte; repeated fields are handled in writeStruct.
writeString(w, string(v.Interface().([]byte)))
case reflect.String:
writeString(w, v.String())
Note that the function writeString reallly just wants a read-only slice of bytes:
func writeString(w *textWriter, s string) {
w.WriteByte('"')
// Loop over the bytes, not the runes.
for i := 0; i < len(s); i++ {
// Divergence from C++: we don't escape apostrophes.
// There's no need to escape them, and the C++ parser
// copes with a naked apostrophe.
switch c := s[i]; c {
case '\n':
w.Write([]byte{'\\', 'n'})
case '\r':
w.Write([]byte{'\\', 'r'})
case '\t':
w.Write([]byte{'\\', 't'})
case '"':
w.Write([]byte{'\\', '"'})
case '\\':
w.Write([]byte{'\\', '\\'})
default:
if isprint(c) {
w.WriteByte(c)
} else {
fmt.Fprintf(w, "\\%03o", c)
}
}
}
w.WriteByte('"')
}
It doesn't matter that it's frozen (like a string), nor writable (like a []byte). But
Go lacks that type, so if instead it'd be nice to write writeAny with a []byte parameter
and invert the switch above to be like:
switch v.Kind() {
case reflect.Slice:
// Should only be a []byte; repeated fields are handled in writeStruct.
writeString(w, v.Interface().([]byte))
case reflect.String:
writeString(w, []byte(v.String())) // no copy!
Where the []byte(v.String()) just makes a slice header pointing in to the string's
memory, since the compiler can verify that writeAny never mutates its slice.
See patch and CL description at http://golang.org/cl/6850067 for the opposite but very related case: strconv.ParseUint, ParseBool, etc take a string but calling code has a []byte.
People who like this bug also like issue #3512 (cmd/gc: optimized map[string] lookup from []byte key)
FWIW var m map[string]int var b []byte _ = m[string(b)] case must be radically simpler to implement than general read-only analysis. It's peephole optimization, when the compiler sees such code it can generate hacky string object using the []byte pointer. Another example would be len(string(b)), but this seems useless.
Yes. That's why they're separate bugs. I want issue #3512 first, because it's easy.
Change https://golang.org/cl/146018 mentions this issue: strings: declare Index as noescape
I run into this often, but I should start listing examples.
Good idea. From #29802/#29810: encoding/hex.Decode.
This comes up with hash functions (example: sha256.Sum256([]byte(s))).
For github.com/cespare/xxhash in addition to Sum64 I added Sum64String which does the unsafe string-to-slice conversion:
https://github.com/cespare/xxhash/blob/3767db7a7e183c0ad3395680d460d0b61534ae7b/xxhash_unsafe.go#L28-L35
but I'd rather the caller just be able to use Sum64([]byte(s)) and have the compiler optimize it.
If read-only byte slice is supported, we can set the cap of read-only byte slices as a negative value to indicate the underlying bytes are immutable. []byte(aString) results a read-only byte slice without duplicating the underlying bytes of aString. Convertingthe result byte slice, if its cap is a negative value, back again to a string also needs not to duplicate the underlying bytes.
Without the read-only slice feature, is it possible to lazy duplicate the underlying bytes for coversion []byte(aString) on demand?
What I think might be useful:
[]byte, record whether the slice is modified.[]byte(s) for some s of type string, if the function does not modify the []byte, don't copy the string.M that has a parameter of []byte, add an additional hidden method MString which has a parameter of type string.[]byte parameter.M method does not modify the []byte, have MString forward to M.M method does modify the []byte, have MString copy to []byte and call M.M with []byte(s), call MString instead.For every function/method that has a parameter of type
[]byte, record whether the slice is modified.
This might be hard to implement overall, mainly caused by the cases in which the []byte parameter is passed to an interface method within the function body. We could think such functions will modify the slice parameters, but this will discount the usefulness of the idea much.
When converting a type to this interface, if the type's
Mmethod does not modify the[]byte, haveMStringforward toM.
Many io.Writer implementations wrap one or more other io.Writer implementations invoked somewhere in the method body. How would this forwarding logic work for those implementations?
Most helpful comment
What I think might be useful:
[]byte, record whether the slice is modified.[]byte(s)for somesof typestring, if the function does not modify the[]byte, don't copy the string.Mthat has a parameter of[]byte, add an additional hidden methodMStringwhich has a parameter of typestring.[]byteparameter.Mmethod does not modify the[]byte, haveMStringforward toM.Mmethod does modify the[]byte, haveMStringcopy to[]byteand callM.Mwith[]byte(s), callMStringinstead.