In 2016, #2632 was closed with the explanation that:
I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.
In summary, it's argued that 1) the compiler should address this, and 2) people can vendor and modify the strconv package.
It's been 4 years since the issue was closed (and 9 years since the issue was first reported), and there there hasn't been a compiler optimization that permits calling the Parse functions with a []byte without it creating garbage. That is:
var b []byte = ...
v, err := strconv.ParseXXX(string(b), ...)
always (as of Go1.15) allocates and copies the input []byte.
Forking strconv is an unfavorable workaround because it means that the vendor fails to benefit from future optimizations to strconv (e.g., cl/187957 or cl/260858) and fixes to the implementation (e.g., #29491) and may go against company guidelines that forbid forks. Personally, I haven't seen anyone fork strconv. Rather, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module. However, some use-cases are constrained to avoid using unsafe (e.g., encoding/json) and so continue to suffer from performance losses.
Given the passage of time and the lack of a natural solution to this, I think we should revisit this issue. It seems that we should either:
1) Modify the compiler to detect cases where a string variable does not escape and that it's use involves no synchronization points, so that it functionally performs an allocation-free cast of []byte to string in the example snippet above. If such a compiler optimization existed, we would probably also need to modify strconv to avoid escaping the string input through the error value by copying the input string. Note that this will slow down cases where parsing fails.
2) Accept that a compiler optimization is not happening, in which case we add ParseBoolBytes, ParseComplexBytes, ParseFloatBytes, ParseIntBytes, and ParseUintBytes, which are identical to their counterparts without the Bytes suffix, but operate on a []byte as the input instead of a string. (We can add Bytes equivalents for the QuoteXXX functions as well, but those seem to be rarely with a []byte.)
Obviously option 1 is preferred, but if its not going to happen, but perhaps it's time to do option 2. It's been almost a decade.
\cc @mvdan @rogpeppe @mdlayher
Rather, in most cases I see users cast a
[]byteto astringusingunsafe. This is what we do in the protobuf module.
At my company we have shared functions that do this too. They see a reasonable amount of use in high-performance code.
It's not hard to write the unsafe code, but it's not completely trivial either -- it's easy to overlook the problem of the string being retained unsafely inside the NumError.
Don't forget option (3): wait for a decision on generics, and then provide an updated strconv library with functions that are polymorphic over indexable byte-containers.
If there's a new strconv library that's generic over []byte and string then any new call we add to strconv now can presumably be trivially implemented in terms of that
If something like https://github.com/golang/go/issues/2205#issuecomment-726468940 was implemented, perhaps we could get the compiler to do the work for us after all. That said, I'm not in favor of simply waiting another few years for that to happen.
Similar to option (3): byte view that can represent []byte or string https://github.com/golang/go/issues/5376 (however it it's covered by generics, probably)
If there was a compiler optimisation for this, it would be good if it could be generalised to support string to []byte conversions too. For example, at the moment, there's no way to calculate the hash of a string without allocations, even with devirtualisation, because the call to Write([]byte(s)) will always incur an allocation.
Maybe, it is not a too bad idea to let slices comparable. Comparing slices is just like comparing strings.
@go101 That is #23725 or #38377. This is not the right place to discuss those ideas.
Sorry, I really misunderstood @rogpeppe's last comment. In my case, I often want structs with byte slice fields are hashable and can be used as map keys but I have to convert the slices to strings in other structs to get hashable values.
Most helpful comment
If there was a compiler optimisation for this, it would be good if it could be generalised to support
stringto[]byteconversions too. For example, at the moment, there's no way to calculate the hash of a string without allocations, even with devirtualisation, because the call toWrite([]byte(s))will always incur an allocation.