Go: proposal: strings: a string factory to convert `bytes` to `string`

Created on 13 Jun 2019 · 8 Comments · Source: golang/go

Problem:

Every conversion from a []byte to a string involves a memory allocation, a consequence of strings being immutable and byte slices being mutable.
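To make the problem concrete, here is a minimal sketch that checks both halves of the claim: `string(b)` copies the bytes, and the copy normally costs a heap allocation. The helper names (`convertCopies`, `allocsPerConversion`) and the package-level `sink` are illustrative and not part of the proposal. The sketch assumes a buffer large enough, and a result that escapes, so the compiler cannot keep the string on the stack.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the converted string escapes to the heap;
// otherwise the compiler may avoid the allocation for short strings.
var sink string

// convertCopies reports whether a string produced by string(b) is
// unaffected by a later write to b, i.e. whether the bytes were copied.
func convertCopies() bool {
	b := []byte("hello, gopher")
	s := string(b) // copies the bytes of b
	b[0] = 'J'     // mutating b must not change s
	return s == "hello, gopher"
}

// allocsPerConversion measures the average number of heap allocations
// made by one string(b) conversion of an n-byte slice.
func allocsPerConversion(n int) float64 {
	b := make([]byte, n)
	return testing.AllocsPerRun(1000, func() {
		sink = string(b)
	})
}

func main() {
	fmt.Println("conversion copies:", convertCopies())
	fmt.Println("allocs per 1 KiB conversion:", allocsPerConversion(1024))
}
```

The copy is what makes the resulting string safe to treat as immutable, which is why a general-purpose conversion cannot simply skip it.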

Investigation:

Logically, constant strings embedded in binaries have limited use. String-processing functions normally take a string argument rather than a []byte. As a result, conversions are widespread, driven by the wide use of strings generated at program runtime.

For example, the benchmark in this Issue from another repository shows that json.Unmarshal generates a large number of memory fragments. Even the third-party library didn't perform well. Go is widely used as a backend language, and these kinds of calls sit on the hot path.

Solution:

Simply add a string factory to package strings: one exported function New([]byte) string, which converts bytes to a string with optimised memory allocation, and a struct Factory that is not safe for concurrent use and is intended to be used within a single goroutine.

FrozenDueToAge Proposal Proposal-FinalCommentPeriod

All 8 comments

Change https://golang.org/cl/182057 mentions this issue: strings: build a string factory for safely and malloc limited converting bytes to string

As bytes are mutable, a general function that produces a string from bytes needs at least one allocation, unless it memoizes (and there are multiple identical strings/substrings) or pools memory. There already is https://golang.org/pkg/strings/#Builder.

How is the new strings factory better than strings.Builder and what are the tradeoffs?

If it's avoiding allocations by reserving a large memory pool, then a single string can potentially pin a huge amount of memory, which is likely not advisable for general usage.

> As bytes are mutable, a general function that produces a string from bytes needs at least one allocation, unless it memoizes (and there are multiple identical strings/substrings) or pools memory. There already is https://golang.org/pkg/strings/#Builder.
>
> How is the new strings factory better than strings.Builder and what are the tradeoffs?

Sorry for the wrong workflow. I opened PR #32594 as a code draft for further discussion.

> If it's avoiding allocations by reserving a large memory pool, then a single string can potentially pin a huge amount of memory, which is likely not advisable for general usage.

In #32594, strings.Factory is meant to be a goroutine-local object. Please disregard the function strings.New, since I totally agree with Keith Randall's concern in this comment and am still trying to find an answer to it. If I can't address that concern, I'll remove strings.New before requesting a merge.

As for your original question, I provided a function func NewFactoryWithPoolSize(size int) *Factory so that callers can tune performance. Does that address your concern? The caller should know the average size of the fragments on the hot path, and could pass four times that size to the constructor, for example NewFactoryWithPoolSize(4 * avgSize). I'll put this explanation into the function's documentation before merging.
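For readers following along, here is a rough sketch of what a pool-backed factory of this kind could look like. It is not the code from CL 182057. The lowercase names `factory` and `newFactoryWithPoolSize` are hypothetical stand-ins, and backing each chunk with a strings.Builder is an assumption made here to avoid package unsafe. Like the proposed type, it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"strings"
)

// factory is a hypothetical, single-goroutine converter that carves
// strings out of a shared chunk instead of allocating one per call.
type factory struct {
	chunkSize int             // capacity requested for each new chunk
	sb        strings.Builder // current chunk; earlier results alias its buffer
}

// newFactoryWithPoolSize returns a factory whose chunks hold about size bytes.
func newFactoryWithPoolSize(size int) *factory {
	return &factory{chunkSize: size}
}

// New copies b into the current chunk and returns the copy as a string.
// A fresh chunk is started when b does not fit in the remaining capacity.
func (f *factory) New(b []byte) string {
	if f.sb.Cap()-f.sb.Len() < len(b) {
		n := f.chunkSize
		if len(b) > n {
			n = len(b) // oversized input gets a chunk of its own
		}
		f.sb.Reset() // drops the old chunk; strings already returned keep it alive
		f.sb.Grow(n)
	}
	start := f.sb.Len()
	f.sb.Write(b) // fits in capacity, so the buffer is not reallocated
	return f.sb.String()[start:]
}

func main() {
	avgSize := 16
	f := newFactoryWithPoolSize(4 * avgSize)
	fmt.Println(f.New([]byte("key")), f.New([]byte("value")))
}
```

Every string returned from one chunk shares that chunk's backing memory, so holding on to any one of them keeps the whole chunk reachable. That is the pinning concern raised above.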

I'm not sure I follow the request, but https://github.com/golang/go/issues/5160 may be relevant. There's also a trick that can be done in user code for the right kinds of setups: https://go-review.googlesource.com/c/164961

In the CL, strings.Factory.New is making a copy of the argument; so does string(x).
One possible advantage might be the lack of fragmentation for very small strings, but string(x) takes care of that too.
Another possible advantage is a lack of fragmentation at the end of larger strings.
But now keeping a reference to any one of these strings will keep a reference to the entire allocation,
which is often problematic (it's a memory leak!) and really not something that any standard library routines should do or encourage.

If you really need this one-big-allocation-backing-many-strings, it would be easier just to
allocate a big []byte b, fill it, do s := string(b) (copies b), and then reslice s.
Or use strings.Builder to fill it and then there's no copy during the string(b).

This does not fit well at all in the standard library. I suggest we decline this proposal.

I filed #32779 for the more limited JSON issue.

Suggested declining this on June 25 (https://github.com/golang/go/issues/32593#issuecomment-505616813) and there have been no comments since then. As such, this seems like a likely decline.

Leaving open for a week for final comments.

Marked this last week as likely decline w/ call for last comments (https://github.com/golang/go/issues/32593#issuecomment-525477304).
Declining now.
