Go: bufio: set default buffer size from *bytes.Reader

Created on 31 May 2020  Â·  21Comments  Â·  Source: golang/go

I just came across some code that does this rather natural thing:

r := bufio.NewReader(bytes.NewReader(buf))
//  use r to do bufio-ish things

However, len(buf) is typically low, much smaller than the default buffer size of 4096. I think that bufio.NewReader should check whether r is a *bytes.Reader, and if so, use r.Len() to size the buffer (assuming r.Len() is less than 4096).

Implementation is easy, and would be a good first contribution for someone.

Any objections?

cc @griesemer @bradfitz @ianlancetaylor per golang.org/s/owners

NeedsInvestigation Performance Suggested help wanted

Most helpful comment

Sounds like the consensus is that this isn’t possibly to do now in a way that is simple and uniformly helpful.

All 21 comments

Sounds reasonable. The dependency is already there anyway.

We could in theory do even better though and use the same underlying array so bufio doesn't need to allocate its own of any size.

In theory yes but I worry about Hyrum’s Law and people modifying the buffer underfoot. Maybe we should start there early in cycle and fall back to the less aggressive optimization if needed.

Should strings.Reader add here ?

@qingyunha i don’t think there’s a bufio->strings dependency yet, and I’d be reluctant to add one.

If the optimization is based on r.Len(), would it make sense to check for a Len() int method in general rather than *bytes.Reader in particular? (Or are we worried about Len() int methods with other semantics?)

Or are we worried about Len() int methods with other semantics?

We have been in the past. This same issue came up earlier for http.NewRequest's type switch: https://github.com/golang/go/blob/1519bc4457af7179557a4f04bb35a4e07bedd118/src/net/http/request.go#L874 and we decided just looking for a method wasn't enough.

(But if we wanted, we could use a combination of reflect + interface assertion to get the strings.Reader length without importing strings, and still verify it's only pkg strings and not something else with Len)

The comment from Bradford Lamson-Scribner has been deleted since I started replying, but FWIW, you can obtain the bytes.Reader's underlying buffer like: https://play.golang.org/p/t0o_t_lO3ck

That kinda violates the WriterTo contract at least in spirit, but limited shenanigans are generally permitted within the standard library when sufficiently documented & tested. Better than unsafe at least.

@bradfitz thank you! I saw @josharian post this issue, and was interested in putting together a CL. Not long after posting, I realized it could be a great first contribution for someone (maybe depending on which implementation is preferred) and sort of took my hat out of the ring. I will keep đź‘€ on this though, if no one else gains interest - happy to work on it.

@bradford-hamilton I say go for it. I’d be tempted to do two CLs, first only using Len, then sharing the underlying buffer, so that we can roll back the second one independently if needed.

@bradfitz instead of WriteTo, seems like just Bytes would do it. What am I missing?

@bcmills yeah, I wish we could use a method, and we probably could have long ago. Probably not now, sadly.

@josharian, *bytes.Reader does not have a Bytes() []byte method. (Given that the slice can be extracted using other shenanigans, the lack of that method seems like an oversight, but perhaps that is for a separate issue.)

@bcmills oops, indeed. Thanks.

@josharian sounds good, I'll start with a CL using the bytes.Reader's Len(), and follow it up with another that uses the same underlying byte slice.

Sounds like @bcmills suggestion of a Bytes() []byte method would be another straightforward follow up CL if we decide it makes sense to have it.

Adding new API requires opening a new proposal issue first. Feel free to do that!

Sure, that makes sense to me. I will put some thought into it and maybe put something together after feeling it out a bit in these CLs. I'll keep this conversation posted.

Change https://golang.org/cl/235977 mentions this issue: bufio: update bufio's NewReaderSize default sizing logic

It occurs to me that the existence of Reset methods could make this a major regression for a fairly large class of programs.

Consider what happens if, say, the *bytes.Reader is reset after the bufio.Reader is initialized:
https://play.golang.org/p/NFSbXOU7GAq

Or what happens if the bufio.Reader itself is reset:
https://play.golang.org/p/NFSbXOU7GAq

Programs probably already assume that the buffer size of a *bufio.Reader does not change over time, so if we pick an inappropriate default initially, we can't change it.

—

In contrast, a poor size is trivial to fix on the caller side when appropriate:

    var r *bufio.Reader
    if br, ok := b.(*bytes.Reader); ok {
        r = bufio.NewReaderSize(br, br.Len())
    } else {
        r = bufio.NewReader(b)
    }

@bcmills I’m still pondering.

I wonder whether we should track whether we auto-sizes the buffer originally and be prepared to switch to a large buffer if needed during Reset. Then we potentially have two allocations instead of one, but at most two. That’s for the “use Len” approach.

For the more invasive, “re-use the existing buffer” approach, i think it is clearer what to do in Reset—if we are working with a borrowed buffer, switch to a new borrowed buffer (or allocate a new one if there isn’t one to borrow).

If that’s all too clever, maybe we should just address this with documentation. (And a staticcheck check? cc @dominikh )

I think that's too clever. I'm not even convinced this is worth a staticcheck warning.

r := bufio.NewReader(bytes.NewReader(buf))

This is probably a common pattern inside tests, but how much does the wasted 4k matter in a test? And how often does it happen in non-test code?

Just looking at the Go tree, there are 4 instances of the string bufio.NewReader(bytes.NewReader( and they are all in test files. 23/26 instances of bufio.NewReader(strings.NewReader( are in test files. (And one of the three others is this questionable line.)

FWIW, I came across it in real code by profiling an actual server.

Sounds like the consensus is that this isn’t possibly to do now in a way that is simple and uniformly helpful.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

gopherbot picture gopherbot  Â·  3Comments

bradfitz picture bradfitz  Â·  3Comments

myitcv picture myitcv  Â·  3Comments

jayhuang75 picture jayhuang75  Â·  3Comments

OneOfOne picture OneOfOne  Â·  3Comments