Problems
Currently, the managed ledger reads entries in very large batch requests (100 entries by default). This is inefficient. We should streamline the read requests the way dlog (DistributedLog) does.
In one production deployment, I have seen consumers never catch up when the bookies' average read latency degrades to 10+ ms because of the disk and other workloads running on the same machine. The problem could be mitigated with a proper read-ahead mechanism in the managed ledger, like the one dlog has.
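Here is a rough model of why read-ahead matters at 10+ ms latency. It is an illustration, not a measurement. It assumes a fixed read latency and a consumer that needs a fixed time to process each batch. ReadThroughputModel and its methods are hypothetical names. Without read-ahead, the next read waits for the previous batch to be processed, so latency and processing time add up. With read-ahead, I/O overlaps processing.

```java
// Back-of-envelope throughput model. Simplifying assumption: read latency is
// constant and does not depend on how many reads are in flight.
final class ReadThroughputModel {

    // Without read-ahead, the next read is issued only after the previous batch
    // has been dispatched and processed, so read latency and processing time add up.
    static double serialEntriesPerSecond(int batchSize, double readLatencyMs, double processMsPerBatch) {
        return batchSize * 1000.0 / (readLatencyMs + processMsPerBatch);
    }

    // With read-ahead, up to `outstanding` reads stay in flight while the consumer
    // processes earlier entries. Throughput is capped by the slower of I/O and processing.
    static double readAheadEntriesPerSecond(int batchSize, int outstanding,
                                            double readLatencyMs, double processMsPerBatch) {
        double ioRate = batchSize * outstanding * 1000.0 / readLatencyMs;
        double processRate = batchSize * 1000.0 / processMsPerBatch;
        return Math.min(ioRate, processRate);
    }
}
```

Take hypothetical numbers: 100-entry batches, 10 ms reads, and 10 ms of processing per batch. serialEntriesPerSecond(100, 10.0, 10.0) gives 5000.0 entries/s. readAheadEntriesPerSecond(100, 2, 10.0, 10.0) gives 10000.0, so the consumer, rather than the bookie round trip, becomes the limit.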
Should the logic be similar to the readAhead logic in BKLogSegmentEntryReader?
@MarvinCai Yes. I think it is worth pushing this logic down to BK to provide a StreamingReadHandle over the BK ReadHandle, so that the read-ahead logic can be reused for both the BookKeeper read handle and the tiered storage read handle.
/cc @jiazhai @eolivelli so they can share their thoughts on whether it is worth adding this read-ahead logic to the BK read handle.
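Below is a minimal sketch of such a streaming wrapper. It assumes a simple async range-read interface. RangeReader, StreamingReader and InMemoryReader are hypothetical names, not the BookKeeper API. RangeReader is only loosely modeled on a range read such as BK's ReadHandle.readAsync(firstEntry, lastEntry). The wrapper keeps up to maxOutstanding chunk reads in flight and returns chunks in entry order. Because it only sees RangeReader, the same logic applies whether a bookie or tiered storage is behind it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal read interface standing in for a BK or tiered-storage read handle.
interface RangeReader {
    CompletableFuture<List<String>> readAsync(long firstEntry, long lastEntry);
    long lastEntryId();
}

// Hypothetical streaming wrapper: a read-ahead window of chunk reads over any RangeReader.
final class StreamingReader {
    private final RangeReader reader;
    private final int chunkSize;
    private final int maxOutstanding;
    private final ArrayDeque<CompletableFuture<List<String>>> inFlight = new ArrayDeque<>();
    private long nextToRequest;

    StreamingReader(RangeReader reader, long startEntry, int chunkSize, int maxOutstanding) {
        if (chunkSize < 1 || maxOutstanding < 1) {
            throw new IllegalArgumentException("chunkSize and maxOutstanding must be >= 1");
        }
        this.reader = reader;
        this.nextToRequest = startEntry;
        this.chunkSize = chunkSize;
        this.maxOutstanding = maxOutstanding;
    }

    // Issue chunk reads until the window is full or the last entry has been requested.
    private void fillWindow() {
        long last = reader.lastEntryId();
        while (inFlight.size() < maxOutstanding && nextToRequest <= last) {
            long end = Math.min(nextToRequest + chunkSize - 1, last);
            inFlight.addLast(reader.readAsync(nextToRequest, end));
            nextToRequest = end + 1;
        }
    }

    // Returns the next chunk in entry order, or an empty list once everything has been read.
    List<String> readNextChunk() {
        fillWindow();
        CompletableFuture<List<String>> oldest = inFlight.pollFirst();
        if (oldest == null) {
            return List.of();
        }
        fillWindow();         // top the window back up before blocking
        return oldest.join(); // wait only for the oldest outstanding read
    }
}

// In-memory backend used for illustration; a bookie or offloaded ledger would implement RangeReader instead.
final class InMemoryReader implements RangeReader {
    private final List<String> entries;

    InMemoryReader(List<String> entries) {
        this.entries = entries;
    }

    public CompletableFuture<List<String>> readAsync(long firstEntry, long lastEntry) {
        return CompletableFuture.supplyAsync(
                () -> new ArrayList<>(entries.subList((int) firstEntry, (int) lastEntry + 1)));
    }

    public long lastEntryId() {
        return entries.size() - 1;
    }
}
```

The wrapper blocks only on the oldest outstanding read and refills the window first, so I/O overlaps consumption. Chunk size and window depth are the knobs a real implementation would need to tune.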
@sijie
I agree that pushing that mechanism down to the low-level API will be useful.
This is also a blocking issue for the practical use of tiered storage for historical retention, since replay from tiered storage (at least S3) is too slow.
It would be great if the number of read-ahead threads and/or outstanding requests were tunable, since many blob/object stores scale throughput in proportion to the number of connections.
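Here is a sketch of how the two knobs could stay separate. It assumes a blocking object-store client, where each in-flight GET occupies one thread. ReadAheadSettings, BoundedFetcher and both setting names are hypothetical, not existing Pulsar configuration. The Supplier passed to fetch stands in for a single ranged GET.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical tuning knobs; the names are illustrative only.
final class ReadAheadSettings {
    final int readAheadThreads;
    final int maxOutstandingRequests;

    ReadAheadSettings(int readAheadThreads, int maxOutstandingRequests) {
        if (readAheadThreads < 1 || maxOutstandingRequests < 1) {
            throw new IllegalArgumentException("both settings must be >= 1");
        }
        this.readAheadThreads = readAheadThreads;
        this.maxOutstandingRequests = maxOutstandingRequests;
    }
}

// Runs blocking GETs on a fixed-size pool and caps how many run at the same time.
final class BoundedFetcher implements AutoCloseable {
    private final ExecutorService pool;
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();

    BoundedFetcher(ReadAheadSettings settings) {
        this.pool = Executors.newFixedThreadPool(settings.readAheadThreads);
        this.permits = new Semaphore(settings.maxOutstandingRequests);
    }

    <T> CompletableFuture<T> fetch(Supplier<T> blockingGet) {
        return CompletableFuture.supplyAsync(() -> {
            permits.acquireUninterruptibly(); // wait for a free request slot
            peakInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                return blockingGet.get();
            } finally {
                inFlight.decrementAndGet();
                permits.release();
            }
        }, pool);
    }

    // Highest number of GETs observed running at once; never exceeds maxOutstandingRequests.
    int peakInFlight() {
        return peakInFlight.get();
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Sizing the pool and the permit count independently lets an operator raise concurrency for stores whose throughput grows with connection count, while still capping load on stores where it does not.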
@vicaya thank you for your feedback. @MarvinCai are you willing to give it a try?
@sijie Sorry, I just saw the replies. How about I start with a doc that states the problem and proposes a solution? If everything looks good, we can proceed from there.
@MarvinCai Are you working on this issue already?
@sijie I am new to the BK code base and have been reading some of the LedgerHandle and DL code to figure out what should change. I have a simple doc describing what I think may need to change.
I haven't started writing any code yet. If someone with more BK experience plans to work on it, that's fine with me. Otherwise, I'm glad to help.
Is there any progress on this issue? Whenever people ask me why we don't use tiered storage, I have to point them to this issue to explain why it's too slow for us: readers cannot read fast enough from tiered storage, and backlogs build up even at moderate throughput (<1000 msgs/s with small messages of <100 bytes, which is <100 KB/s).
@nicoloboschi you might be interested in working on a fix for this issue if @MarvinCai doesn't have time for it.