Scylla: Reduce buffer-handling boilerplate in mutation reader implementation

Created on 20 Dec 2017  路  6Comments  路  Source: scylladb/scylla

Implementations of flat_mutation_reader::impl have to be aware about the batching that happens during reads. This has led to several readers sharing very similar logic in their implementation of fast_forward_to(const dht::partition_range&), fast_forward_to(position_range) and next_partition(). Such code duplication is hard to maintain and attracts bugs.

A more convenient solution would be to delegate the batching logic to an external helper (CRTP may come in handy). Reader implementations should only be concerned with their specific logic.

This can also be a good opportunity to separate the interfaces of the producer (flat_mutation_reader::impl) and the consumer (flat_mutation_reader) side.

internals

Most helpful comment

They sort of already are: producers (the specific implementations of mutation reader) inherit from flat_mutation_reader::impl, consumers interact with flat_mutation_reader. It is true that this have evolved from a simple pimpl, but there is no reason that flat_mutation_reader has to limit itself to forwarding function calls to the impl.

All 6 comments

Not sure if this is what (or among the things) you had in mind but I think that a batch version of operator()() would be greatly beneficial to existing and future code. There could be several variations, one returning all (already loaded) fragments belonging to the same partition and one that returns the whole buffer. Returns would be made via boost::iterator_range or similar to avoid allocations. Something like this would improve the combined reader greatly as we could avoid calling into subreaders on each fragment.
This would also generally improve reader nesting which is something we do a lot.

Maybe we could even make them non-future and rely on the caller ensuring the buffer is non-empty.

This can also be a good opportunity to separate the interfaces of the producer (flat_mutation_reader::impl) and the consumer (flat_mutation_reader) side.

This wasn't stressed enough in the original post. Currently, the buffer-related interface of flat_mutation_reader (and flat_mutation_reader::impl) is from the producer's perspective. This makes it confusing for consumer's and hard to use.

For example:

  • the producer emits fragments until it reaches the end at which point end-of-stream flag is set, so far so good,
  • if at any point the consumer wants to check whether it has reached end-of-stream it needs to check that both the buffer is empty and the flag is set (because the flag only signals that producer has reached EoS).

This results in an interface in which calling reader.is_end_of_stream() is not the correct way for a consumer to check whether the end of stream was reached.

Long story short, looks like this issue should be generalised to "encapsulate buffer-handling logic both from the consumer and the producer side". Such middle layer should allow reducing the boilerplate and provide intuitive interfaces for producers and consumers.

Perhaps the producer and consumer want to be separate objects. We tried this with seastar::subscription/seastar::stream and it didn't set the world on fire, but we know more now.

They sort of already are: producers (the specific implementations of mutation reader) inherit from flat_mutation_reader::impl, consumers interact with flat_mutation_reader. It is true that this have evolved from a simple pimpl, but there is no reason that flat_mutation_reader has to limit itself to forwarding function calls to the impl.

3067 is the issue that I've failed to find before extending the scope of this one.

Was this page helpful?
0 / 5 - 0 ratings