As part of our overall project to improve on-disk buffering, we need to collect data to inform our decisions. I'd like to improve our benchmarks to collect more performance data on various on-disk buffering scenarios. For example, we should have a benchmark that reproduces #1179.
The end goal should help us answer questions like:
Current implementation

Currently, vector has two types of buffers available to users. The first is an in-memory buffer that holds up to a fixed number of events (the default is 500). The second is a disk-based buffer backed by leveldb, an embedded key-value database. Implementation-wise, both buffers basically act like a multi-producer single-consumer channel: each provides a reader and writer end, where the writer is clonable. The memory buffer is just a futures::sync::mpsc channel, while the disk-backed buffer is a custom implementation that can be found in src/buffers/disk.rs.
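To make that shared shape concrete, here is a minimal sketch (not Vector's actual code) of the channel-style interface, using the futures 0.1 futures::sync::mpsc channel that backs the in-memory buffer. The Event type here is a stand-in:

```rust
// Assumes futures = "0.1" in Cargo.toml.
use futures::sync::mpsc;
use futures::{Future, Sink, Stream};

// Stand-in for Vector's `Event` type.
#[derive(Debug)]
struct Event(String);

fn main() {
    // Bounded channel, mirroring the in-memory buffer's default capacity of 500.
    let (writer, reader) = mpsc::channel::<Event>(500);

    // Multi-producer: the writer end can be cloned for each upstream component.
    let writer2 = writer.clone();

    // `send` consumes the sender and returns it once the event is accepted.
    writer.send(Event("first".into())).wait().unwrap();
    writer2.send(Event("second".into())).wait().unwrap();

    // Single consumer: the reader end drains events in order.
    for event in reader.take(2).wait() {
        println!("{:?}", event.unwrap());
    }
}
```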
The performance of the memory buffer is about 80% better than that of the disk buffer. This is mainly due to the implementation of the disk buffer: the current implementation needs to encode the event, write it to the database, and then trigger the reader task to read the next event. Once the event has been read from the database, we decode it and send it to the sink. This process is quite expensive, which explains why the disk buffer's performance is poor.
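Roughly, every event pays for an encode, a database write, a task wakeup, a database read, and a decode on the critical path. The following is a hypothetical, simplified sketch of that sequence (a BTreeMap stands in for leveldb, and the encode/decode helpers are illustrative, not Vector's actual ones):

```rust
use std::collections::BTreeMap;

type Key = u64;

fn encode(event: &str) -> Vec<u8> {
    event.as_bytes().to_vec() // the real buffer uses a protobuf-style encoding
}

fn decode(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec()).unwrap()
}

fn main() {
    let mut db: BTreeMap<Key, Vec<u8>> = BTreeMap::new();
    let mut next_key: Key = 0;

    // Writer side: encode and persist the event...
    let encoded = encode("hello world");
    db.insert(next_key, encoded);
    next_key += 1;
    // ...then wake the reader task (in the real implementation this is a
    // futures task notification; omitted here).

    // Reader side (normally a separate task): fetch the next event and
    // decode it before handing it to the sink.
    if let Some((key, bytes)) = db.iter().next() {
        let event = decode(bytes);
        println!("read event {} -> {:?}", key, event);
    }
}
```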
Another thing that we have not clearly defined in our docs is what our disk buffer's durability guarantees are. Currently, we buffer at least 100 events in memory that are already encoded. Once we have added the 100th event, we will then write these events to leveldb. This in turn actually only writes the batch to the operating system's memory, which then asynchronously writes it to durable disk storage. Because of this, we have no guarantees about when data becomes durable with respect to a machine crash. That said, we are still durable across process crashes and panics.
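For context on that durability point, here is a hedged sketch of a batched, non-synced leveldb write using the leveldb and db-key crates (the API details here are my assumption, not lifted from src/buffers/disk.rs). With sync left at false, the write call returns once the batch reaches the OS page cache, which is why it survives a process crash but not necessarily a machine crash:

```rust
// Assumes leveldb = "0.8" and db-key (which implements `Key` for i32).
use leveldb::database::batch::{Batch, Writebatch};
use leveldb::database::Database;
use leveldb::options::{Options, WriteOptions};
use std::path::Path;

fn main() {
    let mut options = Options::new();
    options.create_if_missing = true;
    let db: Database<i32> =
        Database::open(Path::new("/tmp/buffer-db"), options).unwrap();

    // Accumulate up to 100 already-encoded events into one batch...
    let mut batch = Writebatch::new();
    for key in 0..100 {
        batch.put(key, b"encoded event bytes");
    }

    // ...then hand the whole batch to leveldb. With `sync: false` (the
    // default) this only reaches the operating system's memory; the
    // kernel flushes it to durable storage asynchronously.
    let write_opts = WriteOptions::new();
    db.write(write_opts, &batch).unwrap();
}
```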
In the end, I don't really feel like it's vector's job to be a durable store for log data. Instead, this job should be offloaded to something like kafka or s3.
Our current implementation of the disk buffer is decently simple and easy to maintain. Through my testing of the benchmarks and tests, I've noticed that it is actually quite stable in its current state and works as expected across all three of our supported operating systems. Both of the issues that led us to think we needed to fix the disk buffer turned out not to be directly related to our use of leveldb. This leads me to suggest that we don't replace our current disk buffer implementation. I don't think we have seen many users (maybe I'm the only one? 🙂) complain about its performance, so it doesn't make much sense right now to invest heavily in fixing it.
Possible Solutions

Even though we may not want to change our disk buffer right now, that doesn't mean we won't want to in the future. One idea might be to offload our disk buffer work to a background task: create a variant of the in-memory buffer that uses the disk buffer as extra storage space for events instead of applying back pressure up the topology path. The goal is to optimize for the happy path, where we still write to disk but, in the critical path of sending events, only need to do the same work as the memory buffer. Since we don't actually provide any durability guarantees, writing to disk asynchronously in a background task would not change how we present the disk buffer to users. This means that we can remove encoding, writing to the database, switching tasks, and decoding entirely from the critical poll path of our sinks. That said, this implementation comes with much more complexity and would need to be heavily tested, and the additional complexity may not be worth it. A rough sketch of the idea follows.
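This is a hypothetical sketch of the "background disk writer" idea, not a proposed implementation: the critical path only does the work of the in-memory buffer (a channel send), while a background task drains a copy of each event to disk. All names here are illustrative, and the actual leveldb write is elided:

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Clone, Debug)]
struct Event(String);

fn encode(event: &Event) -> Vec<u8> {
    event.0.as_bytes().to_vec()
}

fn main() {
    // Channel feeding the sink: this is the critical poll path.
    let (sink_tx, sink_rx) = mpsc::sync_channel::<Event>(500);
    // Channel feeding the background disk writer.
    let (disk_tx, disk_rx) = mpsc::channel::<Event>();

    // Background task: encoding and database writes happen off the
    // critical path.
    let writer = thread::spawn(move || {
        for event in disk_rx {
            let _bytes = encode(&event); // would be batched into leveldb here
        }
    });

    for i in 0..3 {
        let event = Event(format!("event {}", i));
        disk_tx.send(event.clone()).unwrap(); // fire-and-forget to disk
        sink_tx.send(event).unwrap(); // same cost as the memory buffer
    }

    drop(disk_tx);
    drop(sink_tx);
    for event in sink_rx {
        println!("sink received {:?}", event);
    }
    writer.join().unwrap();
}
```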
Another possible solution would be to adopt an async implementation of https://github.com/postmates/hopper, which has a slightly different design than the one mentioned above. Instead, hopper starts to fill the disk once the in-memory buffer is full. This means it can't handle storing all events across a restart the way vector currently can. For now this approach seems like it would decrease the benefits of vector, and it could provide inconsistent performance since it is a hybrid approach.
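To illustrate the contrast, here is a toy sketch of that spill-over design (my own simplification of hopper's idea, not its actual code): events live in memory until a threshold and only then spill to disk, so whatever is still in memory is lost on restart.

```rust
use std::collections::VecDeque;

const MEM_CAPACITY: usize = 4;

struct HybridBuffer {
    memory: VecDeque<String>, // fast path, lost on restart
    disk: Vec<String>,        // stand-in for an on-disk queue, survives a restart
}

impl HybridBuffer {
    fn push(&mut self, event: String) {
        if self.memory.len() < MEM_CAPACITY {
            self.memory.push_back(event);
        } else {
            self.disk.push(event); // only spill once memory is full
        }
    }
}

fn main() {
    let mut buf = HybridBuffer { memory: VecDeque::new(), disk: Vec::new() };
    for i in 0..6 {
        buf.push(format!("event {}", i));
    }
    println!("{} in memory, {} spilled to disk", buf.memory.len(), buf.disk.len());
}
```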
I am going to close this issue as all points have been resolved.
Thanks for writing this up! Just one small thing I noticed:
Currently, we buffer at least 100 events in memory that are already encoded. Once we have added the 100th event, we will then write these events to leveldb.
This is actually the maximum amount we'll batch into a single write, and batches will be written almost immediately if our input isn't saturated. If the input is saturated, we'll write batches of 100.
@lukesteensen ah you're right, the Forward future will poll_complete on a pending event if it's not saturated. Good catch!