We maintain quite a few pipelines that require protobuf support. Here is a summary of our current setup.
As you can see, we use Logstash and a custom protobuf plugin at the moment.
We'd be willing to benchmark Vector against that, so I wonder whether protobuf support is on the roadmap. We might be able to contribute to an implementation if this sounds interesting.
Happy to discuss this further here.
Hey @mre, thanks. It definitely is interesting. I don't see any reason not to do this for sources/sinks that can support it. Are there any reasons you'd want it to be a transform versus making it an option in the various encoding and decoding settings? For example, the encoding option in the kafka sink.
This would be awesome but there are a number of complications we'd need to work through in order to get there.
The biggest problem is that we don't have an easy way to dynamically load the files generated by the protobuf compiler into Vector and use them dynamically at runtime. Logstash gets around this because Ruby isn't ahead-of-time compiled (i.e. it can load new code easily) and its protobuf implementation allows a strong degree of reflection.
There are a couple of approaches I think we could take:
1. Come up with a plugin system that would allow your protobuf-generated code to be dynamically loaded or otherwise integrated into Vector.
2. Roll our own runtime-focused protobuf library that would let us parse the messages based on protobuf definitions read at runtime.
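To make the second approach more concrete, here is a minimal sketch of decoding the protobuf wire format against a schema that only exists at runtime. It is not taken from Vector or from any existing crate: `Schema`, `Kind`, `Value`, `decode`, and `sample_schema` are all hypothetical names, the schema is built by hand rather than parsed from a .proto file, and only varint and length-delimited string fields are handled.

```rust
use std::collections::HashMap;

/// Field kinds this sketch understands (a small subset of protobuf).
#[derive(Debug, Clone, Copy)]
enum Kind {
    Uint64,
    Str,
}

/// A decoded field value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Uint64(u64),
    Str(String),
}

/// Hypothetical runtime schema: field number -> (field name, kind).
/// A real implementation would build this from a .proto file or descriptor.
type Schema = HashMap<u32, (String, Kind)>;

/// Example schema, assumed for illustration: `1: id (uint64)`, `2: name (string)`.
fn sample_schema() -> Schema {
    let mut schema = Schema::new();
    schema.insert(1, ("id".to_string(), Kind::Uint64));
    schema.insert(2, ("name".to_string(), Kind::Str));
    schema
}

/// Reads a base-128 varint at `*pos` and advances `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    while shift < 64 {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
    None // longer than 10 bytes: malformed
}

/// Returns the position after skipping `n` bytes, or None if out of bounds.
fn skip(buf: &[u8], pos: usize, n: usize) -> Option<usize> {
    let end = pos.checked_add(n)?;
    if end <= buf.len() { Some(end) } else { None }
}

/// Decodes one message. Fields that are not in the schema, or whose wire type
/// does not match the schema's kind, are skipped silently.
fn decode(buf: &[u8], schema: &Schema) -> Option<HashMap<String, Value>> {
    let mut out = HashMap::new();
    let mut pos = 0;
    while pos < buf.len() {
        let key = read_varint(buf, &mut pos)?;
        let field_number = (key >> 3) as u32;
        match key & 0x7 {
            // Wire type 0: varint.
            0 => {
                let v = read_varint(buf, &mut pos)?;
                if let Some((name, Kind::Uint64)) = schema.get(&field_number) {
                    out.insert(name.clone(), Value::Uint64(v));
                }
            }
            // Wire type 2: length-delimited payload.
            2 => {
                let len = read_varint(buf, &mut pos)? as usize;
                let end = skip(buf, pos, len)?;
                let bytes = &buf[pos..end];
                pos = end;
                if let Some((name, Kind::Str)) = schema.get(&field_number) {
                    let text = String::from_utf8(bytes.to_vec()).ok()?;
                    out.insert(name.clone(), Value::Str(text));
                }
            }
            // Wire types 1 and 5: fixed 64-bit and 32-bit values, skipped here.
            1 => pos = skip(buf, pos, 8)?,
            5 => pos = skip(buf, pos, 4)?,
            _ => return None,
        }
    }
    Some(out)
}
```

With `sample_schema()`, the bytes `08 96 01 12 07 74 65 73 74 69 6e 67` decode to `id = 150` and `name = "testing"`. Every field costs a map lookup and an allocation here, which hints at the kind of overhead a dynamic decoder pays compared with generated code.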
@binarylogic
Are there any reasons you'd want it to be a transform versus making it an option in the various encoding and decoding settings?
For now, I'd mostly be interested in adding protobuf support to the Kafka source (because we use it extensively), which doesn't have a decoding setting as far as I can tell. Also, we'd basically have to add the encoding/decoding option to all sinks/sources separately. That's why I was considering a standalone transformation at first.
I do agree that the encoding/decoding setting is the more idiomatic way to go, though, and adding the option might not be a big deal. So let's attempt that first.
@lukesteensen, you're right. Thanks for the input.
About 1:
I only know of [inventory] and [dynamic_reload] so far, but those don't seem to fit the bill here. The former is for compile-time inclusion and the latter keeps a watcher running to look for updates. What we want is some one-time auto-discovery on bootup, I think. Does anyone know of another plugin system that could be usable here?
About 2:
From what I can tell, [rust-protobuf] might support this. It offers reflection on protobuf types and has a simple loader that can be used at runtime. Maybe we can wrap that inside Vector?
The other two big protobuf libs for Rust, prost and quick-protobuf, don't seem to support reflection.
Would going the Lua route work? With https://github.com/starwing/lua-protobuf you could link in your own Lua protobuf parser and do the decoding dynamically that way.
That would be a transform, right? Could work.
Long term, I wonder how to communicate the separation between transforms and "codecs" (encode/decode).
Also, if enough people need this in the future, it might be better to have out-of-the-box support for protobuf, but if not, then a Lua plugin should do.
@mre
I only know of inventory and dynamic_reload so far, but those don't seem to fit the bill here. The former is for compile-time inclusion and the latter keeps a watcher running to look for updates. What we want is some one-time auto-discovery on bootup, I think. Does anyone know of another plugin system that could be usable here?
I haven't come across one that seems like a great fit. We've discussed a few different plugin strategies (e.g. dynamic loading, external process with IPC) but not made much progress beyond some brainstorming.
From what I can tell, rust-protobuf might support this. It offers reflection on protobuf types and has a simple loader that can be used at runtime. Maybe we can wrap that inside Vector?
That could be what we need! I haven't had a chance to play around with it, but if we can take in .proto files at runtime and parse messages, that'd be perfect.
Long term, I wonder how to communicate the separation between transforms and "codecs" (encode/decode).
Yeah, this is a part of the design I think we're likely to revisit. If I had to guess, I'd say we'll move towards codecs as a first-class entity that can be configured as part of a source/sink. We could potentially have a codec transform that applies them outside a source/sink, but I'm not sure how useful that'd be.
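As a rough illustration of what "codecs as a first-class entity" could look like, here is a small sketch. None of this is Vector's actual API: the `Codec` trait, the `Event` alias, the toy `KeyValue` codec, and the `Source` struct are all hypothetical and only show the shape of a source that is configured with a codec.

```rust
use std::collections::BTreeMap;

/// Hypothetical event type: a flat map of field names to string values.
type Event = BTreeMap<String, String>;

/// Hypothetical codec interface that a source or sink could be configured with.
trait Codec {
    fn decode(&self, bytes: &[u8]) -> Result<Event, String>;
    fn encode(&self, event: &Event) -> Vec<u8>;
}

/// Toy codec for space-separated `key=value` pairs, standing in for protobuf.
struct KeyValue;

impl Codec for KeyValue {
    fn decode(&self, bytes: &[u8]) -> Result<Event, String> {
        let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        text.split_whitespace()
            .map(|pair| {
                pair.split_once('=')
                    .map(|(k, v)| (k.to_string(), v.to_string()))
                    .ok_or_else(|| format!("missing '=' in {pair:?}"))
            })
            .collect()
    }

    fn encode(&self, event: &Event) -> Vec<u8> {
        event
            .iter()
            .map(|(k, v)| format!("{k}={v}"))
            .collect::<Vec<_>>()
            .join(" ")
            .into_bytes()
    }
}

/// Hypothetical source that owns its codec instead of relying on a transform.
struct Source {
    codec: Box<dyn Codec>,
}

impl Source {
    /// Turns one raw payload (e.g. a Kafka message body) into an event.
    fn handle(&self, raw: &[u8]) -> Result<Event, String> {
        self.codec.decode(raw)
    }
}
```

A source built as `Source { codec: Box::new(KeyValue) }` decodes payloads as they arrive, and a standalone codec transform could wrap the same trait object if applying a codec outside a source/sink turns out to be useful.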
@LucioFranco
Would going the Lua route work? With starwing/lua-protobuf you could link in your own Lua protobuf parser and do the decoding dynamically that way.
This is a really interesting idea. Any dynamic solution, even implemented in Rust, will likely be giving up performance for that flexibility, and binding to a dynamic language where that use case is the norm is a clever way around the limitations of the Rust libraries.
Overall, I'd say that if rust-protobuf can do everything we want, that's likely to be our best bet. If not, we can look into what a Lua-based solution would look like.
@lukesteensen I've assigned this to you. I'd like to discuss spec'ing this out so that we can begin work. After speaking with @mre, this is an important feature for them.
@binarylogic thanks for moving this forward. Will try to build a prototype with Lua to see what the performance looks like. If we're lucky, it might be fast enough (i.e. it won't be the bottleneck).
I also noticed https://docs.rs/serde-protobuf/0.8.1/serde_protobuf/descriptor/index.html which seems like it could accomplish a decode of messages via a .pb file a user provides at runtime. Sadly this crate doesn't include encoding at this time.
I'm trying to grep how rust-protobuf allows this kind of functionality though...
@mre thanks again for reporting this. @Hoverbear did quite a bit of investigation on protobuf support which you can see here. Given the performance limitations of dynamic Protobuf support, we'll be solving this with WASM. I've also opened #2367 which will provide a guide demonstrating how to do this. I'm closing in favor of that.