In terms of documentation, one of our biggest vulnerabilities is that our configuration spec can drift out of parity with reality. We ought to have this solved by v1. I think the shortest path to success here is to write some Ruby that generates a config example (simple, then advanced) for each component into a temp directory, then run `vector validate` on each one. This should at least catch missing or unknown fields, typos, etc.
Another idea here (admittedly quite fancy) would be to have a single source of truth that generates both the metadata and the actual Rust code.
I had been thinking about the single source of truth idea as well. A custom derive could possibly generate the metadata but that seems more than a bit of a hack, and definitely not what proc macros are supposed to be about. Alternately, we could add a custom build script (via build.rs?) just for the config structs, but that may end up being more work. What methods were you thinking about?
Yeah, I would love this as well. @Jeffail and I discussed this when he was working on the `generate` sub-command. To make that work he had to bake some duplication into Vector. I would _love_ to be able to output a manual of sorts that I could parse via a Vector sub-command. This could become the source of truth, and `generate` could hook into it somehow.
```
vector config --json
```
Probably not the best command, but you get what I'm proposing. It would output something like:
Related to #1687
I haven't given too much thought to a specific implementation. I agree a custom derive could be a little hacky, but potentially less so if it just adds a method for outputting a JSON representation? It'd be nice to avoid generating the actual struct definitions since we deal with those so directly all the time.
For contributors I would say anything here that obfuscates building the actual component is a worse experience than what we currently have. In the perfect world I would want to define a config struct for my component and have the spec get generated from that, allowing me to focus on building my shiny new thing before worrying about documentation.
But until we get there, I think a "good enough for now" approach is using scripts to report on two things:
> I agree a custom derive could be a little hacky, but potentially less so if it just adds a method for outputting a JSON representation?
Yes, a custom derive that generates internal data which could then be output as a JSON (or TOML) representation for our docs would be at least slightly less hacky.
> It'd be nice to avoid generating the actual struct definitions since we deal with those so directly all the time.
Agree with @Jeffail on this point -- anything that obfuscates the build process definitely makes this worse for contributors.
@LucioFranco this is becoming more and more important, especially with things like https://github.com/timberio/vector/pull/1972. I'd like to get this done unless you know of a better way.
> Another idea here (admittedly quite fancy) would be to have a single source of truth that generates both the metadata and the actual Rust code.
I was thinking about how we could integrate this into a `build.rs` or something. I'd really like that.
Could I steal this from @LucioFranco ?
I have spent some time thinking about this issue. First of all, I am very much against us doing any sort of codegen, whether that be macros or generating Rust code. The big reason I say this is my experience with tower-grpc, prost-build, and tonic-build: maintaining codegen is NOT FUN and very painful. Rust is strict and will give you many compile errors, and doing this via codegen makes that process 100x harder. While writing the first pass of this might not be too bad, maintaining it would be extremely painful.
That said, I suggest we take a different approach. I would like to explore the possibility of using some program (maybe Ruby or Rust) to generate a giant matrix of possible configs that a user could write. Each of these configs gets run against our `vector validate` command. A couple of reasons I think this would be really good: 1) it provides a much easier UX for new developers who add a new sink but whose docs are not correct, and 2) it ensures we properly dogfood our validation command. This method would also let us avoid introducing any sort of codegen into our main binary build pipeline.
I'd love to get some feedback on this idea :)
I understand your concerns with code generation. To add to them, I would also consider the increased complexity in understanding even how the program is built. We already have prost-build in the loop.
So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it) also count as code generation in your view, since our documentation is also structured?
My concern with a test matrix of possible configs is the inevitable combinatorial explosion that represents, particularly once transforms are thrown into the mix (and they have to be in the mix in order to handle schema issues for some sinks).
> So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it)
I'd like that, assuming we could just output the config "spec" as a CLI command? Ex:
```
vector components --options --json
```
Or something similar.
> My concern with a test matrix of possible configs is the inevitable combinatorial explosion that represents
Yeah. Although, as a first step, we could just generate a spec and lint it. That would at least tell us whether we have invalid options. For example, our vector.spec.toml does this. Maybe we could just lint that? Or adjust it to remove any duplicate keys that are used to demonstrate the variety of values.
> So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it) also count as code generation in your view, since our documentation is also structured?
Rust code expansion via macros having side effects is very much frowned upon. It also won't work with the wasm/wat changes that may allow us to compile macros much faster in the future.
> inevitable combinatorial explosion
We don't need to test combinations of components, just that each permutation of a single component's config items matches and validates as expected. This shouldn't grow too large at all and should be fast to verify.
> Rust code expansion via macros having side effects is very much frowned upon. And will not work in the future with all the wasm/wat changes to how we might be able to compile macros much faster in the future.
There wouldn't be side effects, at least not the way I'd been thinking about it. We'd just generate methods that can output the appropriate description data.
> There wouldn't be side effects, at least not the way I'd been thinking about it. We'd just generate methods that can output the appropriate description data.
I believe outputting a file from a macro is considered a side effect? Unless I'm misunderstanding something here.
I'm not talking about outputting a file during macro expansion.
There are a lot of ways we could do this, but here's a super rough example. You'd write code roughly like:
```rust
#[derive(VectorConfig)]
struct MyConfig {
    host: Url,
}
```
and it would expand to something like:
```rust
impl VectorConfig for MyConfig {
    fn fields() -> Vec<Field> {
        vec![Field { name: "host", kind: "url" }]
    }
}
```
And then, similar to `vector list`, we could use `inventory` to iterate over all configs at runtime and write them out somewhere (or just use that data directly).
Another option I'm very curious about is focusing more on some kind of Field-like DSL for describing these configs and relying less on serde.