In terms of documentation, one of our biggest vulnerabilities is that our configuration spec can drift out of parity with reality. We ought to have this solved by v1. I think the shortest path to success here is to write some Ruby that generates a config example (simple, then advanced) for each component into a temp directory, then run `vector validate` on each one. This should at least catch missing or unknown fields, typos, etc.
Another idea here (admittedly quite fancy) would be to have a single source of truth that generates both the metadata and the actual Rust code.
I had been thinking about the single source of truth idea as well. A custom derive could possibly generate the metadata but that seems more than a bit of a hack, and definitely not what proc macros are supposed to be about. Alternately, we could add a custom build script (via build.rs?) just for the config structs, but that may end up being more work. What methods were you thinking about?
Yeah, I would love this as well. @Jeffail and I discussed this when he was working on the `generate` sub-command. To make that work he had to bake some duplication into Vector. I would _love_ to be able to output a manual of sorts that I could parse via a Vector sub-command. This could become the source of truth, and `generate` could hook into it somehow.
```
vector config --json
```
Probably not the best command, but you get what I'm proposing. It would output something like:
Related to #1687
I haven't given too much thought to a specific implementation. I agree a custom derive could be a little hacky, but potentially less so if it just adds a method for outputting a JSON representation? It'd be nice to avoid generating the actual struct definitions since we deal with those so directly all the time.
For contributors I would say anything here that obfuscates building the actual component is a worse experience than what we currently have. In the perfect world I would want to define a config struct for my component and have the spec get generated from that, allowing me to focus on building my shiny new thing before worrying about documentation.
But until we get there, I think a "good enough for now" approach is using scripts to report on two things:
> I agree a custom derive could be a little hacky, but potentially less so if it just adds a method for outputting a JSON representation?
Yes, a custom derive that generates internal data which could then be output as a JSON (or TOML) representation for our docs would be at least slightly less hacky.
> It'd be nice to avoid generating the actual struct definitions since we deal with those so directly all the time.
Agree with @Jeffail on this point -- anything that obfuscates the build process definitely makes this worse for contributors.
@LucioFranco this is becoming more and more important, especially with things like https://github.com/timberio/vector/pull/1972. I'd like to get this done unless you know of a better way.
> Another idea here (admittedly quite fancy) would be to have a single source of truth that generates both the metadata and the actual Rust code.
I was thinking about how we could integrate this into a `build.rs` or something. I'd really like that.
Could I steal this from @LucioFranco ?
I have spent some time thinking about this issue. First of all, I am very much against us doing any sort of codegen, whether that be macros or generating Rust code. The big reason I say this is my experience with tower-grpc, prost-build, and tonic-build: maintaining codegen is NOT FUN and very painful. Rust is strict and will give you many compile errors, and doing this via codegen makes that process 100x harder. While writing the first pass of this might not be too bad, maintaining it would be extremely painful.
That said, I suggest we take a different approach. I would like to explore the possibility of using some program (maybe Ruby or Rust) to generate a giant matrix of possible configs that a user could write. Each of these configs gets run against our `vector validate` command. A couple of reasons I think this would be really good: 1) it provides a much easier UX for new developers who add a new sink but whose docs are not correct, and 2) it ensures we properly dogfood our validation command. This method would also let us avoid introducing any sort of codegen into our main binary build pipeline.
I'd love to get some feedback on this idea :)
I understand your concerns with code generation. To add to them, I would also consider the increased complexity in understanding even how the program is built. We already have prost-build in the loop.
So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it) also count as code generation in your view, since our documentation is also structured?
My concern with a test matrix of possible configs is the inevitable combinatorial explosion that represents, particularly once transforms are thrown into the mix (and they have to be in the mix in order to handle schema issues for some sinks).
> So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it)
I'd like that, assuming we could just output the config "spec" as a CLI command? Ex:
```
vector components --options --json
```
Or something similar.
> My concern with a test matrix of possible configs is the inevitable combinatorial explosion that represents
Yeah. Although, as a first step, we could just generate a spec and lint it. That would at least tell us whether we have invalid options. For example, our vector.spec.toml does this. Maybe we could just lint that? Or adjust it to remove any duplicate keys that are used to demonstrate the variety of values.
> So we're clear, does the idea of using the Rust code itself, with custom attributes, as the single source of truth (and thus generating parts of the documentation from it) also count as code generation in your view, since our documentation is also structured?
Rust code expansion via macros having side effects is very much frowned upon. It also won't work with the wasm/wat changes that may allow us to compile macros much faster in the future.
> inevitable combinatorial explosion
We don't need to test combinations of components, just that each permutation of a single component's config items matches and validates as expected. This shouldn't grow too large at all and should be fast to verify.
> Rust code expansion via macros having side effects is very much frowned upon. And will not work in the future with all the wasm/wat changes to how we might be able to compile macros much faster in the future.
There wouldn't be side effects, at least not the way I'd been thinking about it. We'd just generate methods that can output the appropriate description data.
> There wouldn't be side effects, at least not the way I'd been thinking about it. We'd just generate methods that can output the appropriate description data.
I believe outputting a file from a macro is considered a side effect? Unless I'm misunderstanding something here.
I'm not talking about outputting a file during macro expansion.
There are a lot of ways we could do this, but here's a super rough example. You'd write code roughly like:
```rust
#[derive(VectorConfig)]
struct MyConfig {
    host: Url,
}
```
and it would expand to something like:
```rust
impl VectorConfig for MyConfig {
    fn fields() -> Vec<Field> {
        vec![Field { name: "host", kind: "url" }]
    }
}
```
And then, similar to `vector list`, we could use `inventory` to iterate over all configs at runtime and write them out somewhere (or just use that data directly).
Another option I'm very curious about is focusing more on some kind of Field-like DSL for describing these configs and relying less on serde.