As part of transitioning to the new metadata system, we'll be submitting PRs for each component. Each PR will contain the base definition file, but it is not complete! We'll need humans to finish off the final changes.
The new metadata files are written in Cue, a new configuration language from Google that is largely inspired by Jsonnet. You can read more about the differences here, but the big difference is the typing and validation. I actually started the transition with Jsonnet but it felt very error-prone, especially since a lot of this can become mind-numbing. It's very easy to forget to add an attribute, make a typo, etc.
Setting aside the strict validation, something I probably don't need to sell to a bunch of Rust developers, the language itself is thoughtful and elegant. The schema/policy/data tree-like organization described here is very powerful and a pattern I tried to adopt here. This is largely enabled by definitions. I would also recommend running through the basics tutorial. You can learn the language very quickly browsing through it.
make check-docs
output definitions
how_it_works definitions

The base data was derived from the current metadata, but we still want to review it for opportunities to improve it.
output definitions

Log components output events, and events have a schema. Currently, every component outputs a single event, but in the future, we will be adding components that output multiple events. For now, name your event something generic, like "line" for the file source, and define the schema accordingly.
Metric components output metric events. Unlike logs, metrics components almost always output multiple events.
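As a rough sketch of what a log output definition for the file source might look like: the event name "line" and the `type` / `examples` layout come from this thread, while the `description` and `required` fields and the example values are assumptions for illustration. Check docs/reference/components.cue for the actual schema.

```cue
// Sketch only: `description` and `required` are assumed field names.
components: sources: file: output: logs: line: {
	description: "An individual line from a file."
	fields: {
		message: {
			description: "The raw line, unaltered."
			required:    true
			type: "string": examples: ["Hello world"]
		}
		timestamp: {
			description: "The time the line was read."
			required:    true
			type: "timestamp": examples: ["2020-11-01T21:15:47.443232Z"]
		}
	}
}
```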
how_it_works sections

Finally, we want to move the "How it works" sections into the definitions.
Each component type automatically includes default sections based on the component type. Do not add these sections. See sources, transforms, and sinks sections.
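A minimal sketch of a component-specific section, assuming each section is a named struct with a `title` and a Markdown `body`. Both field names, the `#HowItWorks` definition, and the `my_section` key are hypothetical; the real shape lives in docs/reference/components.cue.

```cue
// Hypothetical shape of a how_it_works block; verify against the real schema.
#HowItWorks: [string]: {
	title: string
	body:  string
}

how_it_works: #HowItWorks & {
	// Only add sections specific to this component; the defaults for the
	// component type are included automatically.
	my_section: {
		title: "My Section"
		body: """
			Explain the component-specific behavior here.
			"""
	}
}
```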
@binarylogic the transforms and sinks links at the bottom don't link to anything yet. I did check transforms (https://github.com/timberio/vector/blob/master/docs/reference/components/transforms.cue), but it doesn't include any default sections yet.
@binarylogic am I correct in assuming that the output section is only relevant to sources?
@JeanMertz will fix now.
@jszwedko it's for both source and transforms. I've updated the issue description to include a transform example.
We can use default values, for example for supported platforms:
--- a/docs/reference/components.cue
+++ b/docs/reference/components.cue
@@ -150,12 +150,12 @@ components: close({
// The platforms that this component is available in. It is possible for
// Vector to disable some components on a per-platform basis.
platforms: {
- "aarch64-unknown-linux-gnu": bool
- "aarch64-unknown-linux-musl": bool
- "x86_64-apple-darwin": bool
- "x86_64-pc-windows-msv": bool
- "x86_64-unknown-linux-gnu": bool
- "x86_64-unknown-linux-musl": bool
+ "aarch64-unknown-linux-gnu": bool | *true
+ "aarch64-unknown-linux-musl": bool | *true
+ "x86_64-apple-darwin": bool | *true
+ "x86_64-pc-windows-msv": bool | *true
+ "x86_64-unknown-linux-gnu": bool | *true
+ "x86_64-unknown-linux-musl": bool | *true
}
@kirillt yep, although I stayed away from defaults for most options because I want us to address them. A problem with the last system was exactly that. We'd forget to address options because they weren't required. This makes them discoverable and explicit.
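For anyone unfamiliar with the `bool | *true` syntax in the diff above, here is a small self-contained sketch of how a default resolves and how a component would override it. The `#Platforms` definition and the `everywhere` / `linux_only` names are made up for illustration; only the platform keys come from the diff.

```cue
// Illustrative subset of the platforms block, with defaults.
#Platforms: {
	"x86_64-unknown-linux-gnu": bool | *true
	"x86_64-apple-darwin":      bool | *true
}

// Nothing specified: every platform falls back to the default, true.
everywhere: platforms: #Platforms

// An explicit value takes precedence over the default.
linux_only: platforms: #Platforms & {
	"x86_64-apple-darwin": false
}
```

The trade-off raised in the reply still applies: with defaults in place, a component that never mentions `platforms` validates silently, so nobody is forced to review it.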
There's no "how it works" component reference for sinks yet? I see sources.cue and transforms.cue but not sinks.cue
We probably want to make some things stricter.
For instance, the _.output.logs._.fields.type:
type: {
"*": {}
"string"?: {
examples: [string, ...string]
}
"timestamp"?: {
examples: ["2020-11-01T21:15:47.443232Z"]
}
}
the way it's defined currently allows field to be any combination of the types:
type: {
string: {
examples: ["qwe"]
}
timestamp: {}
}
is accepted, but, clearly, it's not intended. This is because our semantics is different - we really want to just have a "one of" the possible values, not any combination of them.
Otoh, if we write the constraint as:
type:
close({"*": {}}) |
close({
"string":
close({
examples: [string, ...string]
})
}) |
close({
timestamp: close({
examples: ["2020-11-01T21:15:47.443232Z"]
})
})
The sample above is rejected as expected!
I think there are other places like this I didn't encounter yet.
This neat trick makes use of two properties of the language: closed structs and disjunctions of structs. It is important that closed structs are used; a simple disjunction of open structs is not enough. If any of the structs in the disjunction is open, the evaluation can't conclude that a particular type will be the _only one_ to satisfy the requested disjunction, as an open struct may be extended to also satisfy a condition. Using closed structs prevents this possibility and allows the evaluation to determine which single struct of the disjunction currently satisfies the value.
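To try the closed-struct disjunction in isolation, a standalone sketch along these lines should work. The `#FieldType`, `ok`, and `bad` names are made up for the example and are not part of the Vector schema.

```cue
// Closed variant of the constraint: a value can match only one disjunct.
#FieldType:
	close({"*": {}}) |
	close({"string": close({examples: [string, ...string]})}) |
	close({"timestamp": close({examples: ["2020-11-01T21:15:47.443232Z"]})})

// Matches only the "string" disjunct, so the disjunction resolves.
ok: #FieldType & {
	"string": examples: ["qwe"]
}

// Mixing two types matches no disjunct. Uncommenting this is expected to
// make evaluation fail.
// bad: #FieldType & {
// 	"string": examples: ["qwe"]
// 	"timestamp": {}
// }
```

Saving this to a scratch file and running `cue vet` on it, with and without the `bad` block, is a quick way to confirm the behavior before changing the shared schema.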
I hope this was educational, and not too long :D
Docs:
Also, I find this form of command really useful when working on the docs PRs: find docs -name '*.cue' | xargs cue eval --all-errors -e 'components.sources.docker' -c
Might be useful to someone.
Ah! Yes, I like that, I'll add.
@binarylogic The relevant_when field defined in reference/components.cue is defined as a string:
relevant_when?: string
Is that intentional? The old way would define this field more as a map, eg:
relevant_when = {mode = ["tcp", "udp"]}
Yep, it is. I'm leaving it open-ended since it can be much more complex than that.
Cool. So is something like:
relevant_when = "mode is tcp or udp"
the way to go?
Exactly, much easier to represent complex conditions. I would do:
relevant_when = "`mode` = `tcp` or `udp`"
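In Cue syntax, that would look roughly like the sketch below. The `#Option` definition is a trimmed-down, hypothetical stand-in for the real option schema; only `relevant_when?: string` is taken from reference/components.cue as quoted above, and the `address` option and its description are placeholders.

```cue
// Hypothetical, trimmed-down option schema; only relevant_when is from the thread.
#Option: {
	description:    string
	relevant_when?: string
}

options: address: #Option & {
	description:   "The address to listen on."
	relevant_when: "`mode` = `tcp` or `udp`"
}
```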
Hi @binarylogic , can you clarify what encoding options mean? For example, what does json: null mean?
@juchiast it signals if the json encoding option is supported or not.
@binarylogic Thanks! This section is kinda incorrect,
azure_monitor_logs only has a variant named default,
https://github.com/timberio/vector/blob/master/src/sinks/azure_monitor_logs.rs#L50-L53
Then it seems like the encoding option isn't relevant, right? So I'd just do:
encoding: enabled: false