Vector: Add new metadata for all of Vector's internal components

Created on 3 Oct 2020 · 17 comments · Source: timberio/vector

As part of transitioning to the new metadata system, we'll be submitting PRs for each component. Each PR will contain the base definition file, but it is not complete! We'll need humans to finish off the final changes.

New metadata system

Cue

The new metadata files are written in Cue, a new configuration language from Google that is largely inspired by Jsonnet. You can read more about the differences here, but the biggest difference is typing and validation. I actually started the transition with Jsonnet, but it felt very error-prone, especially since a lot of this work can become mind-numbing. It's very easy to forget to add an attribute, make a typo, etc.

Setting aside the strict validation (something I probably don't need to sell to a bunch of Rust developers), the language itself is thoughtful and elegant. The schema/policy/data tree-like organization described here is very powerful, and it's a pattern I tried to adopt in these files. It is largely enabled by definitions. I would also recommend running through the basics tutorial; you can learn the language very quickly by browsing through it.
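To make the schema/policy/data layering concrete, here is a minimal sketch. The field names (`title`, `description`) and the capitalization rule are assumptions for illustration only, not the actual Vector schema. The point is that a definition fixes the shape, a policy layer adds extra rules, and the data is validated against both.

```cue
// Schema: a definition describes the shape every component must have.
// Definitions are closed, so a misspelled field such as `descripton`
// would be reported as an error instead of being silently accepted.
#Component: {
	title:       string
	description: string
}

// Policy: an extra, hypothetical rule layered on top of the schema.
components: sources: [string]: #Component & {
	title: =~"^[A-Z]" // titles must start with a capital letter
}

// Data: concrete values, validated by both the schema and the policy.
components: sources: file: {
	title:       "File"
	description: "Collects logs from files."
}
```

Because `#Component` is closed, the typo class of mistakes mentioned above is caught when the files are evaluated rather than discovered later in the rendered docs.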

Validating

1. Install `cue`.
2. Run `make check-docs`.

What you need to do

[ ] Review the initial base data and change it as necessary
[ ] Add one or more `output` definitions
[ ] Add one or more `how_it_works` definitions

Review base data, change as necessary

The base data was derived from the current metadata, but we still want to review it for opportunities to improve.

Add one or more `output` definitions

Logs

Log components output events, and each event has a schema. Currently, every component outputs a single type of event, but in the future we will add components that output multiple event types. For now, give your event a generic name, such as "line" for the file source, and define its schema accordingly.
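As a sketch of what a log `output` definition could look like, assuming the `output.logs.<event>.fields.<field>.type.<type>.examples` path discussed further down this thread: the `#Field` definition, its `description` attribute, and the example fields are hypothetical simplifications, and the real schema in docs/reference/components.cue may differ.

```cue
// Hypothetical, trimmed-down field schema for illustration only.
#Field: {
	description: string
	// Each type entry must carry at least one example.
	type: [string]: examples: [string, ...string]
}

// One generically named event, "line", for the file source.
components: sources: file: output: logs: line: {
	description: "An individual line read from a file."
	fields: [string]: #Field
	fields: {
		message: {
			description: "The raw line read from the file."
			type: "string": examples: ["Hello world"]
		}
		file: {
			description: "The path of the file the line was read from."
			type: "string": examples: ["/var/log/app.log"]
		}
	}
}
```

The `"string"` label is quoted on purpose so it is never confused with Cue's built-in `string` type.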

Metrics

Metric components output metric events. Unlike log components, metric components almost always output multiple distinct metric events, so each one needs its own definition.
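A minimal sketch of a metrics `output` block that declares several metric events. The component name `example_metrics`, the metric names, and the `#Metric` fields are all hypothetical; the metric type list is only a subset chosen for illustration.

```cue
// Hypothetical metric schema for illustration only.
#Metric: {
	description: string
	type:        "counter" | "gauge" | "histogram" // illustrative subset
}

// A hypothetical metrics source declaring several metric events.
components: sources: example_metrics: output: metrics: [string]: #Metric
components: sources: example_metrics: output: metrics: {
	requests_total: {
		description: "Total number of requests handled."
		type:        "counter"
	}
	queue_depth: {
		description: "Current number of queued items."
		type:        "gauge"
	}
}
```

The pattern constraint `[string]: #Metric` applies the same schema to every metric name, so adding another metric only requires another entry.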

Example

Add `how_it_works` sections

Finally, we want to move the "How it works" sections into the definitions.

Each component automatically includes default sections based on its type (source, transform, or sink). Do not add these sections yourself; see the sources, transforms, and sinks definitions for the defaults.
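A rough sketch of a component-specific `how_it_works` entry. The `#Section` definition and its `title`/`body` fields are assumptions for illustration; check the existing definitions for the actual field names.

```cue
// Hypothetical shape for a how_it_works entry; real field names may differ.
#Section: {
	title: string
	body:  string
}

components: sources: file: how_it_works: [string]: #Section
components: sources: file: how_it_works: {
	// A component-specific section. Default sections that every source
	// already receives are deliberately not repeated here.
	checkpointing: {
		title: "Checkpointing"
		body:  "Describe how this source tracks its read position between restarts."
	}
}
```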

Example

feature


binarylogic


Most helpful comment

@kirillt yep, although I stayed away from defaults for most options because I want us to address each of them. That was exactly the problem with the last system: we'd forget to address options because they weren't required. Leaving out defaults makes them discoverable and explicit.

binarylogic on 5 Oct 2020


All 17 comments

@binarylogic the transforms and sinks links at the bottom don't link to anything yet. I did check transforms (https://github.com/timberio/vector/blob/master/docs/reference/components/transforms.cue), but it doesn't include any default sections yet.

JeanMertz on 5 Oct 2020

@binarylogic am I correct in assuming that the output section is only relevant to sources?

jszwedko on 5 Oct 2020

@JeanMertz will fix now.

@jszwedko it's for both sources and transforms. I've updated the issue description to include a transform example.

binarylogic on 5 Oct 2020

We can use default values, for example for supported platforms:

--- a/docs/reference/components.cue
+++ b/docs/reference/components.cue
@@ -150,12 +150,12 @@ components: close({
       // The platforms that this component is available in. It is possible for
       // Vector to disable some components on a per-platform basis.
       platforms: {
-        "aarch64-unknown-linux-gnu": bool
-        "aarch64-unknown-linux-musl": bool
-        "x86_64-apple-darwin": bool
-        "x86_64-pc-windows-msv": bool
-        "x86_64-unknown-linux-gnu": bool
-        "x86_64-unknown-linux-musl": bool
+        "aarch64-unknown-linux-gnu": bool | *true
+        "aarch64-unknown-linux-musl": bool | *true
+        "x86_64-apple-darwin": bool | *true
+        "x86_64-pc-windows-msv": bool | *true
+        "x86_64-unknown-linux-gnu": bool | *true
+        "x86_64-unknown-linux-musl": bool | *true
       }

https://cuelang.org/docs/tutorials/tour/types/defaults/

fanatid on 5 Oct 2020
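A small sketch of how the `bool | *true` defaults in the diff above behave, trimmed to two targets for brevity. The component labels `a` and `b` are hypothetical.

```cue
// Defaults as proposed in the diff above, trimmed to two targets.
#Platforms: {
	"aarch64-unknown-linux-gnu": bool | *true
	"x86_64-unknown-linux-gnu":  bool | *true
}

// Says nothing about platforms: both fields resolve to their default, true.
a: platforms: #Platforms

// Overrides one platform explicitly; the other still defaults to true.
b: platforms: #Platforms & {
	"aarch64-unknown-linux-gnu": false
}
```

Exporting this should give `true` for both targets under `a`, and `false`/`true` under `b`. The trade-off is the one raised in the most helpful comment: with a default in place, a component can skip the decision without anyone noticing.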

binarylogic on 5 Oct 2020


There's no "how it works" component reference for sinks yet? I see sources.cue and transforms.cue, but not sinks.cue.

jamtur01 on 6 Oct 2020

We probably want to make some things stricter.

For instance, the _.output.logs._.fields.type:

type: {
    "*": {}
    "string"?: {
        examples: [string, ...string]
    }
    "timestamp"?: {
        examples: ["2020-11-01T21:15:47.443232Z"]
    }
}

The way it's currently defined allows a field to have any combination of the types. For example:

type: {
    string: {
        examples: ["qwe"]
    }
    timestamp: {}
}

is accepted, but clearly that's not intended. Our semantics are different: we want exactly "one of" the possible types, not any combination of them.

On the other hand, if we write the constraint as:

type:
    close({"*": {}}) |
    close({
        "string":
            close({
                examples: [string, ...string]
            })
    }) |
    close({
        timestamp: close({
            examples: ["2020-11-01T21:15:47.443232Z"]
        })
    })

The sample above is rejected as expected!

I think there are other places like this that I haven't encountered yet.

This neat trick relies on two properties of the language: closed structs and disjunctions of structs. It is important to use closed structs; a plain disjunction of open structs is not enough. If any struct in the disjunction is open, the evaluator can't conclude that a particular type is the _only one_ that satisfies the disjunction, because an open struct may be extended with extra fields and so match as well. Closed structs rule this out, which lets the evaluator determine the single member of the disjunction that satisfies the value.

I hope this was educational, and not too long :D

Docs:

MOZGIII on 7 Oct 2020
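A variant of the same "one of" constraint, assuming we use definitions instead of explicit `close()` calls. Cue definitions (`#Name`) are closed recursively, so each member of the disjunction is already closed. The definition names here are hypothetical.

```cue
// Definitions are closed recursively, so close() is not needed.
// Labels are quoted so that `string` below refers to the built-in type,
// not to a field named "string".
#AnyType:       {"*": {}}
#StringType:    {"string": examples: [string, ...string]}
#TimestampType: {"timestamp": examples: ["2020-11-01T21:15:47.443232Z"]}

#FieldType: #AnyType | #StringType | #TimestampType

// Accepted: only #StringType matches this value.
ok: type: #FieldType & {"string": examples: ["qwe"]}

// Rejected if uncommented: no closed member allows both fields at once.
// bad: type: #FieldType & {"string": examples: ["qwe"], "timestamp": {}}
```

The quoting matters: an unquoted `string:` label would make the `string` inside `examples` refer to that field instead of the built-in type.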

Also, I find this form of command really useful when working on the docs PRs:

find docs -name '*.cue' | xargs cue eval --all-errors -e 'components.sources.docker' -c

Might be useful to someone.

MOZGIII on 7 Oct 2020

Ah, yes, I like that. I'll add it.

binarylogic on 7 Oct 2020

@binarylogic The relevant_when field in reference/components.cue is defined as a string:

      relevant_when?: string

Is that intentional? The old system defined this field more like a map, e.g.:

relevant_when = {mode = ["tcp", "udp"]}

FungusHumungus on 9 Oct 2020

Yep, it is. I'm leaving it open-ended since it can be much more complex than that.

binarylogic on 9 Oct 2020

Cool. So is something like:

relevant_when = "mode is tcp or udp"

the way to go?

FungusHumungus on 9 Oct 2020

Exactly, that makes complex conditions much easier to represent. I would write:

relevant_when = "`mode` = `tcp` or `udp`"

binarylogic on 9 Oct 2020
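For reference, a sketch of the suggested form written as Cue. In Cue the field uses `:` rather than the TOML-style `=` from the comments. Only `relevant_when?: string` comes from reference/components.cue; the `#Option` definition, the `address` option, and its description are hypothetical.

```cue
// Hypothetical option schema; only `relevant_when?: string` is taken
// from reference/components.cue.
#Option: {
	description:    string
	relevant_when?: string
}

// Hypothetical option of a source that has a `mode` setting.
options: address: #Option & {
	description:   "The address to listen on."
	relevant_when: "`mode` = `tcp` or `udp`"
}
```

Because the field is a free-form string, arbitrarily complex conditions fit without changing the schema, at the cost of no machine validation of the condition itself.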


Hi @binarylogic, can you clarify what the encoding options mean? For example, what does `json: null` mean?

juchiast on 9 Oct 2020

@juchiast it signals whether the json encoding option is supported.

binarylogic on 9 Oct 2020

@binarylogic Thanks! This section is kinda incorrect:

https://github.com/timberio/vector/blob/master/docs/reference/components/sinks/azure_monitor_logs.cue#L24-L30

azure_monitor_logs only has a variant named default:

https://github.com/timberio/vector/blob/master/src/sinks/azure_monitor_logs.rs#L50-L53

juchiast on 9 Oct 2020

Then it seems like the encoding option isn't relevant, right? So I'd just do:

encoding: enabled: false

binarylogic on 9 Oct 2020
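A sketch of why the one-liner above works, assuming a simplified encoding schema. `encoding: enabled: false` is Cue shorthand for the nested form `encoding: {enabled: false}`. The `#Encoding` definition and its `codec` field are hypothetical; the real definition in docs/reference/components.cue may differ.

```cue
// Hypothetical, simplified encoding schema for illustration only.
#Encoding: {enabled: false} | {
	enabled: true
	codec:   "json" | "text" // hypothetical field
}

// The one-line form from the comment above...
a: encoding: enabled: false
// ...is shorthand for the nested form:
b: encoding: {enabled: false}

// Both unify with the first member of the disjunction only.
a: encoding: #Encoding
b: encoding: #Encoding
```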
