Vector: Allow other configuration file formats besides toml

Created on 8 Dec 2019 · 16 Comments · Source: timberio/vector

Please allow configuration in other formats besides TOML. We avoid using software that is configured through TOML files. Especially for complex configuration, the format is IMHO terrible.

config should approval requirements enhancement

All 16 comments

@trondhindenes thank you for reporting your experience with Vector. Do you have any specific configuration format in mind?

Issue https://github.com/timberio/vector/issues/1328 proposes adding support for defining the configuration using one of scripting languages to make it more flexible and scalable.

Yes, thank you for letting us know. We'd love to hear any suggestions for alternative formats.

As @a-rodin mentioned, we're exploring writing configs in scripting languages (#1328), but have concerns that this could be confusing. We need to collect a lot more user feedback. Let us know what you think!

Since we're heavy users of Ansible and Kubernetes, YAML is the de facto DSL for config files internally. However, there's gonna be almost as many opinions as people doing the opinion-ing on these matters, so the best way is probably to support multiple formats.

Regarding scripting languages: I don't know enough about vector to have an opinion - honestly I stopped reading when I saw that it was toml-based :-(

Got it, appreciate the feedback. And yes, I am certain we'll get similar feedback for YAML 😄. It probably makes sense to support both; I don't foresee this being an issue.

Just to throw another opinion into the mix, and feel free to disregard... I've not had any problems with the TOML configuration. I have little TOML experience, but it seems well supported in editors and most languages, well specified, easy to read, easy to understand. Vector doesn't even require writing that much configuration. Seems like a good choice, especially given Vector comes out of the Rust ecosystem.

I might be being shortsighted, but I'm not sure I can see much benefit in Vector using the same config syntax as Ansible or Kubernetes. They're different configuration languages/semantics, with very different purposes. What do people think the benefits are here?

I'd give a +1 to only having one configuration syntax. For those new to the project, documentation purposes, and answers on community discussion forums like Stack Overflow, having one syntax is going to be easier to understand. Plus, less work and maintenance for the developers, less complexity in the codebase. Getting to a point where some features are only available in one config syntax would be a bad place to end up too!

Scripting as a more advanced feature, in the same way that Redis and Nginx have, would be great, but just different config syntaxes doesn't seem worth it to me.

@danpalmer are you saying toml and yaml have very different purposes? I would disagree there.

I do see the point about documentation overhead with multiple languages tho. That can probably be solved by auto-generating examples in multiple languages, but there'd be some overhead in supporting that anyway.

Didn't want to open a can of worms here, but we'd be very reluctant to consider using a product where we'd have to do "semi-advanced" configuration using TOML.

@trondhindenes I'm not saying that toml and yaml have different purposes, I'm saying that "Ansible yaml" and "Kubernetes yaml" are two different languages* and "Vector yaml" would be different again, so I'm not sure there's much benefit to them all being yaml.

Maybe I'm missing some complexity with toml? It seems comparable to yaml?

*These languages use the same _syntax_ (mostly), but while in Ansible a playbook will be a list of tasks where each task is a unit of work, using some "module", in Kubernetes there's a completely different format of objects, and in Vector it would all be about sources/sinks/transforms which have no meaning to Ansible or Kubernetes. They will look the same, but the syntax of yaml takes minutes to learn whereas the semantics of Ansible/Kubernetes/Vector will take days, weeks, maybe years to learn.

@danpalmer I think we agree mostly.

@danpalmer One of the advantages of having all config files in YAML (or any other common syntax) is that you can use the same tools (e.g. Kapitan with Jsonnet) to templatize all your configuration.

That being said, I am not a very strong proponent of YAML configuration files, as I know beforehand that they can be horrible.

Speaking of configuration syntax, I find that the current TOML syntax, with its heavy use of "inputs" lists, is a bit complicated to parse and read. The graph of transforms is not easy to rebuild mentally by reading the configuration.

These are all excellent points. Our primary concern is what @danpalmer outlined:

For those new to the project, documentation purposes, and answers on community discussion forums like Stack Overflow having one syntax is going to be easier to understand. Plus, less work and maintenance for the developers, less complexity in the codebase.

I'm not saying we won't support YAML, but we want to think carefully before adding features like this to Vector. Hearing functional reasons, like the one below, helps us to better understand the reasoning:

One of the advantages of having all config files in YAML (or any other common syntax) is that you can use the same tools (e.g. Kapitan with Jsonnet) to templatize all your configuration.

We have a project on our roadmap to approach this (scheduled for February). We'll be exploring other config formats, scripting solutions (https://github.com/timberio/vector/issues/1328), and better ways to develop/test configs (https://github.com/timberio/vector/issues/1318). We'll start with a specification and get buy-in from everyone interested.

I don't think this is the right place to talk about this, but I am not ready to open another issue, so here we are (I can copy-paste elsewhere if asked).

My biggest pains with the current format are:

  1. the graph is completely implicit and hard to decode, as you must read every inputs field
  2. I don't like the "array of tables" TOML syntax (I don't like lists of objects in general and I think we are generally best served with plain objects/tables)
  3. I can't split the configuration into several files

Addressing the first point would mean dropping the inputs field from each node and having a separate configuration section to define the pipelines.
Taking the example from https://vector.dev/docs/setup/guides/unit-testing/ as a reference, in a potential new syntax it would become:

[pipelines]
  A = ["sources.over_tcp","transforms.foo","transforms.baz","sinks.over_http"]
  B = ["transforms.foo","transforms.bar"]

[sources.over_tcp]
  type = "tcp"
  address = "0.0.0.0:9000"

[transforms.foo]
  type = "grok_parser"
  pattern = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"

[transforms.bar]
  type = "add_fields"
  [transforms.bar.fields]
    new_field = "this is a static value"

[transforms.baz]
  type = "remove_fields"
  fields = ["level"]

[sinks.over_http]
  type = "http"
  uri = "http://localhost:4195/post"
  encoding = "text"

This syntax makes it easier to see that "baz" takes its input from "foo" instead of the "bar" we could have expected.
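
For comparison, here is a rough sketch of the same topology written in the current inputs-based style. The component options are copied from the proposal above; the `inputs` wiring is an assumption, inferred from the `[pipelines]` table rather than taken from the unit-testing guide, so treat it as illustrative only.

```toml
# Sketch only: the `inputs` values below are inferred from the proposed
# [pipelines] table (A: over_tcp -> foo -> baz -> over_http, B: foo -> bar).

[sources.over_tcp]
  type = "tcp"
  address = "0.0.0.0:9000"

[transforms.foo]
  type = "grok_parser"
  inputs = ["over_tcp"]
  pattern = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"

[transforms.bar]
  type = "add_fields"
  inputs = ["foo"]
  [transforms.bar.fields]
    new_field = "this is a static value"

[transforms.baz]
  type = "remove_fields"
  inputs = ["foo"]  # reads from foo, not bar, which is easy to miss here
  fields = ["level"]

[sinks.over_http]
  type = "http"
  inputs = ["baz"]
  uri = "http://localhost:4195/post"
  encoding = "text"
```

In this form the graph can only be recovered by reading the `inputs` line of every component, which is the pain point described in item 1.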

For the third point, I think an "includes" key, usable e.g. as inputs = ["vec.d/*.toml"], with defined merging semantics would be great.

For the second I will not elaborate here.

Thanks @joulaud, that's very helpful.

the graph is completely implicit and hard to decode, as you must read every inputs field
Addressing the first point would mean dropping the inputs field from each node and having a separate configuration section to define the pipelines.

I really like the pipelines directive you've outlined and opened https://github.com/timberio/vector/issues/1447 to represent that work. I can see how this would make component organization and reuse easier to follow. In addition, the ability to break up the config into multiple files pairs nicely with the new unit testing feature we added.

I can't split the configuration in several files

I've opened https://github.com/timberio/vector/issues/1445 to represent this work.

For the third point I think a "includes" key usable e.g. as inputs = ["vec.d/*.toml"] with a defined merging semantic would be great.

I'm not sure I follow how this would work exactly. It seems like my "Chaining Pipelines" example in https://github.com/timberio/vector/issues/1447 would cover this, correct?


To take a step back, I've opened https://github.com/timberio/vector/milestone/27 to generally improve configuration development. The project is still unshaped and we'll create a project-wide specification before beginning work, which should start at the beginning of February.

Issues we'll most likely do:

#1445, #1446, #1447

You did a great job creating those issues. I will follow #1455 and #1447 and comment there if needed. Thanks @binarylogic

Not major thoughts but:

1) Please don't have support for more than one configuration language - one of the great things about open-source tools is the corpus of example configurations out there. Don't make it hard for folks to share nor make it hard for you to document. :)

2) YAML seems like it's probably the de facto choice for configuration file formats right now, certainly, folks in the K8s et al world are used to working with it and probably have tooling around it.

So, the Kubernetes integration has been around for some time now, and one of the consistent points of feedback is that mixing TOML config bits with YAML for Helm config is really odd.
The proposed remediation is switching the Vector config file format to YAML. This would allow natural integration with the world of YAML-based config files. Thoughts?
