Currently, Vector has robust JSON support and robust field flattening. However, the way those two interact across Vector can be a bit disorienting!
While the examples below highlight add_fields, this problem is consistent across Vector.
Our docs do cover this topic, but they are not exhaustive: they do not describe escaping or some edge cases.
https://vector.dev/docs/about/data-model/log/#dot-notation

Our docs do describe how to access fields using escaping at https://vector.dev/docs/reference/field-path-notation/#escaping

Perhaps a good first step is linking these pages more clearly.
Imagine an incoming JSON log:
{
  "a": { "b": { "c": 0 } },
  "a.b": { "c": 0 },
  "a.b.c": 0
}
Here you can see that for add_fields, inserting the quoted key "a.b" is the same as inserting the nested path a.b.
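To make the ambiguity concrete, here is a minimal Python sketch of an insert that treats every dot in a field name as a nesting level, which is roughly the behaviour described above. The function insert_field is hypothetical and is not Vector's actual implementation. Applied to the log above, it can only ever reach the nested a, b, c value; the literal top-level keys "a.b" and "a.b.c" have no spelling that reaches them.

```python
from typing import Any, Dict


def insert_field(event: Dict[str, Any], path: str, value: Any) -> None:
    """Hypothetical naive insert: every '.' in the key is treated as nesting."""
    *parents, last = path.split(".")
    node = event
    for key in parents:
        # Create intermediate objects as needed, overwriting non-objects.
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[last] = value


event = {"a": {"b": {"c": 0}}, "a.b": {"c": 0}, "a.b.c": 0}
insert_field(event, "a.b.c", 1)
# Only event["a"]["b"]["c"] changes; event["a.b.c"] and event["a.b"] stay untouched.
print(event)
```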
Imagine an incoming JSON log:
{
  "a": [{
    "b": [[{
      "c": 0
    }]]
  }],
  "a[0]": {
    "b": [[{
      "c": 0
    }]]
  },
  "a[0].b": [[{
    "c": 0
  }]],
  "a[0].b[0]": [{
    "c": 0
  }],
  "a[0].b[0][0]": {
    "c": 0
  },
  "a[0].b[0][0].c": 0
}
Here you can see that add_fields doesn't make much of a distinction between literal keys and nested paths either.
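The same problem with array indexes can be shown with a small Python sketch of a path parser. The grammar (names separated by ".", each optionally followed by "[N]" indexes) and the names parse_indexed_path and get are assumptions for illustration, not Vector's parser. The leaf values are changed from the example above so they can be told apart: every spelling resolves through the nested arrays, and the literal keys "a[0]" and "a[0].b[0][0].c" cannot be reached without some escape syntax.

```python
import re
from typing import Any, List, Union

Segment = Union[str, int]

# Hypothetical grammar: names made of anything except ".", "[" and "]",
# optionally followed by one or more "[N]" array indexes, joined by ".".
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]|\.")


def parse_indexed_path(path: str) -> List[Segment]:
    """Split a path such as "a[0].b" into ["a", 0, "b"]."""
    segments: List[Segment] = []
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {path!r}")
        name, index = m.group(1), m.group(2)
        if name is not None:
            segments.append(name)
        elif index is not None:
            segments.append(int(index))
        # A bare "." matches neither group and only separates segments.
        pos = m.end()
    return segments


def get(doc: Any, segments: List[Segment]) -> Any:
    """Walk objects by key and arrays by integer index."""
    for seg in segments:
        doc = doc[seg]
    return doc


log = {
    "a": [{"b": [[{"c": 0}]]}],
    "a[0]": {"b": [[{"c": 1}]]},
    "a[0].b[0][0].c": 2,
}
print(get(log, parse_indexed_path("a[0].b[0][0].c")))  # prints 0
```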
The resolution of this issue requires a specification. We should come up with a good way to let users express and work with these fields.
Particular care should be paid to how these changes are reflected in our behavior tests, in casual/normal use, and in the edge cases detailed above.
Possible areas to explore:
@binarylogic @lukesteensen Can you help spec this? I know you have opinions.
Docs for Benthos, which uses an adapted version of JSON pointers: https://www.benthos.dev/docs/configuration/field_paths
JSON pointers are great because they're explicit, unobtrusive and easy to parse. The problem is that they differ from what users will expect. For the doc:
{
  "foo": {
    "bar": "baz"
  }
}
999999.9% of users will reach for foo.bar as it's consistent with JS and most other tools. In JSON pointers this is /foo/bar. In Benthos I made two changes in order to support foo.bar:
1. The separator is . instead of /; this is harmless to the spec as it's an arbitrary character.
2. The leading prefix is dropped, so .foo.bar becomes foo.bar. The key reason for the prefix is that it makes it possible to query:
{
  "": "bar"
}
In the case of Benthos, I was more than content not to support that.
Changing to this spec would be backwards compatible for the vast majority of existing configs. We would need clear documentation of the escape sequences ~0 (a literal ~) and ~1 (a literal separator), as they're uncommon and most users will never have seen or used them.
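The two path styles can be compared with a short Python sketch. It implements RFC 6901 JSON pointer decoding (leading /, segments split on /, ~1 decoded before ~0) and, assuming the Benthos variant behaves as described above and in its linked docs (. as the separator, no leading prefix, ~1 standing for a literal .), the dot-separated form with the same escape scheme. parse_path and resolve are illustrative names, not an existing API. Note that only the pointer form can address the empty key "": "/" decodes to [""], while an empty dot path means the whole document.

```python
from typing import Any, List


def parse_path(path: str, sep: str, leading_sep: bool) -> List[str]:
    """Split a path into unescaped key segments.

    sep="/", leading_sep=True  : RFC 6901 JSON pointer, e.g. "/foo/bar"
    sep=".", leading_sep=False : Benthos-style dot path, e.g. "foo.bar" (assumed)
    """
    if path == "":
        return []  # an empty path refers to the whole document
    if leading_sep:
        if not path.startswith(sep):
            raise ValueError(f"path must start with {sep!r}: {path!r}")
        path = path[len(sep):]
    # Decode ~1 before ~0 (as RFC 6901 requires) so "~01" becomes the literal
    # text "~1" rather than the separator.
    return [seg.replace("~1", sep).replace("~0", "~") for seg in path.split(sep)]


def resolve(doc: Any, segments: List[str]) -> Any:
    """Look up segments in nested objects; array indexes are plain numbers."""
    for seg in segments:
        doc = doc[int(seg)] if isinstance(doc, list) else doc[seg]
    return doc


doc = {"foo": {"bar": "baz", "a.b": 1, "a/b": 2}, "": "empty key"}
print(resolve(doc, parse_path("/foo/bar", "/", True)))   # baz
print(resolve(doc, parse_path("foo.bar", ".", False)))   # baz
print(resolve(doc, parse_path("foo.a~1b", ".", False)))  # 1
print(resolve(doc, parse_path("/", "/", True)))          # empty key
```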
One major problem with JSON pointers is that they're primarily aimed at querying data, so indexes are implicit (just a number) and the array must already exist in the object as a reference point.
When mutating data it's therefore possible to express a desire to:
- Set the value at an existing index of an array (foo.bar.1 = "baz")
- Append an element to an existing array (foo.bar.- = "baz")

But it is NOT possible to express with a path that you wish to create a new array containing an element; in that case an object would be created with a key matching the specified index.
For the purposes of our transforms I think that's acceptable, and in a case where arrays need to be constructed we can support that in other (more appropriate) ways.
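Those mutation rules can be sketched in Python, assuming paths are already split into unescaped string segments. set_path is a hypothetical helper, not Benthos or Vector code: a numeric segment overwrites an existing array element, "-" appends to an existing array, and any missing container along the way is created as an object, so a numeric segment there becomes an ordinary key such as "1".

```python
from typing import Any, Dict, List


def set_path(doc: Dict[str, Any], segments: List[str], value: Any) -> None:
    """Hypothetical setter following the JSON-pointer-style rules above."""
    *parents, last = segments
    node: Any = doc
    for seg in parents:
        if isinstance(node, list):
            node = node[int(seg)]  # the element must already exist
        else:
            # A missing container is always created as an object, never an array.
            node = node.setdefault(seg, {})
    if isinstance(node, list):
        if last == "-":
            node.append(value)  # "-" appends to an existing array
        else:
            node[int(last)] = value  # overwrite an existing element
    else:
        node[last] = value  # on an object, "1" or "-" is just an ordinary key


doc = {"foo": {"bar": ["a", "b"]}}
set_path(doc, ["foo", "bar", "1"], "baz")  # foo.bar.1 = "baz"
set_path(doc, ["foo", "bar", "-"], "baz")  # foo.bar.- = "baz"
set_path(doc, ["foo", "qux", "1"], "baz")  # no array exists at foo.qux
print(doc)
# {'foo': {'bar': ['a', 'baz', 'baz'], 'qux': {'1': 'baz'}}}
```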
@Hoverbear
It does not describe escaping or some edge cases.
The docs describe escaping here: https://vector.dev/docs/reference/field-path-notation/#escaping.
@a-rodin Nice! I missed that; I've added it above.
I only have a few small opinions here and strongly defer to those with more experience building these things (e.g. @Jeffail and @a-rodin).
That said, my opinions are roughly as follows:
1. The vast majority of use cases will be simple foo.bar-style paths. Making these easy to understand and unambiguous with respect to nesting should be the priority.
2. For the syntax itself, I'd lean towards something like jq, since it seems to best match our use case (i.e. both querying and construction/assignment), but I worry it might be more complex than we need. I would also be perfectly happy with the adapted version of JSON pointers used by Benthos.

I agree with @lukesteensen. I'd like to add a few points as well:
I do not want to conflate the TOML syntax with the data structure we expect users to provide. To clarify, it is possible we'll support YAML in the future: would we want users to supply nested YAML structures to insert fields? I'm indifferent, but thinking about this through a YAML lens helps separate the concerns.
How do hybrid structures work? What if a user did this?
[transforms.add_fields]
type = "add_fields"
fields.ec2."container.id" = "abcd1234"
At first glance, I would expect the quoting to mean that the key is literal, but it is not.
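One plausible way this surprise arises is sketched below in Python, under the assumption that the TOML table is flattened into dotted strings and later re-split on "." (flatten and insert_dotted are hypothetical helpers, not Vector's code). The TOML parser correctly keeps "container.id" as a single key, but once the path is joined into ec2.container.id, the information about which dot was quoted is gone.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(table: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Join nested table keys with '.', forgetting which dots were quoted."""
    for key, value in table.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def insert_dotted(event: Dict[str, Any], path: str, value: Any) -> None:
    """Treat every '.' in the path as a nesting level."""
    *parents, last = path.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


# The table a TOML parser yields for: fields.ec2."container.id" = "abcd1234"
fields = {"ec2": {"container.id": "abcd1234"}}

event: Dict[str, Any] = {}
for path, value in flatten(fields):
    insert_dotted(event, path, value)
print(event)  # {'ec2': {'container': {'id': 'abcd1234'}}}
```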
As @lukesteensen said, 99.9% of cases will be simple foo.bar style paths. We should make this simple and hard to mess up.
From a consistency and simplicity standpoint, I'm leaning towards requiring quotes and not allowing the nested TOML syntax. I'm curious what others think about my example in point 2.