Vector: New `shape` transform

Created on 13 Aug 2019 · 13 Comments · Source: timberio/vector

We need a better way to add, remove, rename, and possibly coerce fields in one shot. If simple enough, this could replace the add_fields, remove_fields, and coerce transforms.

I'm open to better naming, such as simply transform.

Ref https://github.com/timberio/vector/issues/377

blocked approval requirements feature

All 13 comments

It would be awesome if this new transform supported string splitting, as in the example below

from

{
    "args": "a=1&b=2&c"
}

to

{
    "args_a": "1",
    "args_b": "2",
    "args_c": ""
}

nginx produces this kind of log field; see http://nginx.org/en/docs/http/ngx_http_core_module.html#var_args
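
For illustration only, here is a minimal Python sketch of the from/to mapping above. It is not Vector code: the function name `flatten_query` and its `field` and `prefix` parameters are hypothetical. It relies on the standard library's `urllib.parse.parse_qsl`, which also percent-decodes keys and values, something the example above does not ask for.

```python
from urllib.parse import parse_qsl


def flatten_query(event, field="args", prefix="args_"):
    """Return a copy of `event` with one `prefix + key` field per query parameter."""
    out = dict(event)
    # keep_blank_values=True keeps parameters such as "c" that carry no value.
    for key, value in parse_qsl(event.get(field, ""), keep_blank_values=True):
        out[prefix + key] = value
    return out


print(flatten_query({"args": "a=1&b=2&c"}))
```

A dedicated transform would probably also need to decide what happens when a key repeats; in this sketch the last occurrence wins.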

My current implementation uses Lua, but it's not elegant:

[transforms.parse_args]
  type = "lua"
  inputs = ["datasource"]
  source = """
-- Split `str` on the literal separator `sep` and return the pieces as an array.
local function split(str, sep)
  local parts = {}
  local start = 1
  while true do
    -- The fourth argument (true) requests a plain-text search, not a Lua pattern match.
    local first, last = string.find(str, sep, start, true)
    if not first then
      parts[#parts + 1] = string.sub(str, start)
      break
    end
    parts[#parts + 1] = string.sub(str, start, first - 1)
    start = last + 1
  end
  return parts
end

-- Add one args_<key> field per parameter; a parameter without a value becomes an empty string.
-- A missing args field is treated as an empty string, and empty keys are skipped.
for _, arg in ipairs(split(event["args"] or "", "&")) do
  local pair = split(arg, "=")
  if pair[1] ~= "" then
    event["args_" .. pair[1]] = pair[2] or ""
  end
end
"""

It would also be cool to add some sort of expand operator to this, so that (once our internal data is structured) we can expand an event into N events by selecting an array field (or multiple). That would allow us to turn an event such as:

{
  "id": "whatever",
  "foo": [1,2]
}

Into two events:

{ "id": "whatever", "foo": 1 }
{ "id": "whatever", "foo": 2 }
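
A rough Python sketch of what such an expand step could do, purely to make the idea concrete. The function name `expand` is hypothetical. Two behaviors here are assumptions, not anything decided in this thread: selecting several array fields produces every combination of their elements, and a non-array value is treated as a one-element array.

```python
from itertools import product


def expand(event, fields):
    """Yield one event per combination of elements of the selected array fields."""
    # Assumption: a non-list value is treated as a single-element list.
    columns = [event[f] if isinstance(event[f], list) else [event[f]] for f in fields]
    for combo in product(*columns):
        # Copy the event and overwrite each selected field with one element.
        yield {**event, **dict(zip(fields, combo))}


for expanded in expand({"id": "whatever", "foo": [1, 2]}, ["foo"]):
    print(expanded)
```

With this approach an empty array yields no events at all, which may or may not be the desired behavior.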

I like the "expand" idea. I added the needs: spec label to represent the high-level work around transforming events. We need to think about the experience holistically and determine if all of these transformations are best handled with individual transforms, a single transform, a programmable transform, or all of them 😄.

Just adding a couple of libraries here for inspiration:

To clarify, we probably shouldn't use these libraries, but we can expose a similar interface:

[transforms.change_it_up]
  type = "shape"

  changes = [
    {add.new_field = "my value"},
    {add.parent.child = "this is a nested field value"},
    {coerce.another_field = "int"},
    {rename.old_field = "new_name"},
    {remove.bad_field = true}
  ]

This is a _very_ rough example, but you get the idea. The awkward part of the syntax above is removing fields.

Finally, another interesting source of inspiration is the jq library and its interface:

jq '[.[] | {message: .commit.message, name: .commit.committer.name, parents: [.parents[].html_url]}]'

It's proven to be a powerful, succinct syntax that engineers already understand. I'm not proposing that this interface would power the generic shape transform, but it is interesting as an alternative jq transform that exposes the same syntax.
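
For readers less familiar with jq, the expression above reads roughly like the Python below: for every element of the input array, build a new object holding the commit message, the committer name, and the list of parent URLs. The function name `summarize` and the sample data are made up for illustration.

```python
def summarize(commits):
    """Keep the commit message, committer name and parent URLs of each entry."""
    return [
        {
            "message": item["commit"]["message"],
            "name": item["commit"]["committer"]["name"],
            "parents": [parent["html_url"] for parent in item["parents"]],
        }
        for item in commits
    ]


# Hypothetical input shaped like the fields the jq expression selects.
sample = [
    {
        "commit": {"message": "Fix bug", "committer": {"name": "Alice"}},
        "parents": [{"html_url": "https://example.com/p1"}],
    }
]
print(summarize(sample))
```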

> We need a better way to add, remove, rename

Definitely! Also, don't forget field copying.

To me, parsing query strings sounds like a task for a separate transform.

Even with add + remove + copy + rename, you need to define some strict rules so that users aren't confused.
For example, a field shouldn't be renamed foo => bar and then foo deleted (or vice versa), and foo shouldn't be copied to bar and then bar added one more time.
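
One possible reading of those rules, sketched in Python. Everything here is an assumption made for illustration: the step format (tuples), the function name `find_conflicts`, and the two rules themselves, namely that a step may not use a field an earlier step removed or renamed away, and may not write a field an earlier step already created.

```python
def find_conflicts(steps):
    """Return one message per step that breaks either rule.

    Each step is a tuple: ("add", field), ("remove", field),
    ("rename", src, dst) or ("copy", src, dst).
    """
    consumed, written, problems = set(), set(), []
    for index, (op, *fields) in enumerate(steps):
        src = fields[0] if op in ("remove", "rename", "copy") else None
        dst = fields[-1] if op in ("add", "rename", "copy") else None
        if src in consumed:
            problems.append(f"step {index}: {op} uses '{src}', which an earlier step removed or renamed")
        if dst in written:
            problems.append(f"step {index}: {op} writes '{dst}', which an earlier step already created")
        # Only remove and rename make the source field disappear.
        if op in ("remove", "rename"):
            consumed.add(src)
        if dst is not None:
            written.add(dst)
    return problems


print(find_conflicts([("rename", "foo", "bar"), ("remove", "foo")]))
print(find_conflicts([("copy", "foo", "bar"), ("add", "bar")]))
```

Both example lists correspond to the cases described above, and each is reported as a conflict at its second step.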

Agree! And thanks for the feedback. @Hoverbear is working on all of this right now. She's putting together a rough guide on schema management that will summarize the work we do here. Would love to get your thoughts when that is done.

@anton-ryzhov I agree about query strings!

If we do end up combining add, rename, etc., it'll probably be as a rework or reshape that takes a sequence of steps, so no worry about user confusion around order of operations. :)

@binarylogic Bumblebee looks pretty compelling, it's just a sequence of steps. I worry Proteus feels very custom with that DSL.

@binarylogic That example in https://github.com/timberio/vector/issues/750#issuecomment-582989026 is unfortunately not valid TOML. :(

Here's the closest I could come up with that is valid and clean:

[transforms.change_it_up]
  type = "shape"
  inputs = ["my-source-id"]
  [[transforms.change_it_up.changes]]
  add = "new_field"
  value = "my value" # All value fields below allow templating.
  [[transforms.change_it_up.changes]]
  create = "parent.child" # Users can escape with \. for period-containing fields.
  value = "this is a nested value"
  [[transforms.change_it_up.changes]] # Consider leaving this as a separate transform.
  rename = "old_field"
  to = "new_name"
  [[transforms.change_it_up.changes]]
  remove = "bad_field" # Optionally an array
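
To make the "sequence of steps" semantics concrete, here is a small Python sketch that applies a change list shaped like the TOML above, strictly in order. It is an illustration under assumptions, not Vector's implementation: the names `apply_changes` and `set_path` are hypothetical, `add` and `create` are treated the same way, rename and remove only look at top-level fields, and templating and `\.` escaping are not handled.

```python
import copy


def set_path(target, path, value):
    """Set a dotted path such as "parent.child", creating intermediate dicts."""
    *parents, leaf = path.split(".")
    for key in parents:
        # Assumption: any existing intermediate value is itself a dict.
        target = target.setdefault(key, {})
    target[leaf] = value


def apply_changes(event, changes):
    """Apply each change to a deep copy of `event`, in list order."""
    out = copy.deepcopy(event)
    for change in changes:
        if "add" in change or "create" in change:
            set_path(out, change.get("add") or change.get("create"), change["value"])
        elif "rename" in change:
            if change["rename"] in out:
                out[change["to"]] = out.pop(change["rename"])
        elif "remove" in change:
            names = change["remove"]
            # `remove` may be a single field name or an array of names.
            for name in [names] if isinstance(names, str) else names:
                out.pop(name, None)
    return out


changes = [
    {"add": "new_field", "value": "my value"},
    {"create": "parent.child", "value": "this is a nested value"},
    {"rename": "old_field", "to": "new_name"},
    {"remove": "bad_field"},
]
print(apply_changes({"old_field": 1, "bad_field": "x"}, changes))
```

Because the list is processed top to bottom, the result of renaming and then removing a field is well defined, even if it is not what the user intended.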

Say what? Tom... Before we decide on this change, we probably need to clarify #1731. Because I'm wondering if a higher-level solution like the pipelines syntax in #1679 or the compose transform proposed in #1653 would solve this better. Your example above is _very_ similar to the compose transform, but less powerful.

We'll decide what to do here on Monday.

We've chosen the general approach discussed in #1653, since this would just be an add/remove/rename inside of a compose. :)

This is being closed; please direct your attention to #1653.
