Vector: Allow for tag mapping in the `metric_to_log` transform

Created on 16 Oct 2020  ·  9 Comments  ·  Source: timberio/vector

The metric_to_log transform has a limited host_tag option that is meant to carry over the metric hostname tag to the log hostname field. We would do better to allow for mapping of tag data to logs in this transform:

[transforms.metric_to_log]
type = "metric_to_log"
remap = """
.hostname = $tags.hostname
.another_field = $tags.another_tag
"""

All 9 comments

How should this be implemented? Is it possible to make the remap transform a sub-component of this transform? Is there an example of something like this?

It shouldn't require the entire transform. Remap is intended to be used throughout Vector. @JeanMertz @FungusHumungus could you link to some examples around using the API? I know we have precedent for this in the reduce transform.

Hey @drunkirishcoder, here are a few pointers for when you're sober again 😄:

The Remap language lives in its own crate in this repository, at ./lib/remap-lang.

We haven't released Remap yet, and are still working on the documentation. The next version (0.12, not 0.11, which ships this week) of Vector will officially support Remap, but until then, I'm happy to give you any pointers you need, either here, or on our Discord server.

As for this specific feature.

What we'd need to do to support this is:

  1. Update the metric_to_log transform to add a new configuration field remap.

    You can take a look at the existing remap transform, which does the same thing, but for the source field.

    You can see how we create a new Remap program when the transform is built on boot: https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/src/transforms/remap.rs#L28-L30

    https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/src/transforms/remap.rs#L51-L66

  2. Then, at runtime, run the program similar to how it works in the remap transform:

    https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/src/transforms/remap.rs#L72-L74

  3. Once this is in place, we need to inject a $metric variable into the program to allow accessing metric details within it.

    To achieve this, instead of creating a default runtime using remap::Runtime::default(), we want to inject our pre-created state object:

    https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/lib/remap-lang/src/runtime.rs#L9-L11

    This allows us to inject the metric variable, containing all relevant metric details:

    https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/lib/remap-lang/src/state.rs#L4-L17

  4. Finally, you need to create a map of all the metric values to assign to the variable. The Remap metrics RFC by @FungusHumungus contains a list of fields we should expose:

    • $metric.name
    • $metric.namespace
    • $metric.timestamp
    • $metric.kind
    • $metric.tags
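As a rough illustration of step 4, here is a minimal, stdlib-only sketch of building the map that could back the `$metric` variable. Everything in it is an assumption made for illustration: `Value`, `Metric`, and `metric_variable` are hypothetical stand-ins, not the real types from `remap-lang` or Vector, and the timestamp and kind are kept as plain strings to avoid external crates.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a Remap value; not the real `remap-lang` type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Map(BTreeMap<String, Value>),
}

/// Hypothetical, simplified metric; Vector's real metric type differs.
struct Metric {
    name: String,
    namespace: Option<String>,
    timestamp: Option<String>, // kept as text to stay stdlib-only
    kind: String,              // e.g. "absolute" or "incremental"
    tags: BTreeMap<String, String>,
}

/// Build the map of fields listed above (name, namespace, timestamp,
/// kind, tags). Optional fields are simply omitted when absent.
fn metric_variable(metric: &Metric) -> Value {
    let mut fields = BTreeMap::new();
    fields.insert("name".to_string(), Value::Text(metric.name.clone()));
    if let Some(namespace) = &metric.namespace {
        fields.insert("namespace".to_string(), Value::Text(namespace.clone()));
    }
    if let Some(timestamp) = &metric.timestamp {
        fields.insert("timestamp".to_string(), Value::Text(timestamp.clone()));
    }
    fields.insert("kind".to_string(), Value::Text(metric.kind.clone()));

    // Tags become a nested map, so `$metric.tags.host` would need the
    // nested-field access mentioned in the thread.
    let tags = metric
        .tags
        .iter()
        .map(|(key, value)| (key.clone(), Value::Text(value.clone())))
        .collect();
    fields.insert("tags".to_string(), Value::Map(tags));

    Value::Map(fields)
}
```

How the resulting map would actually be handed to the Remap state object is not shown here, since that depends on the API in the links above.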

Once all of this is hooked up, we're missing one final piece of the puzzle, which is to actually support accessing nested map fields in variables. I'm actively working on this, and I plan to have this resolved by the end of this week.

There's one more example of using the Remap language outside the remap transform, and that's using remap in a condition:

https://github.com/timberio/vector/blob/c357da46928b23db5016d6b5da17b44c49fb73e3/src/conditions/remap.rs

(PR #4743)

Hey @drunkirishcoder, here are a few pointers for when you're sober again 😄:

What do you mean? that's when I do my best work. 😄

On a serious note, thank you for the detailed walkthrough. I read through the linked RFC. I think a couple steps may not be necessary? (could be wrong). Currently this transform does the metric event to log event conversion first. It seems to me that this wouldn't need native metric event type support in remap. The remap would be applied post conversion on the resulting log event, which makes it work exactly like the standalone remap transform, right?

@drunkirishcoder

I think a couple steps may not be necessary? (could be wrong).

You are correct. Most of that RFC is for handling events that are currently a Metric. The only relevant part for this issue is the list of fields that we will likely expose via an additional variable to the Remap.

Would it need the additional variable? All the fields in the metric event, tags included, are copied over to the log event during conversion. Would I be able to do the following?

.hostname = .tags.host
.container.image.name = .tags.image_name
del(.tags)

or is the proposal to not copy the tags over to the log event, but use remap instead?

That's a great point @drunkirishcoder, and one I had missed.

Given this, I'm more inclined to _remove_ (read: deprecate) the host_tag option, and have that be the only change we make to this transform.

One can then use the regular remap transform to manually move data around as needed:

[transforms.metric_to_log]
    type = "metric_to_log"

[transforms.remap]
    type = "remap"
    inputs = ["metric_to_log"]
    source = """
        .hostname = .tags.host
        .container.image.name = .tags.image_name
        del(.tags)
    """

Another alternative we've been contemplating recently is to allow applying a remap transform to _any_ component, by providing (as a preliminary example) a universal remap.before or remap.after field.

I consider this alternative outside the scope of this issue though, as it'll require an RFC before we land on the final design.

Also, given the above, there doesn't seem to be a particular urgent need to solve this issue, as what is proposed here can already be done given the existing transforms (on nightly builds, for now).

It could be that, as you said, the intent of this issue is to actually _not_ copy over any fields implicitly, but I don't see any compelling use-case for this, as this transform creates a _new_ log event, so there is no risk of overriding existing log fields by doing so. The one reason I can think of for wanting this is if you only want to copy over a subset of metric fields to the new log, but again, that can be solved through a regular remap transform following this one, or through a universal component remap option.


I think it would be best for @binarylogic to give some more details on what he had in mind for this issue specifically, before we continue with the implementation details.

@JeanMertz that sounds good to me. I saw the host_tag option and thought remap was a better fit. Zooming out though, what you've proposed is better.
