Telegraf: Create static fields parsing log files

Created on 24 Mar 2017 · 25 comments · Source: influxdata/telegraf

Feature Request

Proposal:

A way, when using the logparser plugin, to emit static fields whenever a line matches a pattern.

Use case:

In our use case, we poll haproxy for statistics every few seconds. One of the things we alert on is all the servers in a pool being down. But if there is a blip where we lose all the servers in a pool and they come back up a few seconds later, our polling interval might not catch the issue. In this situation, however, haproxy logs the following message:

backend foo has no server available!

We would instead like to use the log parser to match this pattern, and emit a point with the field for number of servers in the pool set to 0.

The simplest implementation I can think of would be to allow a pattern definition such as:

HAPROXY_NO_SERVERS backend %{NOTSPACE:pxname:tag} has no server available!%{'0':act:int}

In this, the %{'':x:y} would be special syntax where the value isn't parsed from the line, but instead taken from between the quotes. The value would then be parsed as an int and put in the field act.
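For reference, here is roughly how that pattern might sit in a logparser config. This is only a sketch: the %{'0':act:int} literal syntax is the proposed extension (it does not exist today), and the file path is illustrative:

[[inputs.logparser]]
  files = ["/var/log/haproxy.log"]
  [inputs.logparser.grok]
    patterns = ["%{HAPROXY_NO_SERVERS}"]
    custom_patterns = '''
      HAPROXY_NO_SERVERS backend %{NOTSPACE:pxname:tag} has no server available!%{'0':act:int}
    '''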


This feels like it might be useful for lots of other use cases, but it also feels a little like an edge case.
However, I cannot think of any other way of accomplishing our goal without writing a script that tails the log looking for the pattern and then sends a point to Telegraf, which would be rather unpleasant.

area/tail enhancement


All 25 comments

I'd rather not add special syntax if it can be avoided. Do you think it is possible to do this with a processor?

Maybe. The difficulty, I think, is filtering: making sure the processor only handles points from that specific logparser. You could add a tag on the inputs.logparser definition and then have the processor filter on that tag (and remove the tag so it doesn't go to InfluxDB). But this feels a little messy.
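A minimal sketch of that routing idea, with an illustrative marker tag named source, and the printer example processor standing in for whatever processor would do the actual work:

[[inputs.logparser]]
  # ... pattern config ...
  [inputs.logparser.tags]
    source = "haproxy_noservers"    # marker tag so the processor can find these points

[[processors.printer]]
  [processors.printer.tagpass]
    source = ["haproxy_noservers"]  # only handle points carrying the marker tag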

What about if we allow setting fields explicitly in the config? The field/value pair would simply be set on all metrics.

That could work, as long as in this specific use case we say this logparser will only ever emit this one point. That wouldn't be hard, since if we want to parse other stuff, we can just define another logparser.

However one thought I just had, what about allowing per-plugin-instance processors?

[[inputs.logparser]]
...
[inputs.logparser.processor]
...

We already have several global properties which apply to all inputs (interval, name_override, tags, etc.). So why not add processors?
We could then create a processor that can do things like adding, removing, and transforming fields.
This would be a very flexible solution that could solve a very large number of use cases.

It seems like a good idea, but I'm reluctant to OK it right now because of some prior discussion and because I'm still new to the project.

I think some of the downsides would be extra complexity in the configuration format and the possibility of non general purpose processors that can only work with specific inputs.

I think it could be done such that [inputs.foo.processors.bar] would be configured exactly the same way as [[processors.bar]].
I think it'd also be reasonable to say that there is no such thing as an input-specific processor: every processor should be usable either globally or within a specific plugin.
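To make that shape concrete, a hypothetical example reusing the existing printer example processor (the per-input nesting is the proposal here, not syntax that exists today):

[[inputs.logparser]]
  # ... grok config ...
  [inputs.logparser.processors.printer]
    # hypothetical: same options as a global [[processors.printer]] block,
    # but applied only to points emitted by this logparser instance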

But as long as we have some way of accomplishing the goal, preferably without getting external scripts involved, I'm open.

I'm also curious: what do we see as example use cases for global processors? I can't think of many (any, really). All the things I can think of apply to specific inputs.
I would also note that they were added back in 1.1, and still the only one that exists is an example processor that prints everything flowing through it.

The fact that no processors have been developed is telling and I agree with your observations. As you mentioned, most of the functionality that one would want in a processor is already available on a per input basis via tagdrop, fielddrop, tags, etc. I suspect this is a big part of the reason why we have no processors. Can you think of any other actions that these options are not providing outside of adding fields?

It seems that a large amount of the need for better filtering revolves around filtering within a single input, such as within inputs.cloudwatch.metrics or inputs.win_perf_counters.object, and having per input processors won't help us here.

Can you think of any other actions that these options are not providing outside of adding fields?

The most common one I see is the need to parse or type-convert values. For example, an input provides a field as the string "5" and you want the integer 5; or a parsing example of going from "5m" (five minutes) to 300 (seconds).
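The plain type-conversion half of this eventually shipped as the converter processor mentioned later in this thread. A minimal sketch for the string-to-integer case (field name illustrative; note it does not parse unit suffixes like "5m"):

[[processors.converter]]
  [processors.converter.fields]
    integer = ["act"]   # reparse the string field "act" as an integer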

This sort of operation seems very site specific, and might be difficult to configure in TOML. I wonder if it would just be nicer to pipe these through an external user defined process or use Kapacitor.

In the meantime, I suggest trying to add a fields option that works just like tags (except supporting more than just strings).

This sort of operation seems very site specific, and might be difficult to configure in TOML.

Not sure what you mean about "site specific". It's come up with the snmp plugin several times because devices return strings instead of numbers. I've also had it come up personally trying to use the jolokia plugin, and some things coming back as strings.

Perhaps we offer a few canned conversions. Configuration would have the user specify:

  • The type: "SI", "IEC", or "time"
  • The unit: "s" to convert a time value to seconds, "KiB" to convert an IEC value to kibibytes, etc.

Configuration might look like:

[inputs.foo.field_conversion.bar]
type = "si"
unit = "M"

^ Would convert "1.5Gw" to 1500 (1.5 giga expressed in mega is 1500)

It would be more flexible to allow the user to specify regexes, but configuration would be messy (I really do despise TOML), and it wouldn't be able to handle time units, since you can't just add or remove digits from the end.

I wonder if it would just be nicer to pipe these through an external user defined process or use Kapacitor.

I don't like the Kapacitor idea, as I think that would make the configuration incredibly complex. If the user already has InfluxDB + Kapacitor running, they'll have to place a second Kapacitor somewhere. And if they have multiple databases or retention policies, the routing of metrics gets very difficult to handle.

An external user process might be acceptable, but I would argue that the requirement here is that it be specifiable on a per-input basis. If I have Telegraf handling thousands of points per second, and I only want to perform this manipulation on a single point that is emitted every few seconds, I don't want to funnel thousands of points per second through it.
And while I say it might be acceptable, I would still prefer not having to involve external utilities, especially when such utilities are almost certainly going to be home-grown. For these common cases, everyone is going to be home-growing the same utility, and the odds of people introducing bugs and getting irritated go up.

By "site specific" I just mean that it needs to be individually customized for each deployment, which is not a problem if we can describe the task declaratively in TOML.

The config you suggest should work well. It might need to allow control of the output type (float/int/string/bool) too, and perhaps some sort of precision control.

I am nervous about having too much TOML nesting, it gets confusing fast. One idea is that we could have references between plugins, that might make it easier to apply to multiple inputs as well.

Just found this, need to look into it: #1984.

@phemmer do you have any thoughts on the linked issue (#1984)?

Not really. It would solve my use case. The only question is something @danielnelson already raised: do we want a global fields property which can be used on any input, like the tags property, or do we want it to be specific to the logparser plugin?

TL;DR: I think we should do a general fields property.

I tried to mock up how the TOML would look if we supported templates in the same way Logstash does and I ran into some issues.

I'll duplicate the Logstash example here:

grok {
  add_field => {
    "foo_%{somefield}" => "Hello world, from %{host}"
    "new_field" => "new_static_value"
  }
}

The first problem is naming fields with a templated key. It is valid TOML, but I think it's not very nice:

[[inputs.logparser]]
  [inputs.logparser.grok]
    [[inputs.logparser.grok.add_field]]
      "foo_%{somefield}" = "Hello world, from %{host}"

Another issue is setting types that are not strings. Maybe we would do something like %{value:int}, but this acts differently from our normal modifier extension, since the type is in the second slot:

[[inputs.logparser]]
  [inputs.logparser.grok]
    [[inputs.logparser.grok.add_field]]
      "foo_%{somefield}" = "Hello world, from %{host}"
      new_field = "%{0:int}"

In general I think just having a table of key/values would be nicer, if somewhat less powerful. It would also fit into Telegraf better and promote best practices:

[[inputs.logparser]]
  [inputs.logparser.grok]
    pattern = "..."
  [[inputs.logparser.fields]]
    new_field = 0
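Applied to the original haproxy use case, that proposal might look like this (path and pattern illustrative, and the fields table itself is still hypothetical):

[[inputs.logparser]]
  files = ["/var/log/haproxy.log"]
  [inputs.logparser.grok]
    patterns = ["backend %{NOTSPACE:pxname:tag} has no server available!"]
  [[inputs.logparser.fields]]
    act = 0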

Just to throw it out there, a downside to doing it on a per-input basis is that if you're extracting multiple patterns from the same log and you want to set a different field/value for each pattern matched, you have to set up separate inputs. Aside from making the config bigger, this might also make the parsing a lot heavier.

It might be worth investigating how support for filters/tags/fields on a sub input basis would look, since we have questions about this raised regularly.

Similar to this, I'd quite like to change the host tag to a field. I'm largely uninterested in the host, and it's causing a very high number of series in my environment.

@mcfedr I recently added the converter processor that can do this, which will be in 1.7 and you can test it in 1.7.0-rc1.

Strictly speaking, you can also use it to create a static field by first adding it as a tag and then converting it to a field, but I'd like a more straightforward fix for this issue.
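A sketch of that tag-then-convert workaround, using the static value from the original haproxy example (tags are always strings, hence the conversion step):

[[inputs.logparser]]
  # ... grok config ...
  [inputs.logparser.tags]
    act = "0"              # static value, attached as a tag

[[processors.converter]]
  [processors.converter.tags]
    integer = ["act"]      # turn the "act" tag into an integer field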

One issue you may run into with your plan to remove the host tag is that you may need it to provide series uniqueness. Remember that you can only have one value per field for each series key (measurement + tag set); otherwise the values will overwrite each other.

Also, if you do want to get rid of the host tag, it may just be easier to use tagexclude to remove it.

Just came across this ticket as I have a similar need. This talks about the logparser plugin; I would need the same as the original report, but for the docker_log plugin.
I have a docker container sending log lines via stdout/stderr, normally accessible via "docker logs <container>" (i.e. not a regular file). I would like to convert them to a result_code field. The content of the lines is mostly meaningless, but there is one good one and several bad ones, so I would like to be able to specify multiple patterns and, if they match, map them to 0 or 1 respectively.

@redm123 Try looking at the regex processor which can create a string field based on a pattern, or possibly the enum processor which can create numeric fields but only against exact matches.

A number result would be preferred.
Hmm, that could work if I can combine the processors: regex creates a static string, enum maps it to a number. But the enum processor would have to run after regex, and I'm not sure if that's possible. I haven't done much with processors up to now.

And: it seems like a pretty complex and wordy solution to a simple problem 😅 I was thinking more of something like the initial poster: pattern -> fixed field/tag value.

You can use the order option to set the processor ordering:

[[processors.regex]]
  order = 1
  # other options
[[processors.enum]]
  # other options
  order = 2
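A filled-in sketch of that regex-then-enum chain for the docker_log case above (the field names, pattern, and mapping values are all illustrative):

[[processors.regex]]
  order = 1
  [[processors.regex.fields]]
    key = "message"             # field holding the raw log line
    pattern = ".*error.*"       # illustrative "bad line" pattern
    replacement = "bad"
    result_key = "result"       # store the match result in a new string field

[[processors.enum]]
  order = 2
  [[processors.enum.mapping]]
    field = "result"
    dest = "result_code"        # the numeric field requested above
    [processors.enum.mapping.value_mappings]
      bad = 1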