Telegraf: Graphite templates with a mix of filters and wildcards don't map as expected

Created on 10 Jan 2017 · 6 Comments · Source: influxdata/telegraf

Bug report

When adding templates, we're seeing that if we combine templates that have specific (literal) filters with templates whose filters are all wildcards and match on the number of fields, the wildcard templates are not applied. I'm following the current docs here. Specifically, we are trying to transform Grafana's built-in metrics from Graphite to InfluxDB line protocol using Telegraf.

Relevant telegraf.conf:

[[inputs.tcp_listener]]
allowed_pending_messages = 10000
data_format = "graphite"
max_tcp_connections = 250
separator = "_"
service_address = ":8094"

(see templates detail below)

System info:

Telegraf v1.1.1 (git: release-1.1.0 94de9dca1fc6efb3a4bf3ec6869c356278c6755a) running on CentOS Linux release 7.2.1511 (Core)

Steps to reproduce:

Our data starts like this:

grafana.api.dashboard.save.count
grafana.api.dashboard.save.max
grafana.page.resp_status.code_200.count
grafana.proxy.resp_status.code_200.count
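
To reproduce this without a running Grafana, the sample names above can be pushed to the listener by hand. The following is a minimal sketch, assuming the listener from the config above is reachable on localhost port 8094 and accepts the Graphite plaintext format (`<path> <value> <timestamp>`); `format_graphite_line` and `send_metrics` are hypothetical helper names, not part of Telegraf.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Render one sample as a Graphite plaintext line: path, value, epoch seconds."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(paths, host="localhost", port=8094, value=0):
    """Send one sample per path over TCP (assumes a listener is running)."""
    payload = "".join(format_graphite_line(p, value) for p in paths)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Metric names taken from the report above.
SAMPLE_PATHS = [
    "grafana.api.dashboard.save.count",
    "grafana.api.dashboard.save.max",
    "grafana.page.resp_status.code_200.count",
    "grafana.proxy.resp_status.code_200.count",
]

# Uncomment once Telegraf is listening on :8094:
# send_metrics(SAMPLE_PATHS)
```

Sending a fixed value of 0 is enough here, since only the name-to-measurement mapping is under test.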

If we apply templates like this:

templates = ["*.*.* application.measurement.field",
"*.*.*.* application.measurement.measurement.field",
"*.*.*.*.* application.measurement.measurement.measurement.field",
"*.*.*.*.*.* application.measurement.measurement.measurement.measurement.field"]

We get results like this (working as expected):

api_user_signup_invite_count,application=grafana,environment=local,host=default-centos-vagrant value=0 1484061499000000000
api_dashboard_save_count,application=grafana,environment=local,host=default-centos-vagrant value=0 1484061499000000000

If we apply templates like this:

templates = ["grafana.alerting.notifications_sent.* application.measurement.measurement.type.field",
"grafana.alerting.result.* application.measurement.measurement.state.field",
"grafana.api.login.* application.measurement.measurement.type.field",
"grafana.api.resp_status.* application.measurement.measurement.code.field",
"grafana.page.resp_status.* application.measurement.measurement.code.field",
"grafana.proxy.resp_status.* application.measurement.measurement.code.field"]

It also works as expected.

However, if we combine the two approaches to try to map more specifically first and then catch all others, like this:

templates = ["grafana.alerting.notifications_sent.* application.measurement.measurement.type.field",
"grafana.alerting.result.* application.measurement.measurement.state.field",
"grafana.api.login.* application.measurement.measurement.type.field",
"grafana.api.resp_status.* application.measurement.measurement.code.field",
"grafana.page.resp_status.* application.measurement.measurement.code.field",
"grafana.proxy.resp_status.* application.measurement.measurement.code.field",
"*.*.* application.measurement.field",
"*.*.*.* application.measurement.measurement.field",
"*.*.*.*.* application.measurement.measurement.measurement.field",
"*.*.*.*.*.* application.measurement.measurement.measurement.measurement.field"]

It fails to match on any of the bottom four templates and we get results like this:

page_resp_status,application=grafana,code=code_unknown,environment=local,host=default-centos-vagrant count=0 1484061499000000000
api_resp_status,application=grafana,code=code_404,environment=local,host=default-centos-vagrant count=0 1484061499000000000
proxy_resp_status,application=grafana,code=code_unknown,environment=local,host=default-centos-vagrant count=0 1484061499000000000
grafana_api_user_signup_invite_count,environment=local,host=default-centos-vagrant value=0 1484061499000000000
grafana_api_dashboard_save_count,environment=local,host=default-centos-vagrant value=0 1484061499000000000
grafana_api_dashboard_save_max,environment=local,host=default-centos-vagrant value=0 1484061499000000000

It appears to just be applying a default measurement* filter at the end.
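
One way to spot which records fell through to the default is to check for the `application` tag, which every template above would have set. The following is a rough sketch: `parse_line` and `fell_through` are hypothetical helpers, and the parsing is deliberately naive (it assumes no escaped spaces or commas, which holds for the output shown here but not for line protocol in general).

```python
def parse_line(line):
    """Split one line-protocol record into measurement, tags, fields, timestamp.

    Naive on purpose: assumes exactly three space-separated sections and
    no escaped spaces or commas.
    """
    head, fields, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    return measurement, tags, fields, timestamp


def fell_through(line, tag="application"):
    """True when the record lacks the tag that a matching template would set."""
    _, tags, _, _ = parse_line(line)
    return tag not in tags


# Two records copied from the output above.
matched = "api_resp_status,application=grafana,code=code_404,environment=local,host=default-centos-vagrant count=0 1484061499000000000"
unmatched = "grafana_api_dashboard_save_count,environment=local,host=default-centos-vagrant value=0 1484061499000000000"

print(fell_through(matched), fell_through(unmatched))
```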

We also tried these formats, which also did not work:

grafana.*.*.*.* application.measurement.measurement.measurement.field
grafana.* application.measurement*

application.measurement* works if applied after the more specific filters. However, we lose some of the granularity that we would like to use (specifically with mapping fields).

Expected behavior:

Ideally, we would be able to combine the two formats:

grafana.proxy.resp_status.* application.measurement.measurement.code.field,
*.*.* application.measurement.field

and end up with results where grafana is mapped to the application tag rather than to the measurement.

Actual behavior:

See Steps to Reproduce

Additional info:

As a side note, #1940 could mitigate this issue for our use case.

Labels: bug, need more info

All 6 comments

+1

@codylewandowski Thank you for the information, but can you try to give a smaller and more succinct use case? There seems to be a lot of extraneous information in your examples.

@sparrc Thanks, yes, I went a little overboard in my initial comment.

If I start with graphite metric names like this:

myapp.abc.red.sum
myapp.abc.blue.sum
myapp.dogs.playing.count

I want to get to influx metrics that end up like:

abc application=myapp, color=red, field=sum  # where abc is the metric name and application and color are tags
abc application=myapp, color=blue, field=sum
dogs-playing application=myapp, field=count

In this case, I'd create the templates of:

myapp.abc.* application.measurement.color.field # This should match the first two metrics
*.*.*.* application.measurement.measurement.field # This should match everything else that is made up of 4 blocks
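
The mapping each template is meant to perform can be sketched as below. This is a simplified model of the template keywords used in this thread, not Telegraf's parser: `apply_template` is a hypothetical name, the `-` separator is taken from the `dogs-playing` example, wildcard keywords such as `measurement*` are not handled, and the fallback field name `value` is assumed from the output shown in the report.

```python
def apply_template(metric, template, separator="-"):
    """Map a dotted Graphite name onto (measurement, tags, field).

    Each template keyword lines up with the metric segment at the same
    position: 'measurement' segments are joined with the separator,
    'field' names the field, and any other keyword becomes a tag.
    """
    parts = metric.split(".")
    keys = template.split(".")
    measurement, tags, field = [], {}, None
    for key, part in zip(keys, parts):
        if key == "measurement":
            measurement.append(part)
        elif key == "field":
            field = part
        elif key:  # any other non-empty keyword is treated as a tag name
            tags[key] = part
    # Assumption: with no 'field' keyword the field is called "value".
    return separator.join(measurement), tags, field or "value"


print(apply_template("myapp.abc.red.sum", "application.measurement.color.field"))
print(apply_template("myapp.dogs.playing.count", "application.measurement.measurement.field"))
```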

However, we're seeing that while both of those templates work as expected when applied individually (or alongside other templates of the same kind), the all-asterisk template stops working as soon as it is combined with any template whose filter is not all asterisks.

As a result, I end up with metrics that look like:

abc application=myapp, color=red, field=sum 
abc application=myapp, color=blue, field=sum
myapp-dogs-playing-count field=value

So it appears that the second template stops matching, and the plugin falls back to mapping all of the blocks to measurement if they don't match a template whose filter contains literal strings.
Does this better demonstrate the issue we're seeing? It's kind of a complex scenario to describe.
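
The selection behaviour expected in this comment can be modelled as follows. This is a sketch of the reporter's expectation, not of how Telegraf's graphite parser actually chooses a template. The names `filter_matches`, `specificity`, and `select_template` are hypothetical, and treating a filter as a prefix match (so the three-segment filter `myapp.abc.*` matches a four-segment metric) is an assumption drawn from the examples in this thread.

```python
def filter_matches(filter_, metric):
    """A filter matches when each of its segments is '*' or equals the
    metric segment at the same position (prefix match, by assumption)."""
    f_parts = filter_.split(".")
    m_parts = metric.split(".")
    if len(f_parts) > len(m_parts):
        return False
    return all(f == "*" or f == m for f, m in zip(f_parts, m_parts))


def specificity(filter_):
    """Rank filters: more literal segments first, then more segments."""
    parts = filter_.split(".")
    literals = sum(1 for p in parts if p != "*")
    return (literals, len(parts))


def select_template(metric, templates, default="measurement*"):
    """Pick the most specific matching template, else the default.

    Each entry must have the form '<filter> <template>'.
    """
    candidates = []
    for entry in templates:
        filter_, template = entry.split(None, 1)
        if filter_matches(filter_, metric):
            candidates.append((specificity(filter_), template))
    if not candidates:
        return default
    return max(candidates, key=lambda c: c[0])[1]


TEMPLATES = [
    "myapp.abc.* application.measurement.color.field",
    "*.*.*.* application.measurement.measurement.field",
]

for name in ("myapp.abc.red.sum", "myapp.dogs.playing.count"):
    print(name, "->", select_template(name, TEMPLATES))
```

Under this model the all-asterisk template still catches `myapp.dogs.playing.count`, which is the outcome the report asks for and the opposite of what was observed.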

yes, got it, thanks @codylewandowski

I'm also struggling with exactly this same issue, although with influxdb's graphite receiver. I assume both use the same code.

For example, if I have a template of the form:

system.*.loadavg measurement.host.measurement.field

and then place:

system.foo.sensors.soc.thermal.level measurement.host.measurement.device.measurement

into the template list (it doesn't matter where), the former completely stops working and my default "measurement*" gets used. It doesn't matter whether the graphite metric starts with "system.foo" or not: the default gets used unless the metric is exactly "system.foo.sensors.soc.thermal.level".
This is very counter-intuitive!

@codylewandowski @rmk92 It's possible this issue was fixed in https://github.com/influxdata/telegraf/issues/5894, which was included in 1.11.3. Would you be able to test it?

@rmk92 The Telegraf code is based on InfluxDB, but it's not shared code.
