Telegraf: Merge Aggregator not functioning as should?

Created on 23 Jun 2020 · 6 comments · Source: influxdata/telegraf

Relevant telegraf.conf:

```toml
[[aggregators.merge]]
  ## If true, the original metric will be dropped by the
  ## aggregator and will not get sent to the output plugins.
  drop_original = true
```
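For reference, the merge aggregator combines metrics that share the same measurement name, tag set, and timestamp into a single metric. In line protocol the intended effect looks roughly like this (the values here are made up for illustration):

```
# before: two metrics, identical name/tags/timestamp, one field each
ports,port_id=6/1/1 in_octets=100 1592915515000000000
ports,port_id=6/1/1 out_octets=200 1592915515000000000

# after merge: one metric carrying both fields
ports,port_id=6/1/1 in_octets=100,out_octets=200 1592915515000000000
```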

System Info

Telegraf 1.14.4, Ubuntu 18.04.4 LTS

Steps to reproduce:

  1. Ensure that fields share the same measurement name, tag set, and timestamp, then enable the merge aggregator plugin.

Expected behavior:

Data returned should group the fields into the same message, like so:

```json
{
  "fields": {
    "out_unicast_packets": 0,
    "out_octets": 0,
    "in_octets": 0,
    "in_discards": 0,
    "out_discards": 0
  },
  "name": "All ports out-unicast-packets",
  "tags": {
    "host": "dv-telegraf",
    "path": "/state/port/ethernet/statistics",
    "port_id": "6/1/1",
    "source": "Device IP"
  },
  "timestamp": X
}
```

Actual behavior:

Although all measurement names, tag sets, and timestamps match, Telegraf still generates one metric field per message, like so:

```json
{"fields":{"out_discards":0},"name":"All ports","tags":{"host":"dv-telegraf","path":"/state/port/statistics","port":"6/1/1","port_id":"6/1/1","source":"172.25.54.173"},"timestamp":1592915515}
{"fields":{"in_unknown_protocol_discards":0},"name":"All ports","tags":{"host":"dv-telegraf","path":"/state/port/statistics","port":"6/1/1","port_id":"6/1/1","source":"172.25.54.173"},"timestamp":1592915515}
{"fields":{"in_discards":0},"name":"All ports","tags":{"host":"dv-telegraf","path":"/state/port/statistics","port":"6/1/1","port_id":"6/1/1","source":"172.25.54.173"},"timestamp":1592915515}
```

Additional info:

Example path of data collected:

```toml
[[inputs.cisco_telemetry_gnmi.subscription]]
  name = "All ports"
  origin = "NokiaState"
  path = "/state/port[port-id=6/1/1]/ethernet/statistics/out-octets"
  subscription_mode = "sample"
  sample_interval = "10s"
```

**Path tags overridden to be the same:**

```toml
[[processors.override]]
  namepass = ["All ports"]
  [processors.override.tags]
    path = "/state/port/statistics"
    port = "6/1/1"
```

**Also converting some field values. All fields should be the same datatype, so not sure if this makes a difference:**

```toml
[[processors.converter]]
  [processors.converter.tags]
  [processors.converter.fields]
    float = ["in", "out", "max", "min"]
```

Labels: bug, need more info

All 6 comments

Try adding a printer processor as the last processor: add the `order` option to each processor and give the printer the highest number. It may also help to increase the aggregator period, since only metrics received during a single period will be merged.
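A sketch of what that suggestion might look like in telegraf.conf (the `order` values are arbitrary; `processors.printer` simply prints each metric to stdout, so placing it last shows the metrics exactly as the aggregator will receive them):

```toml
# Illustrative ordering, not the reporter's full config: give each
# processor an explicit order and make the printer run last.
[[processors.override]]
  order = 1
  namepass = ["All ports"]
  [processors.override.tags]
    path = "/state/port/statistics"

[[processors.printer]]
  order = 100   # highest order, so it runs after all other processors
```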

Thanks for the response.

Note that I have achieved the desired behaviour by collecting all Ethernet statistics with a wildcard port and a broader path (e.g. /state/port[port-id=*]/statistics/), which seems to work, but not all of those metrics are required.

I have implemented your suggestions above by changing the aggregator period to 60 (seconds?) and adding the printer/order options. However, messages are still coming through one per metric.

```toml
[[aggregators.merge]]
  ## If true, the original metric will be dropped by the
  ## aggregator and will not get sent to the output plugins.
  drop_original = true
  period = 60
```
```
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=in_errors=0 1592928121435739431
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=in_octets=1547956719 1592928121435743697
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=in_unicast_packets=1612537 1592928121436123296
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=out_errors=0 1592928121436126544
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=out_octets=1597079765 1592928121436486970
Jun 23 17:01:48 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=out_unicast_packets=1612725 1592928121436513301
Jun 23 17:01:49 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=in_discards=0 1592928121672597310
Jun 23 17:01:49 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source= in_unknown_protocol_discards=0 1592928121672635881
Jun 23 17:01:49 dv-telegraf telegraf[32416]: All\ ports,host=dv-telegraf,path=/state/port/statistics,port=6/1/1,port_id=6/1/1,source=out_discards=0 1592928121672808105
```

Looks like the issue is that the timestamps aren't exactly the same, due to the way they are set by the cisco_telemetry_gnmi input plugin. I am planning to work on https://github.com/influxdata/telegraf/issues/3843 soon, which will give us a way to control timestamp rounding on a per-input-plugin basis. We could use this to gather metrics with the same timestamp, though it is possible that doing so could cause events to be missed if they arrive very close together.
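As a possible workaround (separate from what the linked issue proposes): later Telegraf releases (1.15+, so newer than the 1.14.4 reported here) ship a starlark processor that can rewrite a metric's timestamp before the aggregator sees it. A hedged sketch of truncating timestamps to whole seconds so merge can group them:

```toml
# Assumes Telegraf >= 1.15 (processors.starlark). Truncates each
# metric's nanosecond timestamp down to the containing second.
[[processors.starlark]]
  source = '''
def apply(metric):
    metric.time = metric.time - (metric.time % 1000000000)
    return metric
'''
```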

Thanks for the info.

Interesting. Is this required for all granular timestamps? e.g. does `period` only facilitate aggregation by seconds, not nanoseconds? Or does cisco_telemetry_gnmi produce unique timestamps that are difficult to aggregate?

You can think of there being two types of input plugins: polling and event-driven (in the code we call the event-driven variety ServiceInput; you may see that in some documentation). The precision setting only rounds the polling input plugins' timestamps, though it is possible for a plugin to override this behavior.

It is important for event-driven input plugins not to have rounded timestamps, since otherwise it wouldn't be possible to collect more than one metric per period without them overwriting each other in the database. This is why cisco_telemetry_gnmi uses full nanosecond precision.
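The collision being described can be seen with two of the timestamps from the log above: they are distinct at nanosecond precision, but rounding both to the second makes them identical, so the later point would overwrite the earlier one in InfluxDB (a plain Python sketch; the timestamps are copied from the log):

```python
# Two event timestamps (ns) from the log above: distinct at nanosecond
# precision, identical once truncated to the containing second.
ts_a = 1592928121435739431  # in_errors event
ts_b = 1592928121672597310  # in_discards event

def truncate_to_second(ns: int) -> int:
    """Drop the sub-second part of a nanosecond timestamp."""
    return ns - ns % 1_000_000_000

assert ts_a != ts_b
# After rounding, both events land on the same series key + timestamp.
assert truncate_to_second(ts_a) == truncate_to_second(ts_b)
print(truncate_to_second(ts_a))  # 1592928121000000000
```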

Ah, I see! Thanks, I really appreciate your explanation, it all makes sense now :)

