Telegraf: Basicstats counter not working as expected

Created on 17 Mar 2019  ·  14 Comments  ·  Source: influxdata/telegraf

Description

I've been trying to configure an easy way to get Apache statistics as described in https://github.com/influxdata/telegraf/issues/3991, but while doing some testing with tail / logparser I noticed something is not working for me. Perhaps there is some missing config? If not, could it be a bug in the aggregator (perhaps some point filtering happens before points are sent to the aggregation process)?

Relevant telegraf.conf:

This is the config to put in /etc/telegraf/telegraf.d/testlog.conf to reproduce

```toml
[[inputs.tail]]
name_override = "apachelog2"
files = ["/var/log/apache2/test.log"]
from_beginning = false
pipe = false
data_format = "grok"
grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]

[[aggregators.basicstats]]
namepass = ["apachelog2"]
period = "60s"
drop_original = true
fieldpass = ["resp_time"]
stats = ["count","max","min","mean"]
```
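For reference, here is a minimal sketch in Python (assumed semantics, not Telegraf's actual implementation) of what basicstats should emit per 60s period for the `resp_time` field, grouped by the `request` tag:

```python
from collections import defaultdict

def aggregate(samples):
    """Compute count/min/max/mean per request tag for one period.

    samples: list of (request_tag, resp_time) pairs seen in the period.
    """
    groups = defaultdict(list)
    for tag, value in samples:
        groups[tag].append(value)
    return {
        tag: {
            "resp_time_count": len(vs),
            "resp_time_min": min(vs),
            "resp_time_max": max(vs),
            "resp_time_mean": sum(vs) / len(vs),
        }
        for tag, vs in groups.items()
    }

# 60 one-second requests and 30 two-second requests per minute,
# matching the test script used in this issue
samples = [("/200_1sec.php", 1000000)] * 60 + [("/200_2sec.php", 2000000)] * 30
stats = aggregate(samples)
print(stats["/200_1sec.php"]["resp_time_count"])  # 60
print(stats["/200_2sec.php"]["resp_time_count"])  # 30
```

Given the config above, the expected `resp_time_count` values are 60 and 30 per period.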

System info:

```
root@ipasdev01:~# telegraf --version
Telegraf 1.10.0 (git: HEAD fe33ee89)
root@ipasdev01:~# cat /etc/debian_version
buster/sid
root@ipasdev01:~# uname -a
Linux ipasdev01 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```

Steps to reproduce:

1. Create the config file above (testlog.conf), put it in the telegraf.d directory, and restart Telegraf.

2. Create this script to simulate a log file (always with the same input):

```bash
#!/bin/bash

while true
do
    D=$(date +"%D-%T")
    echo "$D /200_1sec.php 1000000" >> /var/log/apache2/test.log
    echo "$D /200_1sec.php 1000000" >> /var/log/apache2/test.log
    echo "$D /200_2sec.php 2000000" >> /var/log/apache2/test.log
    sleep 2
done
```
Execute it in the background:

```
nohup ~/telegraf_log_test.sh &
```

Expected behavior:

I expect to have:

  • 60 times/minute the request 200_1sec.php
  • 30 times/minute the request 200_2sec.php

As counted with grep and wc -l:

```
root@ipasdev01:$ grep "19:29:" /var/log/apache2/test.log | grep "200_1sec" | wc -l
60
root@ipasdev01:$ grep "19:29:" /var/log/apache2/test.log | grep "200_2sec" | wc -l
30
```
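The expected per-minute counts follow directly from the script's loop (a quick sanity check, not part of the original report):

```python
iterations_per_minute = 60 // 2          # the script sleeps 2 s per loop
lines_1sec = iterations_per_minute * 2   # two 200_1sec.php lines per loop
lines_2sec = iterations_per_minute * 1   # one 200_2sec.php line per loop
print(lines_1sec, lines_2sec)            # 60 30
```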

Actual behavior:

The basicstats aggregator is counting only:

  • 8 times/min the request "200_1sec.php"
  • 4 times/min the request "200_2sec.php"

````

select sum(resp_time_count) from apachelog2 where time > now() - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
2019-03-17T18:28:00Z
2019-03-17T18:29:00Z 8
2019-03-17T18:30:00Z 8

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
2019-03-17T18:28:00Z
2019-03-17T18:29:00Z 4
2019-03-17T18:30:00Z 4
````

Labels: area/agent, bug

Most helpful comment

I see what the issue is, working on a fix now.

All 14 comments

I forgot to include the complete config and log.

Telegraf Full config

/etc/telegraf/telegraf.conf

````toml
[global_tags]

l_tag1 = "a"
l_tag2 = "b"
l_tag3 = "c"

[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = "10s"
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
omit_hostname = false
[[inputs.internal]]
# collect_memstats = true
[inputs.internal.tags]
ifx_db = "internal_telegraf"
[[outputs.influxdb]]
urls = ["http://influxdb.mydomain.org:8086"]
database = "_telegraf"
username = "xxxxxxxxx"
password = "xxxxxxxxx"

#Filter influx output
tagexclude = ["ifx_db"]
[outputs.influxdb.tagpass]
ifx_db = ["internal_telegraf"]
````

/etc/telegraf/telegraf.d/input_apache.conf

```toml
[[inputs.tail]]
name_override = "apachelog2"
files = ["/var/log/apache2/test.log"]
from_beginning = false
pipe = false
data_format = "grok"
grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]
[inputs.tail.tags]
instance = "instance1"
ifx_db = "ws_apache"

[[aggregators.basicstats]]
namepass = ["apachelog2"]
period = "60s"
drop_original = true
fieldpass = ["resp_time"]
stats = ["count","max","min","mean"]
```

/etc/telegraf/telegraf.d/output_apache.conf

```toml
[[outputs.influxdb]]
urls = ["http://influxdb.mydomain.org:8086"]
database = "apache_metrics"
username = "xxxxxxx"
password = "xxxxxxx"
#Filter metrics
tagexclude = ["ifx_db"]
[outputs.influxdb.tagpass]
ifx_db = ["ws_apache"]
```

Output Log

2019-03-18T05:18:09Z I! Loaded inputs: internal tail 2019-03-18T05:18:09Z I! Loaded aggregators: basicstats 2019-03-18T05:18:09Z I! Loaded processors: 2019-03-18T05:18:09Z I! Loaded outputs: influxdb influxdb 2019-03-18T05:18:09Z I! Tags enabled: host=ipasdev01 l_tag1=a l_tag2=b l_tag3=c 2019-03-18T05:18:09Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ipasdev01", Flush Interval:10s 2019-03-18T05:18:09Z D! [agent] Connecting outputs 2019-03-18T05:18:09Z D! [agent] Attempting connection to output: influxdb 2019-03-18T05:18:09Z D! [agent] Successfully connected to output: influxdb 2019-03-18T05:18:09Z D! [agent] Attempting connection to output: influxdb 2019-03-18T05:18:09Z D! [agent] Successfully connected to output: influxdb 2019-03-18T05:18:09Z D! [agent] Starting service inputs 2019-03-18T05:18:09Z D! [inputs.tail] tail added for file: /var/log/apache2/test.log 2019-03-18T05:18:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:18:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:00Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.384325ms 2019-03-18T05:19:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:19:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:19:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 11.926215ms 2019-03-18T05:20:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.559029ms 2019-03-18T05:20:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:20:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 24.70911ms 2019-03-18T05:21:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:21:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 28.525874ms 2019-03-18T05:21:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:21:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.273403ms 2019-03-18T05:22:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 15.221005ms 2019-03-18T05:22:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:22:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:23:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 11.496482ms 2019-03-18T05:23:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.704723ms 2019-03-18T05:23:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:23:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 12.81991ms 2019-03-18T05:24:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 52.024487ms 2019-03-18T05:24:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:24:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:24:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 14.451988ms 2019-03-18T05:25:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 14.894181ms 2019-03-18T05:25:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:25:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.351588ms 2019-03-18T05:26:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 20.242445ms 2019-03-18T05:26:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:26:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:26:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 12.704176ms 2019-03-18T05:27:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 17.389107ms 2019-03-18T05:27:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:27:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 10.827497ms 2019-03-18T05:28:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 15.290605ms 2019-03-18T05:28:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2019-03-18T05:28:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:30Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:40Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:28:50Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:00Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:10Z D! [outputs.influxdb] wrote batch of 6 metrics in 14.191643ms 2019-03-18T05:29:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:10Z D! [outputs.influxdb] wrote batch of 2 metrics in 18.257915ms 2019-03-18T05:29:10Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 2019-03-18T05:29:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics.

More interesting data: it seems the count changes over time.

In the morning...

````

select sum(resp_time_count) from apachelog2 where time > now() - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
2019-03-18T05:29:00Z
2019-03-18T05:30:00Z 10
2019-03-18T05:31:00Z 10

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
2019-03-18T05:29:00Z
2019-03-18T05:30:00Z 5
2019-03-18T05:31:00Z 5
````
Some hours later:

````

select sum(resp_time_count) from apachelog2 where time > now() - 2m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
2019-03-18T09:35:00Z
2019-03-18T09:36:00Z 36
2019-03-18T09:37:00Z 36

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
2019-03-18T09:35:00Z
2019-03-18T09:36:00Z 18
2019-03-18T09:37:00Z 18
````


@toni-moreno I'm not able to replicate. Can you try removing the tag routing to see if it helps? This is what I get:

```
select sum(resp_time_count) from apachelog2 where time > now()  - 4m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time                 sum
----                 ---
2019-03-19T00:00:00Z 
2019-03-19T00:01:00Z 52
2019-03-19T00:02:00Z 60
2019-03-19T00:03:00Z 60
2019-03-19T00:04:00Z 60

name: apachelog2
tags: request=/200_2sec.php
time                 sum
----                 ---
2019-03-19T00:00:00Z 
2019-03-19T00:01:00Z 26
2019-03-19T00:02:00Z 30
2019-03-19T00:03:00Z 30
2019-03-19T00:04:00Z 30
```

Output with a file output:

```
apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954029000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954029000000000
apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954089000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954089000000000
apachelog2,host=loaner,path=test.log,request=/200_2sec.php resp_time_count=30,resp_time_max=2000000,resp_time_mean=2000000,resp_time_min=2000000 1552954149000000000
apachelog2,host=loaner,path=test.log,request=/200_1sec.php resp_time_count=60,resp_time_max=1000000,resp_time_mean=1000000,resp_time_min=1000000 1552954149000000000
```

Hi @danielnelson, I'm working with @toni-moreno, and after some tests I have seen the same wrong results he described in this issue.

Version: Telegraf 1.10.0 (git: HEAD fe33ee89)
Config: Same
Data: Same script
Routing tag: disabled/enabled, no difference seen

Test:

  • I ran the tests on 2 different hosts; both produced invalid results.
  • After some restarts the values changed, but they were still invalid.

A: Debian VM on VirtualBox:

  • Linux snmpcoldev01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
  • debian_version: 8.4

Result:

```
> select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m), request;
name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1552992300000000000 
1552992360000000000 24
1552992420000000000 24
1552992480000000000 24
1552992540000000000 24

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1552992300000000000 
1552992360000000000 12
1552992420000000000 12
1552992480000000000 12
1552992540000000000 12
```

B: Red Hat 7.5 VM on VMware vSphere:

  • Red Hat Enterprise Linux Server release 7.5 (Maipo)

Result:

```
> select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m), request;
name: apachelog2
tags: request=/200_1sec.php
time                sum
----                ---
1552992060000000000
1552992120000000000 26
1552992180000000000 26
1552992240000000000 26
1552992300000000000 26

name: apachelog2
tags: request=/200_2sec.php
time                sum
----                ---
1552992060000000000
1552992120000000000 13
1552992180000000000 13
1552992240000000000 13
1552992300000000000 13
```

Can you try using the watch_method = "poll" option of the tail input and see if it has any effect?

If nothing changes, can you place the 3 log lines into a file and substitute the tail input with the file input:

```toml
[[inputs.file]]
  interval = "2s"
  name_override = "apachelog2"
  files = ["test.log"]
  data_format = "grok"
  grok_patterns = ["%{NOTSPACE:time:drop} %{NOTSPACE:request:tag} %{NUMBER:resp_time:int}"]
```

test.log:

```
03/18/19-17:00:11 /200_1sec.php 1000000
03/18/19-17:00:11 /200_1sec.php 1000000
03/18/19-17:00:11 /200_2sec.php 2000000
```

Hi @danielnelson, thanks for the answer!

First of all, in each test we checked that, with drop_original set to false, the log parser retrieved all lines and they were written to the DB correctly.
The problem seems to be related to the aggregator.

Following your recommendations:

Setting watch_method = "poll"

Same wrong result as described in the issue.

Test with the file input plugin

Result:

The results are still wrong; we expected 60 and 30.

````

select sum(resp_time_count) from apachelog2 where time > now() - 4m group by time(1m),request
name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
1553079840000000000
1553079900000000000 20
1553079960000000000 20
1553080020000000000 20
1553080080000000000 20

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
1553079840000000000
1553079900000000000 10
1553079960000000000 10
1553080020000000000 10
1553080080000000000 10
````

More info:

With drop_original = false, the following query shows that the points are being read from the file correctly.
Note: in order to keep all points, we changed the precision from 10s to 's'.
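Why a coarse precision loses points can be illustrated with a small sketch (an assumption about the mechanism: InfluxDB keeps one point per series and timestamp, so timestamps that collapse into the same 10 s bucket overwrite each other):

```python
def truncate_ts(ts, precision):
    # model of timestamp rounding: truncate a Unix timestamp (seconds)
    # down to a multiple of the precision
    return ts - (ts % precision)

# one log line every 2 seconds for the same measurement and tag set
raw = [100, 102, 104, 106, 108]
buckets = {truncate_ts(t, 10) for t in raw}
print(len(raw), len(buckets))  # 5 1 -- five points collapse into one bucket
```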

````

select count(resp_time), sum(resp_time_count) from apachelog2 where time > now() - 3m group by time(1m),request
name: apachelog2
tags: request=/200_1sec.php
time count sum
---- ----- ---
1553079900000000000 6
1553079960000000000 30 20
1553080020000000000 30 20
1553080080000000000 20 20

name: apachelog2
tags: request=/200_2sec.php
time count sum
---- ----- ---
1553079900000000000 6
1553079960000000000 30 10
1553080020000000000 30 10
1553080080000000000 20 10
````

Raw:
```
time request resp_time resp_time_count resp_time_max resp_time_mean resp_time_min
---- ------- --------- --------------- ------------- -------------- -------------
1553080082000000000 /200_1sec.php 1000000
1553080082000000000 /200_2sec.php 2000000
1553080084000000000 /200_1sec.php 1000000
1553080084000000000 /200_2sec.php 2000000
1553080086000000000 /200_1sec.php 1000000
1553080086000000000 /200_2sec.php 2000000
1553080088000000000 /200_1sec.php 1000000
1553080088000000000 /200_2sec.php 2000000
1553080090000000000 /200_1sec.php 1000000
1553080090000000000 /200_2sec.php 2000000
1553080092000000000 /200_1sec.php 1000000
1553080092000000000 /200_2sec.php 2000000
1553080094000000000 /200_1sec.php 1000000
1553080094000000000 /200_2sec.php 2000000
1553080096000000000 /200_1sec.php 1000000
1553080096000000000 /200_2sec.php 2000000
1553080098000000000 /200_1sec.php 1000000
1553080098000000000 /200_2sec.php 2000000
1553080100000000000 /200_1sec.php 1000000
1553080100000000000 /200_2sec.php 2000000
1553080102000000000 /200_1sec.php 1000000
1553080102000000000 /200_2sec.php 2000000
1553080104000000000 /200_1sec.php 1000000
1553080104000000000 /200_2sec.php 2000000
1553080106000000000 /200_1sec.php 1000000
1553080106000000000 /200_2sec.php 2000000
1553080108000000000 /200_1sec.php 1000000
1553080108000000000 /200_2sec.php 2000000
1553080110000000000 /200_1sec.php 1000000
1553080110000000000 /200_2sec.php 2000000
1553080112000000000 /200_1sec.php 1000000
1553080112000000000 /200_2sec.php 2000000
1553080114000000000 /200_1sec.php 1000000
1553080114000000000 /200_2sec.php 2000000
1553080116000000000 /200_1sec.php 1000000
1553080116000000000 /200_2sec.php 2000000
1553080118000000000 /200_1sec.php 1000000
1553080118000000000 /200_2sec.php 2000000
1553080120000000000 /200_1sec.php 1000000
1553080120000000000 /200_2sec.php 2000000
1553080122000000000 /200_1sec.php 1000000
1553080122000000000 /200_2sec.php 2000000
1553080124000000000 /200_1sec.php 1000000
1553080124000000000 /200_2sec.php 2000000
1553080126000000000 /200_1sec.php 1000000
1553080126000000000 /200_2sec.php 2000000
1553080128000000000 /200_1sec.php 1000000
1553080128000000000 /200_2sec.php 2000000
1553080130000000000 /200_1sec.php 1000000
1553080130000000000 /200_2sec.php 2000000
1553080132000000000 /200_1sec.php 1000000
1553080132000000000 /200_2sec.php 2000000
1553080134000000000 /200_1sec.php 1000000
1553080134000000000 /200_2sec.php 2000000
1553080136000000000 /200_1sec.php 1000000
1553080136000000000 /200_2sec.php 2000000
1553080138000000000 /200_1sec.php 1000000
1553080138000000000 /200_2sec.php 2000000
1553080140000000000 /200_1sec.php 1000000 20 1000000 1000000 1000000
1553080140000000000 /200_2sec.php 2000000 10 2000000 2000000 2000000
```

If you enable the internal plugin, what are the metrics_dropped values on internal_aggregate?

```
internal_aggregate,aggregator=basicstats metrics_dropped=0i
```

Hi @danielnelson ,

We enabled the internal plugin and, as you can see in the following results, it seems that:

`metrics_dropped = expected_value - sum(resp_time_count)`

Test with internal

  • We restarted Telegraf to see if the metrics_dropped value would change.
    Expected: 90. T1: 39 = 90 - (34+17). T2: 78 = 90 - (8+4).
  • We ran the test with the file input, with the same config as in the comment above.
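The relation holds exactly for both test runs; a one-line check (values copied from the query results in this comment):

```python
expected = 60 + 30  # log lines written per minute by the test script
runs = [(39, (34, 17)), (78, (8, 4))]  # (metrics_dropped, per-request counts)
for dropped, counts in runs:
    # dropped metrics account exactly for the missing aggregated counts
    assert dropped == expected - sum(counts)
print("dropped + aggregated == expected for both runs")
```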

Result

````
select non_negative_derivative(last(metrics_dropped),1m) from "internal_aggregate" where time > now() - 20m group by time(1m);
select sum(resp_time_count) from "apachelog2" where time > now() - 20m group by time(1m),request

name: internal_aggregate
time non_negative_derivative
---- -----------------------
1553154840000000000 39
1553154900000000000 39
1553154960000000000 39
1553155020000000000 39
1553155080000000000 39
1553155140000000000 39
1553155200000000000 39
1553155260000000000 39
1553155320000000000 39

1553155500000000000 78
1553155560000000000 78
1553155620000000000 78
1553155680000000000 78
1553155740000000000 78
1553155800000000000 78
1553155860000000000 78

name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
1553154840000000000 34
1553154900000000000 34
1553154960000000000 34
1553155020000000000 34
1553155080000000000 34
1553155140000000000 34
1553155200000000000 34
1553155260000000000 34
1553155320000000000 34

1553155500000000000 8
1553155560000000000 8
1553155620000000000 8
1553155680000000000 8
1553155740000000000 8
1553155800000000000 8
1553155860000000000 8

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
1553154840000000000 17
1553154900000000000 17
1553154960000000000 17
1553155020000000000 17
1553155080000000000 17
1553155140000000000 17
1553155200000000000 17
1553155260000000000 17
1553155320000000000 17

1553155500000000000 4
1553155560000000000 4
1553155620000000000 4
1553155680000000000 4
1553155740000000000 4
1553155800000000000 4
1553155860000000000 4
````

Hi @danielnelson , I'm still working with the tail plugin.


This is my count while having 21 dropped metrics:

````

select sum(resp_time_count) from apachelog2 where time > now() - 5m group by time(1m),request;
name: apachelog2
tags: request=/200_1sec.php
time sum
---- ---
2019-03-21T10:19:00Z
2019-03-21T10:20:00Z 46
2019-03-21T10:21:00Z 46
2019-03-21T10:22:00Z 46
2019-03-21T10:23:00Z 46
2019-03-21T10:24:00Z 48

name: apachelog2
tags: request=/200_2sec.php
time sum
---- ---
2019-03-21T10:19:00Z
2019-03-21T10:20:00Z 23
2019-03-21T10:21:00Z 23
2019-03-21T10:22:00Z 23
2019-03-21T10:23:00Z 23
2019-03-21T10:24:00Z 24
````

As in @sbengo's previous example with the file input plugin, the dropped metrics added to the counted metrics give the correct total number of lines in the file per minute.

23 + 46 + 21 = 90 metrics/min

Could you explain why the aggregator is dropping metrics, and how we can fix it?

There are 2 possibilities:

  • The selected metric does not have any fields because fieldpass has removed them all. I think this is actually a bug: they should be counted as metrics_filtered, but right now they are counted under metrics_dropped.
  • The metric's timestamp is not in the current aggregation window (running_aggregator.go#L125-L128).

I'm not sure which case we are running into, though the code setting the aggregation window has been problematic, and it could possibly have something to do with the agent precision rounding. I can fix the bug where we report metrics as dropped when they should be filtered, and that will help narrow it down.
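The second case can be modeled with a short sketch (hypothetical: it assumes the agent rounds metric timestamps to the nearest multiple of the precision before the window check): a metric collected near the end of a window can be rounded past the window boundary and then dropped.

```python
def round_to_precision(ts, precision):
    # hypothetical model of agent precision rounding (round to nearest)
    return round(ts / precision) * precision

window_start, window_end = 60, 120   # one 60 s aggregation window

t = 116                              # raw timestamp, inside the window
rt = round_to_precision(t, 10)       # rounds up to 120
print(window_start <= t < window_end)    # True
print(window_start <= rt < window_end)   # False: the metric would be dropped
```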

I see what the issue is, working on a fix now.

Hi @danielnelson, any news about the fix? Do you need any other tests from us?

When it's ready, we would like to test and validate the fix if you want.

Thank you again!

I'm also testing aggregators and found that drop_original doesn't work well with max_undelivered_messages. It feels like drop_original = true doesn't reset some internal counters tracking how many messages were processed. Maybe these problems share the same root cause?

@toni-moreno I will definitely take you up on that offer to test, but I need a bit more time. Should have the fix ready on Wednesday.

