Telegraf: Array of string options are silently ignored if set to a string

Created on 2 Jun 2020  ·  3 Comments  ·  Source: influxdata/telegraf

Apologies if I got something wrong, which might be the case.

Using drop_original = true on some filtered aggregators causes Telegraf to discard metrics that are not matched by their filters. This question on community.influxdata.com seems to indicate that this is not the intended behavior.

I've reproduced the issue on the basicstats and minmax aggregators.

Relevant telegraf.conf:

I'm currently using this test scenario:

.
├── log.txt
└── telegraf.conf

The telegraf.conf is a simple configuration file with two inputs, a debug output and an aggregator:

[global_tags]

[agent]
  interval = "5s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "5s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false

[[outputs.file]]
  files = ["stdout"]

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false

[[inputs.logparser]]
  files = ["/etc/telegraf/telegraf.debug_aggregator/log.txt"]
  from_beginning = true

[inputs.logparser.grok]
  patterns = [
    "^id %{NUMBER:id:tag} rev %{NUMBER:rev:int}"
  ]
  measurement = "modsec_rules_hits"
  timezone = 'Local'

[[aggregators.basicstats]]
  namepass = "modsec_rules_hits"
  drop_original = true
  period = "10s"
  grace = "86400s"
  fieldpass = ["rev"]
  stats = ["count"]

The log.txt has been stripped down for simplicity and its contents are as follows:

id 111 rev 2
id 111 rev 2
id 111 rev 2
id 111 rev 2

System info:

  • Telegraf version: 1.14.3 (git: HEAD 1b35d6c2)
  • OS: CentOS7

Steps to reproduce:

To reproduce the issue do these three tests:

  1. Comment out the aggregator in telegraf.conf and get a batch of metrics with telegraf --debug --config telegraf.conf test. Save the results.
  2. Uncomment the aggregator, set drop_original = false in the config file and get another batch of metrics. Save the results.
  3. Set drop_original = true in the config file and get another batch of metrics. Compare it to the previously saved results.

At the end of the issue I'm attaching my results.

Expected behavior:

Unless I've misunderstood this post, the final output should be something like this, as the only dropped metrics are those that are being filtered by the aggregator (in this case, those that have the "modsec_rules_hits" measurement):

modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev_count=3 1591094940000000000
cpu,cpu=cpu-total,host=localhost usage_steal=0,usage_guest=0,usage_user=0.2009040682919928,usage_system=0.5022601707573902,usage_idle=99.04570567582773,usage_nice=0.20090406830569688,usage_irq=0,usage_iowait=0.05022601707642422,usage_softirq=0,usage_guest_nice=0 1591094895000000000
cpu,cpu=cpu3,host=localhost usage_system=0.8048289738468508,usage_irq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_user=0.20120724346628763,usage_idle=98.79275654044665,usage_nice=0.2012072434617127,usage_iowait=0,usage_softirq=0 1591094895000000000
cpu,cpu=cpu2,host=localhost usage_idle=99.79797979778975,usage_softirq=0,usage_steal=0,usage_guest_nice=0,usage_user=0,usage_system=0.20202020202651216,usage_nice=0,usage_iowait=0,usage_irq=0,usage_guest=0 1591094895000000000
cpu,cpu=cpu1,host=localhost usage_system=0.40080160319768315,usage_nice=0.40080160320679636,usage_iowait=0.20040080160453733,usage_softirq=0,usage_guest_nice=0,usage_user=0.20040080158972842,usage_idle=98.79759519159167,usage_irq=0,usage_steal=0,usage_guest=0 1591094895000000000
cpu,cpu=cpu0,host=localhost usage_iowait=0,usage_irq=0,usage_softirq=0,usage_guest=0,usage_user=0.20161290323172149,usage_system=0.2016129032133849,usage_idle=99.39516129210182,usage_nice=0.20161290322713735,usage_steal=0,usage_guest_nice=0 1591094895000000000

Actual behavior:

drop_original = true seems to be discarding all of the metrics, not only those that are being processed by the aggregator. The end result of a batch of metrics is something like this:

modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev_count=3 1591094940000000000

Additional info:

Metrics obtained by the test setup with the aggregator commented out on the config file:

modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094530000000000
modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094525000000000
modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094530000000000
cpu,cpu=cpu-total,host=localhost usage_guest_nice=0,usage_idle=99.2466097443031,usage_steal=0,usage_guest=0,usage_iowait=0,usage_irq=0,usage_softirq=0,usage_user=0.2009040682966916,usage_system=0.35158211954205043,usage_nice=0.2009040683103957 1591094830000000000
cpu,cpu=cpu3,host=localhost usage_nice=0.40160642571333627,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_user=0.20080321284297092,usage_idle=98.79518072407448,usage_irq=0,usage_softirq=0,usage_system=0.6024096385837017,usage_iowait=0 1591094830000000000
cpu,cpu=cpu2,host=localhost usage_user=0.1999999999908323,usage_system=0.1999999999908323,usage_nice=0.2000000000044747,usage_softirq=0,usage_idle=99.20000000156462,usage_iowait=0.2000000000044747,usage_irq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0 1591094830000000000
cpu,cpu=cpu1,host=localhost usage_nice=0.20040080160339818,usage_irq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_system=0.4008016032159095,usage_idle=98.99799599104891,usage_softirq=0,usage_user=0.4008016032159095,usage_iowait=0 1591094830000000000
cpu,cpu=cpu0,host=localhost usage_user=0,usage_idle=99.59758551270143,usage_irq=0,usage_steal=0,usage_guest_nice=0,usage_system=0.40241448691427556,usage_nice=0,usage_iowait=0,usage_softirq=0,usage_guest=0 1591094830000000000

Metrics obtained with the aggregator uncommented but using drop_original = false. Note that the first line is the one being aggregated:

modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev_count=3 1591094890000000000
modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094530000000000
modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094525000000000
modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev=2i 1591094530000000000
cpu,cpu=cpu-total,host=localhost usage_steal=0,usage_guest=0,usage_user=0.2009040682919928,usage_system=0.5022601707573902,usage_idle=99.04570567582773,usage_nice=0.20090406830569688,usage_irq=0,usage_iowait=0.05022601707642422,usage_softirq=0,usage_guest_nice=0 1591094895000000000
cpu,cpu=cpu3,host=localhost usage_system=0.8048289738468508,usage_irq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_user=0.20120724346628763,usage_idle=98.79275654044665,usage_nice=0.2012072434617127,usage_iowait=0,usage_softirq=0 1591094895000000000
cpu,cpu=cpu2,host=localhost usage_idle=99.79797979778975,usage_softirq=0,usage_steal=0,usage_guest_nice=0,usage_user=0,usage_system=0.20202020202651216,usage_nice=0,usage_iowait=0,usage_irq=0,usage_guest=0 1591094895000000000
cpu,cpu=cpu1,host=localhost usage_system=0.40080160319768315,usage_nice=0.40080160320679636,usage_iowait=0.20040080160453733,usage_softirq=0,usage_guest_nice=0,usage_user=0.20040080158972842,usage_idle=98.79759519159167,usage_irq=0,usage_steal=0,usage_guest=0 1591094895000000000
cpu,cpu=cpu0,host=localhost usage_iowait=0,usage_irq=0,usage_softirq=0,usage_guest=0,usage_user=0.20161290323172149,usage_system=0.2016129032133849,usage_idle=99.39516129210182,usage_nice=0.20161290322713735,usage_steal=0,usage_guest_nice=0 1591094895000000000

Metrics obtained with the aggregator uncommented and drop_original = true. Note that the metrics from the cpu input are missing, along with the original ones processed by the aggregator:

modsec_rules_hits,host=localhost,modsec_rule_id=111,path=/etc/telegraf/telegraf.debug_aggregator/log.txt rev_count=3 1591094940000000000
area/configuration bug

All 3 comments

Thanks for writing up the case. It took me a bit longer than I'd like to admit to spot the issue, but this is related to a known issue in our TOML parser where it does not properly warn if you pass a string instead of an array of strings.

As a result, the namepass option is never set, so the aggregator matches all metrics, and drop_original = true then drops every one of them.

This change to your configuration should take care of it:

[[aggregators.basicstats]]
-  namepass = "modsec_rules_hits"
+  namepass = ["modsec_rules_hits"]

Sorry about the confusion. The related issues are #3444 and #6474, but I'm going to keep this issue open as well since it manifests in a slightly different way.
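
To make the effect concrete, here is a rough Python model of the behavior described in this comment. It is only a sketch, not Telegraf's actual code: the function names are hypothetical, and it assumes that an unset (empty) namepass selects every metric and that patterns are matched as globs.

```python
from fnmatch import fnmatch


def aggregator_selects(name, namepass):
    """Simplified selection rule: an empty namepass list selects every metric."""
    if not namepass:
        return True
    return any(fnmatch(name, pattern) for pattern in namepass)


def surviving_originals(metric_names, namepass, drop_original):
    """Original metrics that still reach the outputs after the aggregator."""
    return [
        name
        for name in metric_names
        if not (drop_original and aggregator_selects(name, namepass))
    ]


metric_names = ["cpu", "modsec_rules_hits"]

# namepass = "modsec_rules_hits" is silently ignored, so the filter is empty.
print(surviving_originals(metric_names, [], True))

# namepass = ["modsec_rules_hits"] is applied as intended.
print(surviving_originals(metric_names, ["modsec_rules_hits"], True))
```

With the empty filter no original metric survives, which matches the reported output; with the array form only the modsec_rules_hits originals are dropped and the cpu metrics pass through.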

That indeed seems to be the issue :woman_facepalming:

Thank you very much for your time and work, Daniel.

Is there any way to detect those kinds of syntax errors from userland?

This issue really needs to be fixed in Telegraf; there isn't any good workaround for detecting it, as it's not a TOML syntax error.
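
Until that happens, a crude external check can catch the most common form of the mistake. The Python sketch below is an assumption-laden illustration and not a Telegraf feature: the function name is hypothetical, the option list is deliberately non-exhaustive, and the line-based regular expression does not parse TOML properly (it ignores multi-line values and does not know which table a key belongs to).

```python
import re

# Assumption: a non-exhaustive set of options that expect an array of strings.
ARRAY_OPTIONS = {
    "namepass", "namedrop", "fieldpass", "fielddrop", "taginclude", "tagexclude",
}

# Matches assignments whose value starts with a quote instead of "[",
# for example: namepass = "modsec_rules_hits"
_STRING_ASSIGNMENT = re.compile(r"""^\s*([A-Za-z_]+)\s*=\s*(["'])""")


def find_string_for_array(conf_text):
    """Return (line_number, option) pairs where an array option is a bare string."""
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = _STRING_ASSIGNMENT.match(line)
        if match and match.group(1) in ARRAY_OPTIONS:
            problems.append((number, match.group(1)))
    return problems


sample = '''
[[aggregators.basicstats]]
  namepass = "modsec_rules_hits"
  drop_original = true
  fieldpass = ["rev"]
'''

for number, option in find_string_for_array(sample):
    print(f"line {number}: {option} should be an array of strings")
```

Commented-out lines are skipped because the pattern requires the option name to be the first non-blank text on the line. Any option not listed in ARRAY_OPTIONS goes unchecked, so this reduces the risk rather than removing it.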
