Telegraf: BUG: High Cpu Usage [futex]

Created on 15 Mar 2017 · 9 Comments · Source: influxdata/telegraf

OS: CentOS and Ubuntu (especially Ubuntu & Debian)
Telegraf: Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff)

System calls of telegraf:
[Screenshot: 2017-03-15 at 09:55:29]

With CPU usage this high, we can't use Telegraf in our production environment, which is a pity.

Can anybody solve it?

need more info

lin-credible

All 9 comments

My telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = true
  logfile = ""
  hostname = "_HOSTNAME_"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://_INFLUXIP_:_INFLUXPORT_"]
  database = "telegraf"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.netstat]]

lin-credible on 15 Mar 2017

@lin-credible Can you determine which plugin is causing the most cpu_usage?

danielnelson on 15 Mar 2017

@danielnelson I'm not sure which plugin causes the most cpu_usage. Do you have any advice on how to collect that information? As far as I know, the plugins I use are very basic...

lin-credible on 16 Mar 2017

I would remove inputs one at a time until the problem goes away.

danielnelson on 16 Mar 2017
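
One way to run the one-at-a-time elimination suggested above without editing the config for every run is Telegraf's `--input-filter` option, which takes a colon-separated list of inputs to enable. The sketch below is only an illustration: the input names come from the telegraf.conf posted in this thread, while the helper name `leave_one_out_commands` and the config path are assumptions. It prints one command per run, each leaving out a single input, so the CPU usage of each run can be compared by hand.

```python
# Inputs enabled in the telegraf.conf posted in this thread.
INPUTS = ["cpu", "disk", "diskio", "kernel", "mem",
          "processes", "swap", "system", "net", "netstat"]


def leave_one_out_commands(inputs, config_path="telegraf.conf"):
    """Return (skipped_input, command) pairs, one per input left out.

    The helper name and the default config path are hypothetical; adjust
    them to your own setup.
    """
    commands = []
    for skipped in inputs:
        enabled = [name for name in inputs if name != skipped]
        filter_value = ":".join(enabled)
        command = f"telegraf --config {config_path} --input-filter {filter_value}"
        commands.append((skipped, command))
    return commands


for skipped, command in leave_one_out_commands(INPUTS):
    print(f"# run without inputs.{skipped}")
    print(command)
```

If CPU usage drops noticeably in one of the runs, the input that was left out of that run is the likely culprit.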


collectd's system calls:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.77    0.011868        3956         3           nanosleep
  4.23    0.000524         524         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.012392                     4           total

Here is the difference.

lin-credible on 17 Mar 2017
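
To compare two such summaries numerically rather than by eye, the table can be parsed into records. The following is a minimal sketch, assuming the input has the column layout shown above; the function name `parse_strace_summary` is hypothetical. The sample text is the collectd summary from this thread, because the Telegraf summary was only posted as a screenshot.

```python
STRACE_SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.77    0.011868        3956         3           nanosleep
  4.23    0.000524         524         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.012392                     4           total
"""


def parse_strace_summary(text):
    """Parse a syscall summary table into a list of dicts, busiest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (empty errors column) or 6 fields.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            pct_time = float(fields[0])
        except ValueError:
            continue  # header or separator line
        rows.append({
            "syscall": fields[-1],
            "pct_time": pct_time,
            "seconds": float(fields[1]),
            "usecs_per_call": int(fields[2]),
            "calls": int(fields[3]),
            "errors": int(fields[4]) if len(fields) == 6 else 0,
        })
    # Sort by number of calls so the most frequent syscall comes first.
    return sorted(rows, key=lambda row: row["calls"], reverse=True)


for row in parse_strace_summary(STRACE_SUMMARY):
    print(row["syscall"], row["calls"], row["seconds"])
```

Running the same parser on a Telegraf summary captured over the same time window would let the call counts (for example for futex) be compared directly.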

"Can anybody solve it?"

The best way to solve it is to determine which plugin is causing the most load and then investigate how we can optimize it. We will likely not be able to match collectd's performance, but I'm sure there is room for improvement. Are you able to investigate which plugin is causing the biggest problem?

danielnelson on 17 Mar 2017

I'm going to do it. Please give me a while, maybe a few days... @danielnelson 😊

lin-credible on 18 Mar 2017


Dear all,

I'm not sure if it's related, but I found that the inputs.processes plugin alone consumes 1-1.5% more CPU (using collection_jitter = "3s") on a host using ZFS. ZFS is the only difference between the two hosts; even the hardware is the same:

[Screenshot: 2017-04-10 at 11:40:25]