Telegraf: BUG: High CPU Usage [futex]

Created on 15 Mar 2017  ·  9 Comments  ·  Source: influxdata/telegraf

OS: CentOS and Ubuntu (especially Ubuntu & Debian)
Telegraf: Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff)


System calls of telegraf:
[screenshot, 2017-03-15 09:55:29]

With CPU usage this high, we can't use it in a production environment. What a pity!

Can anybody solve it?

need more info

All 9 comments

My telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = true
  logfile = ""
  hostname = "_HOSTNAME_"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://_INFLUXIP_:_INFLUXPORT_"]
  database = "telegraf"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.netstat]]


@lin-credible Can you determine which plugin is causing the most cpu_usage?

@danielnelson I'm not sure which plugin causes the most CPU usage. Do you have any advice on how to collect that information? As far as I know, the plugins I use are very basic...

I would remove inputs one at a time until the problem goes away.
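One way to make that elimination systematic, as a rough sketch: list the `[[inputs.*]]` sections in the config and build one `--input-filter` value per run, each leaving a single plugin out. The helper names `input_plugins` and `leave_one_out_filters` are hypothetical, and the regex assumes plain plugin headers like the ones in the config above. Telegraf's `--input-filter` flag takes a colon-separated list of input names; check `telegraf --help` for your version before relying on it.

```python
import re

# Matches plugin headers such as "[[inputs.cpu]]" and captures the plugin name.
INPUT_HEADER = re.compile(r"^\s*\[\[inputs\.([A-Za-z0-9_]+)\]\]")


def input_plugins(config_text):
    """Return input plugin names in order of first appearance."""
    names = []
    for line in config_text.splitlines():
        match = INPUT_HEADER.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def leave_one_out_filters(names):
    """Map each plugin to a filter value that enables every other plugin."""
    return {
        dropped: ":".join(name for name in names if name != dropped)
        for dropped in names
    }


if __name__ == "__main__":
    # Hypothetical sample; in practice read the real telegraf.conf instead.
    sample = "[[inputs.cpu]]\n  percpu = true\n[[inputs.disk]]\n[[inputs.processes]]\n"
    for dropped, value in leave_one_out_filters(input_plugins(sample)).items():
        print(f"without {dropped}: telegraf --config telegraf.conf --input-filter {value}")
```

Run each printed command for a few collection intervals and compare CPU usage between runs; the run where usage drops points at the plugin that was left out.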

collectd's system calls:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.77    0.011868        3956         3           nanosleep
  4.23    0.000524         524         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.012392                     4           total

Here is the difference.
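To compare two such summaries in numbers rather than by eye, a small parser may help. This is a sketch that assumes the table is in the `strace -c` summary layout shown above; the function name `parse_strace_summary` is hypothetical.

```python
def parse_strace_summary(text):
    """Parse a syscall summary table into {syscall: stats}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the trailing "total" row.
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            pct_time = float(fields[0])
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # header or "------" rule line
        # The errors column is blank when no call failed.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows[fields[-1]] = {
            "pct_time": pct_time,
            "seconds": seconds,
            "calls": calls,
            "errors": errors,
        }
    return rows


if __name__ == "__main__":
    collectd = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.77    0.011868        3956         3           nanosleep
  4.23    0.000524         524         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.012392                     4           total
"""
    for name, stats in parse_strace_summary(collectd).items():
        print(name, stats["calls"], stats["seconds"])
```

Feeding it the telegraf summary from the same sampling window would allow a per-syscall comparison of call counts and time.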

Can anybody solve it?

The best way we can solve it is by determining the plugin that is causing the most load, and then investigating how we can optimize it. Likely, we will not be able to match collectd's performance but I'm sure there is room for improvement. Are you able to investigate which plugin is causing the biggest problem?

I'm going to do it... Wait a while, maybe a few days... @danielnelson 😊

Dear all,

Not sure if it's related, but I found that the inputs.processes plugin alone consumes 1-1.5% more CPU (using collection_jitter = "3s") on a host using ZFS (which is the only difference between the two hosts; even the hardware is the same):

[screenshot, 2017-04-10 11:40:25]

[screenshot, 2017-04-10 11:39:18]

The difference for the inputs.processes plugin is that on the ZFS host, the number of processes is drastically higher:

[screenshot, 2017-04-10 11:49:30]

[screenshot, 2017-04-10 11:49:14]

Maybe taking a look at the inputs.processes plugin would be a good place to start.

Thanks a lot for the awesome work though!
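To illustrate why the process count could matter, here is a minimal sketch of counting process states by walking a Linux `/proc` tree. Whether the processes plugin does exactly this is an assumption; the sketch only shows that this style of collection does one file read per process, so its cost grows with the number of processes. The function name `count_process_states` is hypothetical.

```python
import os


def count_process_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...) under proc_root."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # the process exited between listing and reading
        # The state is the first field after the ")" that closes the command name.
        fields = stat.rsplit(")", 1)[-1].split()
        if not fields:
            continue
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


if __name__ == "__main__":
    states = count_process_states()
    print("processes:", sum(states.values()), "by state:", states)
```

Running it on both hosts gives a quick check of how many more processes the ZFS host has and in which states they sit.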

@AlbinOS Will you open a new bug issue for this higher than expected cpu use in the processes input plugin?

The rest of this ticket is too general to be actionable, so I am closing it.
