Telegraf v1.4.4
Ubuntu Server 17.10 (fresh installation)
All fields on the dashboard should populate with relevant data collected by Telegraf and stored in the InfluxDB database named "telegraf".
All fields on the dashboard EXCEPT those relating to [inputs.netstat] populate fine.
The Telegraf log reports:
2017-11-17T14:17:56Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1593/fd: no such file or directory
2017-11-17T14:18:06Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1603/fd: no such file or directory
2017-11-17T14:18:31Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1634/fd: no such file or directory
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
hostname = ""
omit_hostname = false
### OUTPUT
# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
urls = ["http://192.168.2.180:8086"]
database = "telegraf"
## Retention policy to write to. Empty string writes to the default rp.
retention_policy = ""
## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
write_consistency = "any"
## Write timeout (for the InfluxDB client), formatted as a string.
## If not provided, will default to 5s. 0s means no timeout (not recommended).
timeout = "5s"
# username = "telegraf"
# password = "2bmpiIeSWd63a7ew"
## Set the user agent for HTTP POSTs (can be useful for log differentiation)
# user_agent = "telegraf"
## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
# udp_payload = 512
# Read metrics about cpu usage
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## Comment this line if you want the raw CPU time metrics
fielddrop = ["time_*"]
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default, telegraf gathers stats for all mountpoints.
## Setting mountpoints will restrict the stats to the specified mountpoints.
# mount_points = ["/"]
## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
## present on /run, /var/run, /dev/shm or /dev).
ignore_fs = ["tmpfs", "devtmpfs"]
# Read metrics about disk IO by device
[[inputs.diskio]]
## By default, telegraf will gather stats for all devices including
## disk partitions.
## Setting devices will restrict the stats to the specified devices.
# devices = ["sda", "sdb"]
## Uncomment the following line if you need disk serial numbers.
# skip_serial_number = false
# Get kernel statistics from /proc/stat
[[inputs.kernel]]
# no configuration
# Read metrics about memory usage
[[inputs.mem]]
# no configuration
# Get the number of processes and group them by status
[[inputs.processes]]
# no configuration
# Read metrics about swap memory usage
[[inputs.swap]]
# no configuration
# Read metrics about system load & uptime
[[inputs.system]]
# no configuration
# Read metrics about network interface usage
[[inputs.net]]
# collect data only about specific interfaces
# interfaces = ["eth0"]
[[inputs.netstat]]
# no configuration
[[inputs.interrupts]]
# no configuration
[[inputs.linux_sysctl_fs]]
# no configuration
I think this may happen if a process exits during collection. Is it ever able to collect the metrics successfully?
I opened this pull request which will skip over processes that have exited: https://github.com/shirou/gopsutil/pull/458
> I think this may happen if a process exits during collection. Is it ever able to collect the metrics successfully?
No, I don't believe so. The series/measurements that don't show up on the graphs (TcpExtTCPAbortOnClose, TcpExtSyncookiesFailed, gather_errors, etc.) are never written to InfluxDB.
I get the same issue now:
2017-11-19T13:53:50Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
Using Telegraf v1.5.0~112955a9 (git: master 112955a9)
(I am new to Telegraf)
@pawal Are you able to compile with gopsutil 384a55110aa5ae052eb93ea94940548c1e305a99 and check if the error remains?
Same problem with Telegraf v1.4.4 (git: release-1.4 ddcb93188f3fb5393d811cdd742bfe6ec799eba9) on Ubuntu 16.04.3 LTS.
After downgrading to 1.4.3, the error is gone.
Hm, interesting. We have different versions of Ubuntu on our servers (Ubuntu 16.04.1 LTS, Ubuntu 16.04.2 LTS, Ubuntu 16.04.3 LTS).
The problem occurs only on Ubuntu 16.04.3 LTS. I don't know if this helps.
I have two Ubuntu servers, one running 17.10 and the other 16.04.3. I have this issue on both, and it is still unresolved. The only other reference to this error message I could find was here (https://github.com/influxdata/telegraf/issues/3311). It didn't help me, but perhaps it could offer someone else some insight into a fix.
Same with Red Hat 6.4 and telegraf-1.4.2-1.x86_64:
2018-02-15T15:55:14Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
2018-02-15T15:55:16Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
2018-02-15T15:55:18Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
Edit: works fine with 1.5.2.
@gaetanquentin This is already fixed in 1.4.5 and newer if you can upgrade.