Telegraf: Error in plugin [inputs.netstat]

Created on 17 Nov 2017 · 10 Comments · Source: influxdata/telegraf

Bug report

Relevant telegraf.conf:

System info:

Telegraf v1.4.4
Ubuntu Server 17.10 (fresh installation)

Steps to reproduce:

1. Install InfluxDB (v1.3.7).
2. Install Grafana (v4.6.2).
3. Import the Grafana dashboard .json from here: https://grafana.com/dashboards/928
4. Use the telegraf.conf from the link in step 3 (change the URL to point to your InfluxDB instance).
5. Restart Telegraf.

Expected behavior:

All fields on the dashboard should populate with the relevant data collected by Telegraf and stored in the InfluxDB database named "telegraf".

Actual behavior:

All fields on the dashboard EXCEPT those relating to [inputs.netstat] populate fine.

Additional info:

telegraf log reports:

2017-11-17T14:17:56Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1593/fd: no such file or directory
2017-11-17T14:18:06Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1603/fd: no such file or directory
2017-11-17T14:18:31Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0: open /proc/1634/fd: no such file or directory

Config used:

# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"

# Configuration for telegraf agent
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  hostname = ""
  omit_hostname = false

### OUTPUT

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["http://192.168.2.180:8086"]
  database = "telegraf"

  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"

  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  # username = "telegraf"
  # password = "2bmpiIeSWd63a7ew"
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]

  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

# Read metrics about network interface usage
[[inputs.net]]
  # collect data only about specific interfaces
  # interfaces = ["eth0"]

[[inputs.netstat]]
  # no configuration

[[inputs.interrupts]]
  # no configuration

[[inputs.linux_sysctl_fs]]
  # no configuration

Labels: bug, regression, upstream

2000jago

All 10 comments

I think this may happen if a process exits during collection. Is it ever able to collect the metrics successfully?

I opened this pull request which will skip over processes that have exited: https://github.com/shirou/gopsutil/pull/458

danielnelson on 17 Nov 2017

> I think this may happen if a process exits during the collection, is it ever able to collect the metrics successfully?

No, I don't believe so. The series/measurements that don't show up on the graphs (TcpExtTCPAbortOnClose, TcpExtSyncookiesFailed, gather_errors, etc.) are never added to InfluxDB.

2000jago on 18 Nov 2017

I get the same issue now:

2017-11-19T13:53:50Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0

Using Telegraf v1.5.0~112955a9 (git: master 112955a9)

(I am new to Telegraf)

pawal on 19 Nov 2017

👍1

@pawal Are you able to compile with gopsutil 384a55110aa5ae052eb93ea94940548c1e305a99 and check if the error remains?

danielnelson on 20 Nov 2017

Same problem with Telegraf v1.4.4 (git: release-1.4 ddcb93188f3fb5393d811cdd742bfe6ec799eba9) on Ubuntu 16.04.3 LTS

Suvitruf on 22 Nov 2017

👍4

After downgrading to 1.4.3, the error is gone.

damm on 23 Nov 2017

👍1

Hm, interesting. We have different versions of Ubuntu on our servers (Ubuntu 16.04.1 LTS, Ubuntu 16.04.2 LTS, Ubuntu 16.04.3 LTS).

The problem occurs only on Ubuntu 16.04.3 LTS. Don't know if it helps you.

Suvitruf on 24 Nov 2017

I have two Ubuntu servers, one running 17.10 and the other 16.04.3, and I have this issue on both. It is still unresolved. The only other reference to this error message I could find is https://github.com/influxdata/telegraf/issues/3311. It didn't help me, but perhaps it could offer someone else some insight into a fix.

2000jago on 25 Nov 2017

Same with Red Hat 6.4 and telegraf-1.4.2-1.x86_64:
2018-02-15T15:55:14Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
2018-02-15T15:55:16Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0
2018-02-15T15:55:18Z E! Error in plugin [inputs.netstat]: error getting net connections info: cound not get pid(s), 0

Edit: works fine with 1.5.2.