Node_exporter: Add total number of running processes metric

Created on 18 Jan 2018 · 5 Comments · Source: prometheus/node_exporter

What did you expect to see?

Same metrics as:

# cat /proc/loadavg
0.74 0.50 0.51 2/1215 17143

What did you see instead?

# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 6.86
# HELP node_load15 15m load average.
# TYPE node_load15 gauge
node_load15 6.53
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 7.04
# HELP node_procs_running Number of processes in runnable state.
# TYPE node_procs_running gauge
node_procs_running 14

What is missing?

The total number of running processes.

What advantage would this give?

It would be nice to see the total number of processes, so that a sudden jump in running processes is easier to detect.
The "node_forks" counter doesn't give a clear indication of such a jump on a system whose applications spawn and kill a lot of child processes.

accepted enhancement help wanted

All 5 comments

We currently expose procs_running and procs_blocked, both read from /proc/stat. However, that file doesn't contain a count of all processes.

It should be simple enough to extract the process count from /proc/loadavg.
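
As a rough illustration of that extraction (written in Python for brevity, not in the exporter's own Go, and not the code that was eventually merged): the fourth field of /proc/loadavg has the form `running/total`, so splitting it on the slash yields both numbers. The function name `parse_loadavg_entities` is hypothetical.

```python
def parse_loadavg_entities(line: str) -> tuple[int, int]:
    """Return (runnable, total) scheduling entities from a /proc/loadavg line."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"unexpected /proc/loadavg format: {line!r}")
    # The fourth field looks like "2/1215":
    # currently runnable entities / entities that exist on the system.
    runnable, total = fields[3].split("/")
    return int(runnable), int(total)
```

On a Linux host the input would come from `open("/proc/loadavg").read()`. For the sample line in the issue description, `0.74 0.50 0.51 2/1215 17143`, the function returns `(2, 1215)`.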

Any update on this?

So, when thinking about implementing this, the following questions come to mind:

  • What should the metric be called? Based on the existing metrics, something like procs_total would sound appropriate. However, I think this is confusing: in reality, the value after the slash is the number of kernel scheduling entities that currently exist on the system (cf. man 5 proc). In other words, it counts both processes and threads. It will not match ps aux | wc -l, but rather ps -eLF | wc -l.
  • What about PR #950? It looks like it introduces exactly the requested metrics ("number of running processes" and "the number from /proc/loadavg"), although it gets the data through some other mechanism.
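
To make the process/thread distinction from the first bullet concrete, here is a minimal sketch that walks /proc directly: each numeric directory is a process, and each entry under its `task/` subdirectory is a thread. This is only an illustration and is not claimed to be how PR #950 collects its data; the function name `count_pids_and_threads` and the `proc_root` parameter are hypothetical.

```python
import os


def count_pids_and_threads(proc_root: str = "/proc") -> tuple[int, int]:
    """Count processes (numeric dirs) and threads (entries in <pid>/task)."""
    pids = 0
    threads = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "loadavg" or "stat"
        try:
            tasks = os.listdir(os.path.join(proc_root, entry, "task"))
        except OSError:
            continue  # process exited or is inaccessible; leave it out
        pids += 1
        threads += len(tasks)
    return pids, threads
```

On a Linux host, the second number should roughly track the value after the slash in /proc/loadavg, and the first should roughly track the number of processes listed by ps aux. Both are snapshots, so small differences between tools are expected.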

```
$ curl -s localhost:9100/metrics | grep '^node_processes_threads ' && cat /proc/loadavg && ps -eFL | wc -l
node_processes_threads 459
0.21 0.42 0.42 1/458 6143
460

$ curl -s localhost:9100/metrics | grep '^node_processes_pids ' && ps aux | wc -l
node_processes_pids 159
160
```

(Yes, in both examples the numbers are off by one -- probably due to curl and/or wc running in parallel)

@Lemmy: So, maybe #950 already solves all your needs (once merged)?

Ah, I didn't see #950. That does indeed look like what I'm looking for, and I think this could be closed in favour of #950. Thanks!

Yes, I think this is all included in #950. If we miss something, we can re-open it!
