Telegraf: Add Ceph Cluster Performance Statistics

Created on 18 Jul 2016 · 5 comments · Source: influxdata/telegraf

Feature Request

Add Ceph Cluster Performance Statistics

Proposal:

Current behavior:

The ceph plugin currently collects only a small set of the data available from the admin socket.

Desired behavior:

Things like ceph status, ceph df, and ceph osd pg stat give a far richer set of performance metrics, e.g. placement group states and IOPS/reads/writes on both a global and a per-pool basis.
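As a rough illustration of how output from those commands could be turned into metrics for Telegraf, here is a minimal Python sketch. It shells out to the ceph CLI with --format json (a real ceph flag) and converts per-pool usage from ceph df into InfluxDB line protocol; the measurement name ceph_pool, the helper names, and the exact JSON keys (pools/name/stats/bytes_used/objects) are assumptions to verify against your Ceph release, not part of the original proposal.

```python
import json
import subprocess


def to_line_protocol(measurement, tags, fields):
    """Format one InfluxDB line-protocol record (no timestamp)."""
    tag_str = "".join(",{}={}".format(k, v) for k, v in sorted(tags.items()))
    field_str = ",".join("{}={}".format(k, v) for k, v in sorted(fields.items()))
    return "{}{} {}".format(measurement, tag_str, field_str)


def ceph_json(*args):
    """Run a ceph subcommand with JSON output (needs a reachable cluster)."""
    out = subprocess.check_output(["ceph"] + list(args) + ["--format", "json"])
    return json.loads(out)


def df_lines(df):
    """Yield per-pool usage lines from a `ceph df --format json` document.

    The keys used here are assumptions based on recent Ceph releases;
    check them against `ceph df --format json` on your own cluster.
    """
    for pool in df.get("pools", []):
        yield to_line_protocol(
            "ceph_pool",
            {"pool": pool["name"]},
            {"bytes_used": pool["stats"]["bytes_used"],
             "objects": pool["stats"]["objects"]},
        )


if __name__ == "__main__":
    # Printed lines can be consumed directly by Telegraf's exec input
    # with data_format = "influx".
    for line in df_lines(ceph_json("df")):
        print(line)
```

The same pattern extends naturally to ceph status and ceph osd pg stat, since every ceph subcommand honours --format json.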

Use case:

We rely on this for roadmap and capacity planning, maintenance-window scheduling, performance profiling, spotting errors, and so on.

[Image: "What we need"]

All 5 comments

FYI that image was a live capture of a real CRUSH map update, and exactly what we want to see!

I made a similar thing here if it is of use: https://github.com/Buhrietoe/ceph-metrics
It just uses the exec plugin. We run it in production in a container. It can easily be extended and supports multiple clusters (albeit serially). Drop mycluster.conf and mycluster.keyring into /etc/ceph/clusters/.
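For reference, the exec-plugin wiring described above can be sketched as a Telegraf config fragment. The script path is a placeholder for wherever you install the ceph-metrics script, and data_format = "influx" assumes the script prints InfluxDB line protocol on stdout:

```toml
[[inputs.exec]]
  # Placeholder path; point this at your ceph-metrics checkout
  commands = ["/usr/local/bin/ceph-metrics"]
  timeout = "30s"
  # The script is assumed to emit InfluxDB line protocol
  data_format = "influx"
```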

Using the supported Ceph Python library makes the implementation side much easier. It would be nice if someone could do a pure Go implementation, but @Buhrietoe's solution is completely functional in the meantime.

@spjmurray Very nice plugin. Could you share your Grafana dashboard?
