Telegraf: Couchbase Plugin Enhancement

Created on 15 May 2018 · 12 Comments · Source: influxdata/telegraf

Feature Request

Enhance the Couchbase plugin to include key metrics for monitoring Couchbase bucket performance.

Opening a feature request kicks off a discussion.

Proposal:

A few of the key metrics missing from the default Couchbase plugin are
(Full list of requested metrics for Couchbase plugin - _edited Oct 27, 2020 by @sjwang90_):

vb_active_resident_items_ratio
ep_queue_size
ep_cache_miss_rate
ep_tmp_oom_errors
ep_dcp_xdcr_items_remaining
couch_docs_fragmentation
query-requests_1000ms

gets per second
sets per second
deletes per second
active resident ratio
inbound xdcr ops/sec
outbound xdcr mutations
docs fragmentation %
connections
cluster node count

couch_docs_actual_disk_size
ep_cache_miss_rate
couch_docs_fragmentation
ep_bg_fetched
vb_active_resident_items_ratio
ep_queue_size
vb_num_eject_replicas
ep_warmup_value_count
vb_active_curr_items
ep_cache_miss_rate
couch_docs_fragmentation
ep_queue_size
vb_active_resident_items_ratio
curr_connections
curr_items_tot
ep_bg_fetched
ep_diskqueue_drain
ep_diskqueue_fill
vb_replica_eject
ep_oom_errors
ep_queue_size
ep_tmp_oom_errors
[etc...]

References:
https://blog.couchbase.com/monitoring-couchbase-cluster/

Current behavior:

Only a handful of metrics are being monitored currently.
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/couchbase

Fields:
quota_percent_used (unit: percent, example: 68.85424936294555)
ops_per_sec (unit: count, example: 5686.789686789687)
disk_fetches (unit: count, example: 0.0)
item_count (unit: count, example: 943239752.0)
disk_used (unit: bytes, example: 409178772321.0)
data_used (unit: bytes, example: 212179309111.0)
mem_used (unit: bytes, example: 202156957464.0)

Desired behavior:

Need additional metrics to effectively monitor Couchbase.

Use case:

The number of heavy Couchbase users is growing rapidly. It would be very helpful if the plugin's metric coverage could be improved.

area/couchbase · feature request · good first issue

All 12 comments

@DharanDP thanks for reporting. We have a lot of items to tackle, so we might not get to this for a while. It seems like it would be pretty easy to add them here: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/couchbase/couchbase.go#L92

Interested in taking a crack at it and submitting a PR?

I'll give this a try.

Update 1: Looks like it's more complex than it looks. Couchbase spreads metrics over time: the API returns 60 data points per metric rather than one, one per second for the last minute. Depending on the metric, we need to average the points in some cases and sum them in others. The API response interval can be configured with a zoom parameter, but this gets tricky because Telegraf's global interval configuration can differ from the zoom configuration. I'm dropping this for now.
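To make the aggregation problem described above concrete, here is a minimal Go sketch, not the plugin's actual code. It assumes the stats response has the shape `{"op": {"samples": {"<metric>": [...]}}}` with one sample per second, and that the collection interval is known in seconds. The names `bucketStats`, `parseSamples`, and `collapse` are hypothetical, and which metric is averaged versus summed is an illustrative guess that would need checking per metric.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bucketStats mirrors only the part of the response this sketch assumes:
// {"op": {"samples": {"<metric>": [v1, v2, ...]}}}
type bucketStats struct {
	Op struct {
		Samples map[string][]float64 `json:"samples"`
	} `json:"op"`
}

// aggregation selects how a window of samples collapses to one value.
type aggregation int

const (
	average aggregation = iota
	sum
)

// parseSamples extracts the per-metric sample arrays from a response body.
func parseSamples(body []byte) (map[string][]float64, error) {
	var s bucketStats
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	return s.Op.Samples, nil
}

// collapse reduces the most recent n samples to a single value.
// n stands in for the collection interval in seconds (assumption:
// one sample per second). It reports false when there is nothing to reduce.
func collapse(samples []float64, n int, agg aggregation) (float64, bool) {
	if n <= 0 || len(samples) == 0 {
		return 0, false
	}
	if n < len(samples) {
		samples = samples[len(samples)-n:]
	}
	total := 0.0
	for _, v := range samples {
		total += v
	}
	if agg == average {
		return total / float64(len(samples)), true
	}
	return total, true
}

func main() {
	// Hand-written sample body; real responses carry 60 points per metric.
	body := []byte(`{"op":{"samples":{"ep_queue_size":[4,6,8,10],"ep_tmp_oom_errors":[0,1,0,2]}}}`)
	samples, err := parseSamples(body)
	if err != nil {
		panic(err)
	}
	// Illustrative choice only: treat the queue size as a gauge (average)
	// and the error samples as counts (sum).
	queue, _ := collapse(samples["ep_queue_size"], 2, average)
	errs, _ := collapse(samples["ep_tmp_oom_errors"], 4, sum)
	fmt.Println(queue, errs)
}
```

Trimming to the last `n` samples is one way to handle a mismatch between the collection interval and the zoom window, but it only works when the interval is shorter than the window; longer intervals would need a coarser zoom level or state kept between collections.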

Feature Request

Extend fields for Couchbase Input.

Proposal:

Based on the blogpost by Couchbase for monitoring nodes/clusters: https://blog.couchbase.com/monitoring-couchbase-cluster/
I would like to ask whether we could extend the fields to cover what Couchbase recommends monitoring.

Current behavior:

Just a few fields are currently available:

memory_total
quota_percent_used
ops_per_sec
disk_fetches
item_count
disk_used
data_used
mem_used

Desired behavior:

Metrics that Couchbase recommends monitoring:

current:
memory_free
memory_total
quota_percent_used
ops_per_sec
disk_fetches
item_count
disk_used
data_used
mem_used

+ Extensions:
couch_docs_actual_disk_size
ep_cache_miss_rate
couch_docs_fragmentation
ep_bg_fetched
vb_active_resident_items_ratio
ep_queue_size
vb_num_eject_replicas
ep_warmup_value_count
vb_active_curr_items
ep_cache_miss_rate
couch_docs_fragmentation
ep_queue_size
vb_active_resident_items_ratio
curr_connections
curr_items_tot
ep_bg_fetched
ep_diskqueue_drain
ep_diskqueue_fill
vb_replica_eject
ep_oom_errors
ep_queue_size
ep_tmp_oom_errors
[etc...]

Use case:

  1. We run applications that require Couchbase. During migrations in particular, we need to monitor all of these statistics on a central dashboard.

Could someone tell me if we can hope/expect any progress here in the near future?

This isn't very high on my list of issues, but I could help if someone from the community is willing to do the work.

I'm interested in helping with telegraf and would like to take a look at this.

It looks like the plugin is currently using an older unofficial library for couchbase. I haven't been able to find anything showing that the older library supports these additional stats.

There is a newer official library that might be needed to get the additional stats.

Thanks for the help @nwneisen. Long ago I tried to switch to gocb, in issue #2418, but ran into some issues with supporting our current set of metrics. The upstream issue is still not closed, but maybe it is actually fixed and just not marked? Would be much appreciated if you could check.

Thanks for the info @danielnelson. I'll take a look.

@danielnelson The issue is still unresolved.

@danielnelson Any progress here?

FYI @ssoroka

Looks like the upstream issue was resolved. If @nwneisen or anyone else is interested in working on #2418 first, then adding these additional metrics to the plugin should hopefully be fairly straightforward.
