Ruby version: 2.4.7
Sidekiq / Pro / Enterprise version(s): 5.2.5 / 4.0.4 / 1.8.1
Sidekiq.configure_server do |config|
Sidekiq::Pro.dogstatsd = ->{ CustomMetric.dogstatsd } # we have a dogstatsd instance we use for all metrics so we pass it to Sidekiq too
config.server_middleware do |chain|
require 'sidekiq/middleware/server/statsd'
chain.add Sidekiq::Middleware::Server::Statsd
end
logging_frequency_seconds = 30
config.retain_history(logging_frequency_seconds)
end
Sidekiq currently has duplicate metrics for jobs.count, jobs.failure, jobs.success with a worker tag for each. We also have metrics for jobs.<worker_name>.count|failure|success. Outside of this, we have jobs.<worker_name>.perform metrics which we don't have in the more general jobs.perform style like count/failure/success above. I'd like to propose a new design.
jobs.<worker_name>.<metric> metricsjobs.count|failure|successjobs.perform related metricsBenefits of above approach:
jobs.perform metric where users can use that data to find poorly written workers, notice worker time spikes (potentially due to external or internal services timing out, etc.)Are you using an old version? Yes
Have you checked the changelogs to see if your issue has been fixed in a later version? Yes
This is a good issue but it likely requires a major version bump unless we make it opt-in.
On Nov 13, 2019, at 20:00, Nakul Pathak notifications@github.com wrote:
Ruby version: 2.4.7
Sidekiq / Pro / Enterprise version(s): 5.2.5 / 4.0.4 / 1.8.1Sidekiq.configure_server do |config|
Sidekiq::Pro.dogstatsd = ->{ CustomMetric.dogstatsd } # we have a dogstatsd instance we use for all metrics so we pass it to Sidekiq too
config.server_middleware do |chain|
require 'sidekiq/middleware/server/statsd'
chain.add Sidekiq::Middleware::Server::Statsd
endlogging_frequency_seconds = 30
config.retain_history(logging_frequency_seconds)
end
Sidekiq currently has duplicate metrics for jobs.count, jobs.failure, jobs.success with a worker tag for each. We also have metrics for jobs..count|failure|success. Outside of this, we have jobs. .processing_time metrics which we don't have in the more general jobs.perform style like count/failure/success above. I'd like to propose a new design. Remove all jobs.
. metrics
Keep jobs.count|failure|success
To this add, jobs.processing_time related metrics
Both of the above should have worker tags
Benefits of above approach:Redundancy removal. This simplifies understanding, finding, and graphing metrics. It also reduces the metrics footprint of Sidekiq which can have implications on cost depending on # of workers/plan you pay for/etc.
It unlocks more global graphs with the jobs.processing_time metric where users can use that data to find poorly written workers, notice worker time spikes (potentially due to external or internal services timing out, etc.)
Are you using an old version? Yes
Have you checked the changelogs to see if your issue has been fixed in a later version? Yes—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
Great idea.
Same goes for Ent Historical Metrics https://github.com/mperham/sidekiq/wiki/Ent-Historical-Metrics
The sidekiq.enqueued.#{x} - Current Size of queue x metrics could be reported as a single metric sidekiq.enqueued with a tag for each queue, following recommendation from https://docs.datadoghq.com/dashboards/faq/when-i-query-can-i-use-wildcards-in-metric-names-and-events/
However I don't think tags are standard to the statsd protocol, so this would be a Datadog (or whoever supports tags) specific feature?
Is there any recommendation to do this today without a major version? Perhaps you could add a way to disable this metric when calling retain_history so we could follow a pattern similar to reporting queue latency, for queue size?
Something like...
config.retain_history(30, except: ['enqueued']) do |s|
s.batch do |b|
%w(penny nickel dime).each do |qname|
q = Sidekiq::Queue.new(qname)
tag_name = "queue:#{q.name}"
b.gauge("sidekiq.queue.latency", q.latency, tags: [tag_name])
b.gauge("sidekiq.queue.size", q.size, tags: [tag_name])
end
end
end
Based on the Ent changelog for 6.1.0/2.1.0, we expected to see metrics reported to Datadog such as:
sidekiq.enqueued.#{name} -> sidekiq.queue.size with tag queue:#{name}
sidekiq.latency.#{name} -> sidekiq.queue.latency with tag queue:#{name}
While we did loose the queue name in the metric, I am not seeing queue tags applied.
Possible bug or is more required to get this change enabled?
We are seeing metrics and tags applied as:
job.WorkerName.count -> job.count with tag worker:WorkerName
job.WorkerName.perform -> job.perform with tag worker:WorkerName
job.WorkerName.failure -> job.failure with tag worker:WorkerName
as expected from the Pro changelog.
stats.queues.each_pair do |q, length|
s.gauge("sidekiq.queue.size", length, queue: q)
end
s.gauge("sidekiq.queue.latency", stats.default_queue_latency, queue: "default")
You can see the queue tag applied right here. I'm not sure why you aren't seeing it.
Ah, it needs to be tags: ["queue:default"], etc. My bug.
ruby
stats.queues.each_pair do |q, length|
s.gauge("sidekiq.queue.size", length, tags: ["queue:#{q}"])
end
s.gauge("sidekiq.queue.latency", stats.default_queue_latency, tags: ["queue:default"])
Here's the fix.
Very nice! When will this be released?
You can monkeypatch capture_default if you need a fix immediately. I'll release Ent 2.1.1 Friday along with Sidekiq 6.1.1.
@mperham Sorry to have to bug again. We just deployed 6.1.1 and are still not seeing queue tags assigned to sidekiq.queue.size and sidekiq.queue.latency. We are now seeing both worker and queue tags being assigned to job.count, job.perform and job.failure. From the changelog we should only have worker for these.
Ent 2.1.1 has not been released.
On Jul 22, 2020, at 09:58, Matthew notifications@github.com wrote:
@mperham Sorry to have to bug again. We just deployed 6.1.1 and are still not seeing queue tags assigned to sidekiq.queue.size and sidekiq.queue.latency. We are now seeing both worker and queue tags being assigned to job.count, job.perform and job.failure. From the changelog we should only have worker for these.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
Sigh, yeah just noticed that. Changelog has it as being cut. Sorry for the noise.
Fixed, sorry for the confusion!
Released Sidekiq Enterprise 2.1.1
I slipped application preload into 2.1.1 and that distracted me from this promise, hope it was worth the wait!
https://github.com/mperham/sidekiq/wiki/Ent-Multi-Process#application-preload
Thanks for getting the gem out and it does fix the issue, new issue with the metric name tho. Metrics from the Ent integration have include an extra "sidekiq". Datadog requires a "sidekiq" namespace the Sidekiq integration and with the Pro metrics this is fine(no extra sidekiq), but it looks like you're adding the namespace in the metric name. Basically "sidekiq.queue.size" becomes "sidekiq.sidekiq.queue.size". I can create a new issue for this if needed.
All of the history metrics have a sidekiq prefix and have since day one. Do you suggest I remove sidekiq from all here?
```ruby
s.gauge("sidekiq.failed", stats.failed)
s.gauge("sidekiq.processed", stats.processed)
s.gauge("sidekiq.enqueued", stats.enqueued)
s.gauge("sidekiq.retries", stats.retry_size)
s.gauge("sidekiq.dead", stats.dead_size)
s.gauge("sidekiq.scheduled", stats.scheduled_size)
s.gauge("sidekiq.busy", stats.workers_size)
stats.queues.each_pair do |q, length|
s.gauge("sidekiq.queue.size", length, tags: ["queue:#{q}"])
end
s.gauge("sidekiq.queue.latency", stats.default_queue_latency, tags: ["queue:default"])
```
The thinking here is that "jobs" metrics are part of your app metrics whereas the history metrics are more like infrastructure metrics and part of Sidekiq itself, so your namespace should be "appname". Then you will get metrics like:
appname.job.perform
appname.sidekiq.queue.size
etc
In other words, my working assumption is that your statsd namespace is not "sidekiq".
Unsure, but to match what the Pro metrics send, I think so. Confused as to why we were not seeing the double sidekiq until this change tho. Anyhow, if you don't use the "sidekiq" namespace in the Datadog integration all metrics from sidekiq are treated as custom. I think a better approach, that works better with the DD integration at least, would be to send:
appname.job.perform
appname.queue.size
Where appname is not the namespace.
The Datadog integration will append sidekiq, via the namespace. So these become:
sidekiq.appname.job.perform
sidekiq.appname.queue.size
Right now these become:
sidekiq.job.perform
sidekiq.sidekiq.queue.size
And for clarification this is the DD setup I am referencing(https://docs.datadoghq.com/integrations/sidekiq/):
require 'datadog/statsd' # gem 'dogstatsd-ruby'
Sidekiq::Pro.dogstatsd = ->{ Datadog::Statsd.new('metrics.example.com', 8125, namespace:'sidekiq') }
You must use namespace:'sidekiq' here.
Sounds good. I can remove the sidekiq prefix in 2.2. I don't want to do it in a patch release like 2.1.2.
Do you want to open a new issue so you can track it and I can link the issue in the changelog?
You must use namespace:'sidekiq' here.
@mcg where did you get this requirement from? We use the same integration without that namespace and it works fine.
@nakulpathak3 From Datadog support. It “works” without the namespace, but the metrics will be billed as custom. Are your metrics showing up as custom?