Sidekiq: Sidekiq jobs getting slower and slower

Created on 15 Mar 2016 · 5 comments · Source: mperham/sidekiq

Hi
I have a very weird problem: we are processing thousands of jobs per hour. The main issue is that the Sidekiq jobs get slower over time. A job starts out taking 10 seconds, and without a restart (for example, every hour) the same job can end up taking as long as 2 hours.
Has anyone seen a similar issue?
We run our own custom servers, so there is also a possibility that the issue is related to a bad server config.


All 5 comments

One other person mentioned this recently but I don't have any detail. You'll need to profile the system to understand what part is slow. Are you running out of memory and swapping?
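One concrete way to start profiling: Sidekiq dumps a backtrace for every live thread to its log when it receives the TTIN signal, which shows exactly where slow jobs are stuck. Roughly (the PID lookup and log path here are illustrative):

```shell
# Send TTIN to a running Sidekiq process; it responds by logging a
# backtrace for each of its threads.
kill -TTIN "$(pgrep -f sidekiq | head -n 1)"

# Then inspect the Sidekiq log for the dumped backtraces, e.g.:
# tail -n 200 log/sidekiq.log
```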

Memory and processor usage aren't changing; they look the same.

Hello, I recently experienced the same issue. Jobs that usually take 1s to complete suddenly slowed down to ~2700-3000s. I put the details in an S.O. question: http://stackoverflow.com/questions/36331456/sidekiq-workers-suddenly-slowed-down-almost-like-stuck-workers, but I'll post them again here.

Here's my setup:

  • Rails v4.2.5.1, with ActiveJob.
  • MySQL DB, clustered (with 3 masters)
  • ActiveRecord::Base.connection_pool set to 32 (verified in Sidekiq process as well).
  • 2 Sidekiq processes, 3 threads per process (6 total).

Symptoms:

  • If the workers have just been restarted, they process jobs fast (~1s).
  • After several jobs are processed, the time needed to complete a job (the same job that previously took only ~1s) suddenly spikes to ~2900s, which makes the worker look stuck.
  • The slowdown affects every kind of job (there's no specific offending job).
  • CPU usage and memory consumption are normal (idle), and there is no swapping either.

Here is the TTIN log. It seems the process hangs in:

  • retrieve_connection
  • clear_active_connections

But I'm not sure why it is happening. I searched around the net and found a similar discussion here: https://groups.google.com/forum/#!topic/sidekiq/_eFQGtAWm6E, however I'm unable to understand the cause of the issue.
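For anyone unfamiliar with what those two calls do: they check connections out of and back into ActiveRecord's connection pool. Here's a toy stand-in for that checkout behavior, using a plain Ruby queue rather than ActiveRecord, to show why a thread hung there simply blocks: once every pooled connection is busy (or wedged on a dead socket), the next checkout waits.

```ruby
require "timeout"

# Toy connection pool: a fixed set of "connections" behind a Queue.
# retrieve_connection ~ pop (blocks when empty);
# clear_active_connections ~ push back.
pool = Queue.new
2.times { |i| pool << "conn-#{i}" }

conn = pool.pop            # first checkout succeeds
pool.pop                   # second checkout succeeds

begin
  Timeout.timeout(0.2) { pool.pop }  # third checkout blocks: pool exhausted
rescue Timeout::Error
  puts "checkout blocked: pool exhausted"  # prints this line
end

pool << conn               # checkin makes the connection available again
```

In the real pool the wedged connections never come back on their own, so the block lasts until ActiveRecord's checkout timeout fires, which is consistent with jobs "completing" only after thousands of seconds.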

Any idea why this is happening? And also thanks in advance for any help.

So we tracked down the issue and found that it was caused by our DB connection configuration.

We are using a load-balanced DB setup: Sidekiq workers => Load Balancer => DB clusters. Changing the config so the workers communicate directly with the DB (bypassing the load balancer) solved the issue.

We'll look deeper into why it acts up when the DB connection goes through the load balancer, but this is not a Sidekiq issue.

We had the same problem with postgres connections where I work. The Postgres client library has configurable keepalive settings. Keeping a bit of traffic on the socket prevents the load balancer from dropping idle connections. Perhaps MySQL has a similar option.

We put something like this in database.yml:

production:
  ...
  variables:
    tcp_keepalives_idle: 60
    tcp_keepalives_interval: 1
    tcp_keepalives_count: 3
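For what it's worth, these are standard OS-level TCP keepalive knobs, not anything Postgres-specific. A minimal Ruby sketch of what those settings enable on a raw socket (the Linux-only constants are guarded, and the numbers just mirror the YAML above):

```ruby
require "socket"

# Enable OS-level keepalive probes on a socket, which is what the
# tcp_keepalives_* variables ask the Postgres client to do.
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

# Linux exposes the probe timing per socket; these constants may be
# missing on other platforms, hence the guard.
if defined?(Socket::TCP_KEEPIDLE)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 60)  # idle secs before first probe
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 1)  # secs between probes
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 3)    # failed probes => connection dead
end

puts sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool  # prints true
sock.close
```

The periodic probes count as traffic on the connection, which is why they stop a load balancer's idle timeout from silently dropping it.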