Sidekiq: Sidekiq jobs getting slower and slower

Created on 15 Mar 2016 · 5 comments · Source: mperham/sidekiq

Hi
I have a very weird problem: we are processing thousands of jobs per hour. The main issue is that the Sidekiq jobs get slower over time. A job starts out taking 10 seconds, and without a restart (for example, every hour) the same job can end up taking as long as 2 hours.
Has anyone seen a similar issue?
We run our own custom servers, so there is also a possibility that the issue is related to a bad server config.


All 5 comments

One other person mentioned this recently but I don't have any detail. You'll need to profile the system to understand what part is slow. Are you running out of memory and swapping?
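One concrete way to start profiling: Sidekiq dumps a backtrace for every live thread to its log when it receives the TTIN signal, which shows exactly where slow jobs are stuck. Roughly (the PID lookup and log path here are illustrative):

```shell
# Send TTIN to a running Sidekiq process; it responds by logging a
# backtrace for each of its threads.
kill -TTIN "$(pgrep -f sidekiq | head -n 1)"

# Then inspect the Sidekiq log for the dumped backtraces, e.g.:
# tail -n 200 log/sidekiq.log
```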

Memory and processor usage aren't changing; they look the same.

Hello, I recently experienced the same issue. Jobs that usually take 1s to complete suddenly slowed down to ~2700-3000s. I put the details in an S.O. question: http://stackoverflow.com/questions/36331456/sidekiq-workers-suddenly-slowed-down-almost-like-stuck-workers, but I'll post them again here.

Here's my setup:

  • Rails v4.2.5.1, with ActiveJob.
  • MySQL DB, clustered (with 3 masters)
  • ActiveRecord::Base.connection_pool set to 32 (verified in Sidekiq process as well).
  • 2 Sidekiq processes, 3 threads per process (6 total).

Symptoms:

  • If the workers have just been restarted, they process jobs fast (~1s).
  • After several jobs are processed, the time needed to complete a job (the same job that previously took only ~1s) suddenly spikes to ~2900s, which makes the worker look stuck.
  • The slowdown affects every kind of job (there's no specific offending job).
  • CPU usage and memory consumption are normal (idle), and there is no swapping either.

Here is the TTIN log. It seems the process hangs in:

  • retrieve_connection
  • clear_active_connections

But I'm not sure why it is happening. I searched around the net and found a similar discussion here: https://groups.google.com/forum/#!topic/sidekiq/_eFQGtAWm6E, however I'm unable to understand the cause of the issue.
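For anyone unfamiliar with what those two calls do: they check connections out of and back into ActiveRecord's connection pool. Here's a toy stand-in for that checkout behavior, using a plain Ruby queue rather than ActiveRecord, to show why a thread hung there simply blocks: once every pooled connection is busy (or wedged on a dead socket), the next checkout waits.

```ruby
require "timeout"

# Toy connection pool: a fixed set of "connections" behind a Queue.
# retrieve_connection ~ pop (blocks when empty);
# clear_active_connections ~ push back.
pool = Queue.new
2.times { |i| pool << "conn-#{i}" }

conn = pool.pop            # first checkout succeeds
pool.pop                   # second checkout succeeds

begin
  Timeout.timeout(0.2) { pool.pop }  # third checkout blocks: pool exhausted
rescue Timeout::Error
  puts "checkout blocked: pool exhausted"  # prints this line
end

pool << conn               # checkin makes the connection available again
```

In the real pool the wedged connections never come back on their own, so the block lasts until ActiveRecord's checkout timeout fires, which is consistent with jobs "completing" only after thousands of seconds.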

Any idea why this is happening? And also thanks in advance for any help.

So we tracked down the issue and found that it was caused by our DB connection configuration.

We are using a load-balanced DB setup: Sidekiq workers => Load Balancer => DB clusters. Changing the config so the workers communicate directly with the DB (bypassing the load balancer) solved the issue.

We'll look deeper into why it acts up when the DB connection goes through the load balancer, but this is not a Sidekiq issue.

We had the same problem with postgres connections where I work. The Postgres client library has configurable keepalive settings. Keeping a bit of traffic on the socket prevents the load balancer from dropping idle connections. Perhaps MySQL has a similar option.

We put something like this in database.yml:

production:
  ...
  variables:
    tcp_keepalives_idle: 60
    tcp_keepalives_interval: 1
    tcp_keepalives_count: 3
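For what it's worth, these are standard OS-level TCP keepalive knobs, not anything Postgres-specific. A minimal Ruby sketch of what those settings enable on a raw socket (the Linux-only constants are guarded, and the numbers just mirror the YAML above):

```ruby
require "socket"

# Enable OS-level keepalive probes on a socket, which is what the
# tcp_keepalives_* variables ask the Postgres client to do.
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

# Linux exposes the probe timing per socket; these constants may be
# missing on other platforms, hence the guard.
if defined?(Socket::TCP_KEEPIDLE)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 60)  # idle secs before first probe
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 1)  # secs between probes
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 3)    # failed probes => connection dead
end

puts sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool  # prints true
sock.close
```

The periodic probes count as traffic on the connection, which is why they stop a load balancer's idle timeout from silently dropping it.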