Versions:
sidekiq (3.2.4)
sidekiq-middleware (0.3.0)
Process seems alive and working:
$ pgrep -fl sidekiq
21956 sidekiq 3.2.4 tradeapp [23 of 55 busy]
But the Web UI shows 0 busy workers.
sidekiq.yml
:concurrency: 55
staging:
  :concurrency: 55
production:
  :concurrency: 55
:queues:
  - [default, 10]
  - [mailer, 5]
  - [slow, 1]
This happens once or twice a day. Nothing specific in the logs.
Do you guys have any idea how to tackle this?
Thanks in advance
This happened in previous versions if the heartbeat timer died, usually due to extreme Redis latency. #1884 was fixed in 3.2.2. People are still reporting this so there must still be an issue in there.
Correct me if I'm wrong, but this is not exactly the same issue: in my case Sidekiq stops processing any new jobs. The "23 of 55" figure from pgrep is bogus; the logs show no activity and jobs are not popped from the queues.
Can you get me the TTIN signal log output when a Sidekiq process is dead? Do you see any messages in the logs with the phrase "fetch died"?
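For reference, TTIN makes Sidekiq write a backtrace for every live thread through its logger. Roughly the equivalent of this sketch (not Sidekiq's exact handler):

require "sidekiq"

# Rough sketch of what TTIN triggers: dump a backtrace for every live thread
# through Sidekiq's logger, which shows where the worker threads are stuck.
Thread.list.each do |thread|
  Sidekiq.logger.info "Thread TID-#{thread.object_id.to_s(36)}"
  Sidekiq.logger.info(thread.backtrace ? thread.backtrace.join("\n") : "<no backtrace>")
end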
No 'fetch died' in the logs. Nothing new in the logs after kill -TTIN $pid.
Thanks for the help!
Additional info: we run two processes, but they process different queues.
The second process runs with concurrency == 1 and does not have such problems.
This problem started to happen when I increased concurrency from 20 to 55.
It's possible you have threads which are starved for connections somehow. I don't recommend concurrency that high, the default 25 is typically all you need. Add more processes if you need more work done in parallel.
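One common way threads end up starved for connections, offered here as an assumption rather than a diagnosis of this app: the ActiveRecord pool is smaller than Sidekiq's concurrency, so most threads block waiting for a connection checkout. A sketch with hypothetical values:

require "active_record"

# Hypothetical settings, not taken from this app: every Sidekiq thread that
# touches the database needs its own ActiveRecord connection, so the pool
# must be at least as large as :concurrency or threads block on checkout and
# sit there looking idle. This is normally set via the pool: key in database.yml.
ActiveRecord::Base.establish_connection(
  adapter:  "postgresql",
  database: "tradeapp_production",   # hypothetical database name
  pool:     55                       # >= Sidekiq :concurrency
)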
OK, I will test this. But given that only 23 workers are "busy", shouldn't the rest of the threads be processing jobs?
Can you shut down the process? If you see nothing in your logs upon TTIN, that means you've probably set the Rails log level to higher than :info.
I've got log_level == :info for this environment.
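For anyone checking the same thing: the TTIN output goes through Sidekiq's logger, so a quick way to make sure nothing filters it out is to force the level in an initializer. A sketch, assuming a standard Rails setup:

require "sidekiq"
require "logger"

# config/initializers/sidekiq.rb (sketch): force Sidekiq's logger to :info so
# the TTIN dump and job start/done lines aren't filtered out by a higher
# Rails log level.
Sidekiq.configure_server do |_config|
  Sidekiq.logger.level = Logger::INFO
end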
I've split this into 2 processes with 25 threads each; let's see if this helps.
But it got stuck when only around 10-15 workers were active, so I don't know if the concurrency could really be the reason.
Anyway, thanks a lot for your effort. I will update this thread with progress.
What is the version of celluloid in your Gemfile.lock? Update to the latest version of Sidekiq.
Hello,
I think I have the same issue: I have a Sidekiq process running, but it does not "take" jobs from the queue.
Process: (output not preserved)
In queue: (output not preserved)
For information, I have a lot of Sidekiq processes (around 30) and around 200 busy jobs at the same time.
Any idea why?
@seuros it happens on 0.15.2 and 0.16.
We deployed it to production and it has not gotten stuck for 3 days so far (it was getting stuck twice a day on staging). No idea why.
What did you deploy to production? I am interested!
The app I'm working on. We had this problem on the staging server, but on the fresh production one it disappeared.
We have a rather minimal setup compared to yours: 2 processes with concurrency 25 each. I've never seen more than 20 jobs running simultaneously, though.
I meant, which gem did you update to get it working?
Sidekiq 3.2.5 just locks celluloid to 0.15.2. That is the celluloid version you should be using; 0.16 has some locking issues.
You should be using 3.2.5, and kill any remaining processes manually before redeploying.
You can't shut them down correctly if they are running with celluloid 0.16.
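A sketch of the Gemfile pins being discussed (versions as stated above; the rest of the Gemfile is assumed):

source "https://rubygems.org"

gem "sidekiq",   "3.2.5"    # 3.2.5 itself locks celluloid, but the pin makes it explicit
gem "celluloid", "0.15.2"   # 0.16 has the locking issues mentioned above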
@jumski @seuros I am facing a similar issue (https://github.com/mperham/sidekiq/issues/2003) wondering if you managed to find a fix on your end?
It started working on a fresh production server; it was failing on staging.
Don't really have any clues :(
I have this problem too.
If I add a custom queue on my worker, it does not work.
I have to remove this line of code to make it work:
sidekiq_options queue: :billing_notification
I think a couple of months ago my code worked perfectly; now, with all the dependencies the same, it does not work.
Here is my gem list:
celluloid (0.15.2)
sidekiq (3.2.1)
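For what it's worth, the usual cause of that exact symptom is that the custom queue isn't listed among the queues the Sidekiq process polls; that is an assumption about the setup above, not a confirmed diagnosis. A minimal sketch (the worker name is made up):

require "sidekiq"

# Hypothetical worker showing the routing: a job pushed to
# :billing_notification is only picked up if that queue is in the process's
# queue list (the :queues entry in sidekiq.yml, or a -q flag); otherwise the
# job just sits in Redis and the worker looks idle.
class BillingNotificationWorker
  include Sidekiq::Worker
  sidekiq_options queue: :billing_notification

  def perform(account_id)
    # send the billing notification for account_id (omitted here)
  end
end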
Got the same issue. I am using sidekiq (3.2.6) with celluloid locked to 0.15.2. In addition, I am using sidetiq (0.6.3) for scheduled jobs; is sidetiq the issue?
@kxhitiz Sidetiq randomly stops processing jobs - unrelated entirely to this sidekiq issue. See https://github.com/tobiassvn/sidetiq/issues/116
@mperham We have some networking issues occurring around 4am nightly on a recurring basis; the hosting company seems incapable of fixing it, alas. Because of this, even with 3.2.5, we're seeing the system simply stop processing. Is it possible the timeouts are still too low for the Redis workaround you introduced in 3.2.2? We'd prefer it to just never stop trying :)
@mperham Ignore that last one; I noticed there is a duplicate of this issue which provides various suggestions. I'll try 3.2.6 / TTIN / resolv-replace. If it's still broken I will comment again.
@lypanov Coincidentally, the networking issues you describe at 3-5am nightly also happen to us around that time. Sidekiq stops processing any jobs and the queue just keeps getting larger, even though the Sidekiq process is still running in the background.
However, our Redis server is hosted on AWS ElastiCache. Do you know by chance if your hosting company is using AWS as well?
I am referencing the issue I opened. May be related: https://github.com/mperham/sidekiq/issues/2003
@krzkrzkrz With sidetiq, IIRC, we were seeing it on a nightly basis. Without it, it happens maybe once every 2 weeks.
Not AWS, no.
The latest version is 3.3.3, maybe you should upgrade.
Unfortunately you haven't given us any info to diagnose the problem. We need a thread dump.
On Apr 1, 2015, at 07:44, Jake Hoffner [email protected] wrote:
I am having this issue as well. Jobs stop processing about once a week, typically in the middle of the night. I'm running sidekiq 3.2.6 on Heroku. I even have an autoscaler implemented to help scale dynos if the job queue builds up. What happens is that the main worker will stop processing altogether, the scaled workers will start up and work fine, process the queue, and then when it's back to the single dyno, jobs stop being processed again. A process restart fixes the issue.
I have enqueued a job to parse a spreadsheet with JRuby and the POI library. I got an error in the Sidekiq log from the Java side, and then Sidekiq stopped picking up jobs from the queue. But the process is still alive when I check with
ps aux
Here are the versions of Sidekiq and the other components I am using:
sidekiq-4.1.1
JRuby-1.7.19
Is there any way to reset Sidekiq so that it starts processing jobs again automatically?
I just experienced this with sidekiq 4.1.4 on Heroku. It looks like the master process stopped responding but didn't crash. I had to scale down and up again.
Here are the relevant logs:
Nov 14 13:59:37 my-app heroku/worker.1: State changed from up to down
Nov 14 13:59:40 my-app heroku/worker.1: Stopping all processes with SIGTERM
Nov 14 13:59:41 my-app heroku/api: Scaled to console@0:Hobby rake@0:Hobby web@1:Hobby worker@1:Hobby by [email protected]
Nov 14 14:00:09 my-app heroku/worker.1: Error R12 (Exit timeout) -> At least one process failed to exit within 30 seconds of SIGTERM
Nov 14 14:00:09 my-app heroku/worker.1: Stopping remaining processes with SIGKILL
Nov 14 14:00:10 my-app heroku/worker.1: Process exited with status 137
Nov 14 14:00:15 my-app heroku/worker.1: Starting process with command `bundle exec sidekiq -C config/sidekiq.yml`
Nov 14 14:00:16 my-app heroku/worker.1: State changed from starting to up
Then it started taking jobs again.
@jdurand That logging doesn't help. You need to follow the directions noted on the Problems and Troubleshooting wiki page.
In case anyone ends up here with a similar problem, exit status 137 means your process was killed by Linux’s OOM (Out of Memory) killer.
@mperham: sorry for the trouble, but I think documenting this here might help others Googling this, and incidentally reduce the number of tickets being opened related to memory bloat issues.
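If you want to catch the bloat before the kernel does, a crude watchdog is enough. The following is a sketch assuming a Linux host with /proc, not a built-in Sidekiq feature:

require "sidekiq"

# Sketch of a crude memory watchdog: log the worker's RSS once a minute so
# growth is visible in the logs before the OOM killer sends SIGKILL, which
# is what exit status 137 means.
Sidekiq.configure_server do |config|
  config.on(:startup) do
    Thread.new do
      loop do
        rss_kb = File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i
        Sidekiq.logger.info("RSS: #{rss_kb / 1024} MB")
        sleep 60
      end
    end
  end
end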