Resque: QUEUE=* bundle exec rake resque:work will fail and not output any errors when redis has bad keys

Created on 16 May 2013 · 17 Comments · Source: resque/resque

This is for 1-x-stable:

Running QUEUE=* bundle exec rake resque:work fails without any visible errors if redis has bad keys. I believe this may be a bug in the prune_dead_workers method.

Bug Hard

All 17 comments

Bummer. :(

I believe I just encountered this bug as well. I'm running on my production worker box:

QUEUE=batch_action bundle exec rake resque:work --trace

** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup
** Execute resque:work

.. and it exits right after the last message. On my dev box it's running fine. Same gem versions, resque (1.24.1).

Can you please let us know if you were able to solve this, and which keys were bad?

Thank you!

@faliev calling FLUSHALL in redis-cli will fix this. From memory it occurs when the worker definition changes but there are still jobs in the queue (I think). Hope that helps.

RE: @dangalipo's FLUSHALL suggestion:

Very important to note that calling FLUSHALL in redis-cli will flush _all_ namespaces in _all_ databases and is generally like dropping a nuclear weapon on the problem. If you're using redis for _anything_ else or have items in the queue that you don't want to lose, this is _not_ the option for you.
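
A narrower cleanup than FLUSHALL is to delete only the keys under Resque's namespace. The following is a minimal sketch, not something from the thread: it assumes the default `resque` namespace and a Redis server that supports SCAN (2.8 or later). `delete_namespace` is a hypothetical helper name, and `redis` is any client object that responds to `scan_each(match:)` and `del`, as a redis-rb client does. It still discards every queued job in that namespace.

```ruby
# Hypothetical helper: delete every key under one namespace prefix and
# leave all other keys in the database alone.
# `redis` is expected to respond to `scan_each(match:)` and `del`
# (redis-rb clients do); "resque" is Resque's default namespace.
def delete_namespace(redis, namespace = "resque")
  # Collect the matching keys first so nothing is deleted while the
  # scan is still iterating.
  keys = redis.scan_each(match: "#{namespace}:*").to_a
  keys.each { |key| redis.del(key) }
  keys.size # number of keys removed
end
```

With redis-rb this might be called as `delete_namespace(Redis.new)`; keys outside the `resque:` prefix are not touched.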

FLUSHALL worked for me. This isn't the greatest solution but for someone who isn't using redis for anything other than resque it's OK. Thanks @dangalipo.

FLUSHALL worked for me too. Thanks @dangalipo

FLUSHALL also worked for me.

"Bad keys" is a pretty nondescript error; what is wrong, and how can we do better at cleaning it up? Could the next person to come across this share a dump of their redis DB either here or privately?

You can recreate the problem by:

  1. Stop resque.
  2. Using the console (or UI), add 2 or more jobs for workerX to the queue.
  3. Change the workerX class name to workerY.
  4. Start a resque worker.
  5. The first time, it crashes with an error.
  6. Start it again; from now on it won't start and shows no error.
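
Why a rename breaks queued jobs can be illustrated with a small sketch. Resque stores each job as JSON that records the worker class by name, so a job enqueued for `WorkerX` can no longer be resolved once only `WorkerY` exists. The code below is a simplified illustration, using `Object.const_get` rather than Resque's own lookup; `WorkerY` and `resolve_job_class` are hypothetical names. It shows only the failed lookup, not why the worker later refuses to start.

```ruby
require "json"

# Only the renamed class exists after the change.
class WorkerY
  def self.perform(*args); end
end

# Hypothetical helper: look up the class named in a job payload.
# Returns nil when the constant no longer exists.
def resolve_job_class(raw_payload)
  payload = JSON.parse(raw_payload)
  Object.const_get(payload["class"])
rescue NameError
  nil
end

stale_job = JSON.generate("class" => "WorkerX", "args" => [42])
fresh_job = JSON.generate("class" => "WorkerY", "args" => [42])

p resolve_job_class(stale_job) # nil, because WorkerX no longer exists
p resolve_job_class(fresh_job) # WorkerY
```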

Aviram

Did some more digging.

For the above scenario it looks like resque thinks there are workers running.
To solve the issue without FLUSHALL, clear the original queue (workerX) and run the following command from the rails console:

Resque.workers.each {|w| w.unregister_worker}
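
Unregistering every worker also removes workers that are still running. A more selective approach, sketched here as an assumption rather than anything confirmed in this thread, is to pick out only the registrations whose process is gone. It relies on the `host:pid:queues` worker id format visible in the verbose log output in this thread; `stale_worker_ids` and the sample ids are hypothetical.

```ruby
# Hypothetical helper: given worker ids of the form "host:pid:queues",
# return the ids registered for `hostname` whose pid is not in `live_pids`.
# Workers on other hosts are left alone because their pids cannot be
# checked from this machine.
def stale_worker_ids(worker_ids, hostname, live_pids)
  worker_ids.select do |id|
    host, pid, _queues = id.split(":", 3)
    host == hostname && !live_pids.include?(pid.to_i)
  end
end

ids = ["app1:73:*", "app1:912:batch_action", "app2:73:normal"]
p stale_worker_ids(ids, "app1", [912]) # ["app1:73:*"]
```

In a Rails console the ids could come from `Resque.workers.map(&:to_s)`, and each stale worker could then be passed to `unregister_worker` as in the command above; treat that wiring as untested.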

Aviram

I've been experiencing this bug and it's pretty frustrating:

bash-4.1# export VVERBOSE=true
bash-4.1# bundle exec rake resque:work --trace
** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup 
** Execute resque:work
** [19:52:08 2014-05-29] 73: Starting worker xxx.xxx.xxx.net:73:*
** [19:52:08 2014-05-29] 73: Registered signals
** [19:52:08 2014-05-29] 73: Running before_first_fork hooks

And then it just dies. I've tried downgrading by several versions, but no dice. I've tried calling FLUSHALL but that also isn't working. I can't for the life of me get Resque running.

Downgrading redis to 2.2.2 worked for me

Alagu

@davidcelis can you provide any more context? What version of ruby, redis-server, redis-rb (adapter gem), and resque are you using? Do you have any before_first_fork hooks registered? Can you run the monitor command on a redis-cli to see if the worker is getting registered? Do you have any queues that have jobs in them?

These are the kinds of information that help other people successfully donate time to solving your problem; without it, there's not much that can be done efficiently.

@alagu this would suggest that a redis call you're making is erroring out. Since redis 3.x disabled the sharing of a socket between the parent & child, maybe this could be the cause of your error. In that gem (which I also maintain), we recently added automatic reconnects in a fork child; can you try pinning to redis/redis-rb@831cccfb924be8f5c87e78593857b47853cdadda and letting me know if this fixes your problem?

@yaauie Sure, here's some more context:

We're on Ruby 2.1.2 with the latest stable version of Resque. Redis itself is pegged at 2.8.6 and redis-rb is at the latest version as well. We have no before_first_fork hooks. The one perhaps odd part of this is that we're running Resque in a Docker container (but Redis is external). I think we did figure out a potential issue: when we added statements in our Dockerfile to create a user to own the app and run Resque, it finally stayed up and started listening on its assigned queues. Perhaps Resque has an issue running under root?

same here :(

  • ruby 2.1.2
  • resque 1.24.1
  • redis 2.8.14
  • OSX 10.9.4

QUEUE=normal bundle exec rake resque:work --trace
** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
Connecting to database specified by database.yml
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup
** Execute resque:work
ƒ echo $?
0

Still happening. Just stumbled upon it. Renaming the worker class and restarting Resque while the old class is still in the queue will cause workers to die silently.
