Resque: QUEUE=* bundle exec rake resque:work will fail and not output any errors when redis has bad keys

Created on 16 May 2013 · 17Comments · Source: resque/resque

This is for 1-x-stable:

Running QUEUE=* bundle exec rake resque:work fails without any visible errors if redis has bad keys. I believe this may be a bug in the prune_dead_workers method.

Bug Hard

dangalipo

All 17 comments

Bummer. :(

steveklabnik on 16 May 2013

I believe I just encountered this bug as well. I'm running on my production worker box:

QUEUE=batch_action bundle exec rake resque:work --trace

** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup
** Execute resque:work

... and it exits right after the last message. On my dev box it's running fine. Same gem versions, resque (1.24.1).

Can you please let us know if you were able to solve this, and which keys were bad?

Thank you!

faliev on 10 Jun 2013

@faliev calling FLUSHALL in redis-cli will fix this. From memory it occurs when the worker definition changes but there are still jobs in the queue (I think). Hope that helps.

dangalipo on 11 Jun 2013

👍5

RE: @dangalipo's FLUSHALL suggestion:

Very important to note that calling FLUSHALL in redis-cli will flush _all_ namespaces in _all_ databases and is generally like dropping a nuclear weapon on the problem. If you're using redis for _anything_ else or have items in queue that you don't want to lose, this is _not_ the option for you.

yaauie on 15 Jun 2013

👍1
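
A narrower alternative to FLUSHALL can be sketched as follows. This is only an illustration, not something proposed in the thread: it assumes `redis` is a redis-rb client (anything responding to `scan_each(match:)` and `del`), that the server supports SCAN, and that Resque is using its default `resque` namespace. The helper name `purge_namespace` is hypothetical. Note that it still deletes queued jobs inside that namespace; it only spares other namespaces and applications.

```ruby
# Delete only the keys under one namespace prefix instead of flushing
# every database. Returns the number of keys removed.
#
# Assumptions: `redis` behaves like a redis-rb client, and "resque" is
# the namespace in use (Resque's default). Adjust both to your setup.
def purge_namespace(redis, namespace: "resque")
  deleted = 0
  # scan_each walks the keyspace incrementally rather than blocking
  # the server the way KEYS would on a large database.
  redis.scan_each(match: "#{namespace}:*") do |key|
    deleted += redis.del(key)
  end
  deleted
end
```

With a real client this would be called as `purge_namespace(Redis.new)`, against the same database Resque is configured to use.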

FLUSHALL worked for me. This isn't the greatest solution but for someone who isn't using redis for anything other than resque it's OK. Thanks @dangalipo.

uri on 24 Jun 2013

FLUSHALL worked for me too. Thanks @dangalipo

alagu on 21 Jul 2013

FLUSHALL also worked for me.

jasontruluck on 23 Jul 2013

"Bad keys" is a pretty nondescript error; what is wrong, and how can we do better at cleaning it up? Could the next person to come across this share a dump of their redis DB either here or privately?

yaauie on 24 Jul 2013

You can recreate the problem by:

1. stopping resque
2. using the console (or UI) to add 2 or more jobs from workerX to the queue
3. changing the workerX class name to workerY
4. starting a resque worker

The first time, it crashes with an error. Start it again, and from then on it won't start and shows no error.

Aviram

aviramradai on 24 Jul 2013
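
The first crash in the steps above can be illustrated with a minimal sketch. Resque stores each job as a JSON payload carrying a class name and arguments, and the worker has to turn that name back into a constant before it can call `perform`. The code below uses `Object.const_get` as a stand-in for Resque's own lookup, and `job_class_for` is a hypothetical helper; it does not reproduce the later silent exit, only the reason a renamed class breaks jobs that are already queued.

```ruby
require "json"

# Stand-in for the renamed job class. WorkerX is deliberately not
# defined, mirroring a rename while old jobs are still queued.
class WorkerY
  def self.perform(*args)
    :done
  end
end

# Look up the constant named in a Resque-style JSON payload.
# Returns nil when the class no longer exists.
def job_class_for(raw_payload)
  name = JSON.parse(raw_payload).fetch("class")
  Object.const_get(name)
rescue NameError
  nil
end

stale = JSON.generate("class" => "WorkerX", "args" => [42])
fresh = JSON.generate("class" => "WorkerY", "args" => [42])

puts job_class_for(stale).inspect # prints nil
puts job_class_for(fresh).inspect # prints WorkerY
```

A check like this could be run over a queue's payloads before deploying a rename, to spot jobs that would no longer resolve.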

Did some more digging.

For the above scenario it looks like resque thinks there are workers running.
To solve the issue without FLUSHALL, clear the original queue (workerX) and run the following command from the rails console:

Resque.workers.each {|w| w.unregister_worker}

Aviram

aviramradai on 24 Jul 2013
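
The one-liner above unregisters every worker, including any that are still running. A more selective variant might look like the sketch below. It assumes worker ids have the `host:pid:queues` form visible in the logs in this thread, and that each worker object responds to `to_s` (returning that id) and `unregister_worker`. The helper names `process_alive?` and `unregister_stale_workers` are hypothetical, and workers registered from other hosts are left alone because their pids cannot be checked locally.

```ruby
require "socket"

# True if a process with this pid exists on the local machine.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks existence and permissions
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Unregister workers recorded for this host whose process is gone.
# Returns the workers that were unregistered.
def unregister_stale_workers(workers,
                             hostname: Socket.gethostname,
                             alive: method(:process_alive?))
  workers.select do |worker|
    host, pid, _queues = worker.to_s.split(":", 3)
    next false unless host == hostname   # cannot verify remote pids
    next false if alive.call(pid.to_i)   # still running, keep it
    worker.unregister_worker
    true
  end
end
```

From a rails console this would be invoked as `unregister_stale_workers(Resque.workers)`.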

I've been experiencing this bug and it's pretty frustrating:

bash-4.1# export VVERBOSE=true
bash-4.1# bundle exec rake resque:work --trace
** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup 
** Execute resque:work
** [19:52:08 2014-05-29] 73: Starting worker xxx.xxx.xxx.net:73:*
** [19:52:08 2014-05-29] 73: Registered signals
** [19:52:08 2014-05-29] 73: Running before_first_fork hooks

And then it just dies. I've tried downgrading by several versions, but no dice. I've tried calling FLUSHALL but that also isn't working. I can't for the life of me get Resque running.

davidcelis on 30 May 2014

Downgrading redis to 2.2.2 worked for me.

Alagu

alagu on 30 May 2014

@davidcelis can you provide any more context? What version of ruby, redis-server, redis-rb (adapter gem), and resque are you using? Do you have any before_first_fork hooks registered? Can you run the monitor command on a redis-cli to see if the worker is getting registered? Do you have any queues that have jobs in them?

These are the kinds of information that help other people successfully donate time to solving your problem; without it, there's not much that can be done efficiently.

yaauie on 30 May 2014

@alagu this would suggest that a redis call you're making is erroring out. Since redis 3.x disabled the sharing of a socket between the parent & child, maybe this could be the cause of your error. In that gem (which I also maintain), we recently added automatic reconnects in a fork child; can you try pinning to redis/redis-rb@831cccfb924be8f5c87e78593857b47853cdadda and letting me know if this fixes your problem?

yaauie on 30 May 2014

@yaauie Sure, here's some more context:

We're on Ruby 2.1.2 with the latest stable version of Resque. Redis itself is pegged at 2.8.6 and redis-rb is at the latest version as well. We have no before_first_fork hooks. The one perhaps odd part of this is that we're running Resque in a Docker container (but Redis is external). I think we did figure out a potential issue: when we added statements in our Dockerfile to create a user to own the app and run Resque, it finally stayed up and started listening on its assigned queues. Perhaps Resque has an issue running under root?

davidcelis on 30 May 2014

same here :(

ruby 2.1.2
resque 1.24.1
redis 2.8.14
OSX 10.9.4

QUEUE=normal bundle exec rake resque:work --trace
** Invoke resque:work (first_time)
** Invoke resque:preload (first_time)
** Invoke resque:setup (first_time)
** Invoke environment (first_time)
** Execute environment
Connecting to database specified by database.yml
** Execute resque:setup
** Execute resque:preload
** Invoke resque:setup
** Execute resque:work
ƒ echo $?
0

timruffles on 23 Sep 2014

Still happening. Just stumbled upon it. Renaming the worker class and restarting resque while the old class is still in the queue will cause workers to die silently.