Bull: Worker/Processor does not "wake up" sometimes.

Created on 15 Aug 2019 · 10 Comments · Source: OptimalBits/bull

Hello,

I've been using this package and it's great. The only thing holding us back from going into production is that every few days the queue seems to get "stuck": the worker does not wake up and start processing jobs.
This is sometimes fixed by restarting the service, and sometimes it just resolves on its own after a while.
I wanted to ask a few questions that might help me find out why this is happening.

Is there a known issue with the worker not waking up? If so, is there a known solution?

Someone suggested that, since the service is rarely used right now (it goes days without use), the cause might be a stale connection. Could this be the reason?

I ran into the same issue when trying to use redis-sentinel with this package: the worker would not wake up and process jobs at all.

Unfortunately, it would be very difficult for me to add test code to reproduce this, since the queue code is spread across the project in a very modular way. Even then, it seems to happen "randomly": every few days, but sometimes weeks go by without a problem, so I don't think you'll be able to reproduce it. If I can get answers to these questions, though, I might be able to find a solution myself.

Any help is appreciated.

Thank you.


adame21

Most helpful comment

OK!
So I have some news and I feel a little dumb now.
I won't go into too much detail, but it was mostly a problem with our QA tester. He sent a lot of processing-heavy requests, and while the worker seemed "stuck" on them it was actually working. What I saw was a job entering the queue and never getting worked on. The service is supposed to eventually send an email, so it looked like something was wrong when no email was sent. Then I figured out he was sending defective requests and not checking the correct place to see that he got an error, so he just complained that the service was stuck.
I'm very sorry to have wasted your time on such mundane issues, and I greatly appreciate your help.
This is an amazing queue package and the support is top notch. Thanks a lot @stansv and @manast!

adame21 on 18 Aug 2019


All 10 comments

There is no known issue regarding workers that do not process jobs, and the library is used in production in many products processing millions of jobs daily.
Your case seems to be related to some reconnection issue, but then you should get some error. Do you have listeners for the error event set up so that you can catch potential Redis errors?

manast on 15 Aug 2019
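Bull queues are Node.js event emitters, so the listeners asked about here are attached with queue.on(...). A minimal sketch, assuming Bull 3.x event names ('error', 'failed', 'stalled'); the helper name attachQueueDiagnostics, the log callback and the queue name 'emails' are hypothetical. To keep the sketch runnable without Redis, the helper accepts any EventEmitter, so a plain EventEmitter can stand in for a real Bull queue.

```javascript
// Hypothetical helper: wires diagnostic listeners onto a Bull queue (or any
// EventEmitter using the same event names) and reports through `log`.
function attachQueueDiagnostics(queue, log = console.error) {
  // Connection-level problems, such as Redis errors, surface here.
  queue.on('error', (err) => log(`queue error: ${err.message}`));
  // A job's processor threw or returned a rejected promise.
  queue.on('failed', (job, err) => log(`job ${job.id} failed: ${err.message}`));
  // A job lost its lock, e.g. because the processor blocked the event loop.
  queue.on('stalled', (job) => log(`job ${job.id} stalled`));
  return queue;
}

// Usage with a real queue (requires the bull package and a Redis server):
//   const Queue = require('bull');
//   attachQueueDiagnostics(new Queue('emails', 'redis://127.0.0.1:6379'));
```

Logging 'failed' and 'stalled' alongside 'error' helps separate "Redis is unreachable" from "the worker is busy or its jobs are failing", which look the same from the outside.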

Hi, first thanks for the fast reply!

Yes, I do have error handling throughout my code.
No error pops up: the loggers show the job entering the queue successfully, but then nothing happens.
I'm writing a piece of code that will run the entire weekend to see if this happens in an isolated application.
It will simply run a job every hour and log it.
I guess it must be something on my side.
If you have any other leads, please tell me.
Thank you.

adame21 on 15 Aug 2019
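An isolated hourly test like the one described can lean on Bull's repeatable jobs instead of an external timer. A minimal sketch, assuming Bull 3.x with its repeat option and a cron pattern; the queue name 'heartbeat', the file name and the helper logHeartbeat are hypothetical. Only the logging function runs on its own; the Bull wiring is shown in comments because it needs a Redis server.

```javascript
const fs = require('fs');

// Processor body for the hourly test: logs the job to the console and appends
// the same line to a .txt file, so a missing hour is easy to spot afterwards.
function logHeartbeat(job, filePath, now = new Date()) {
  const line = `${now.toISOString()} processed job ${job.id}`;
  console.log(line);
  fs.appendFileSync(filePath, line + '\n');
  return line;
}

// Wiring with Bull (requires the bull package and a reachable Redis server):
//   const Queue = require('bull');
//   const queue = new Queue('heartbeat', 'redis://127.0.0.1:6379');
//   queue.process(async (job) => logHeartbeat(job, 'heartbeat.txt'));
//   // Repeatable job: the cron pattern '0 * * * *' fires at minute 0 of every hour.
//   queue.add({ source: 'weekend-test' }, { repeat: { cron: '0 * * * *' } });
```

Writing a timestamp per job makes the result easy to read on Monday: any gap longer than an hour in the file marks the window worth checking against the Redis and application logs.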

The way the workers listen for new jobs is by using a blocking command (BRPOPLPUSH) that waits until a new job arrives in the queue. This command times out after a predefined time (I think it is 5 seconds by default) and then tries again.
One thing you can try is redis-cli monitor: it shows all the commands executed on the Redis server, so you should be able to see the BRPOPLPUSH command executing from time to time.

manast on 15 Aug 2019
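The polling behaviour described above (block on the wait list until a job arrives or the timeout expires, then block again) can be illustrated with an in-memory stand-in. This is a simulation, not Bull's code: FakeLists, blockingPop and runWorker are hypothetical names, and blockingPop only mimics BRPOPLPUSH by moving a job from a wait array to an active array, or resolving null when the timeout passes. As far as I can tell, the per-call timeout in Bull 3.x is the drainDelay advanced setting (in seconds); treat that mapping as an assumption.

```javascript
// In-memory stand-in for the Redis wait/active lists; not Bull's code.
class FakeLists {
  constructor() {
    this.wait = [];
    this.active = [];
    this.waiters = []; // consumers currently blocked in blockingPop
  }

  // Producer side: like LPUSH onto the wait list, then wake one blocked consumer.
  push(job) {
    this.wait.unshift(job);
    const wake = this.waiters.shift();
    if (wake) wake();
  }

  // Takes the oldest waiting job and records it on the active list.
  moveOne() {
    const job = this.wait.pop();
    this.active.push(job);
    return job;
  }

  // Analogue of BRPOPLPUSH wait active timeout: resolves with a job as soon as
  // one is available, or with null once timeoutMs passes without one.
  blockingPop(timeoutMs) {
    if (this.wait.length > 0) return Promise.resolve(this.moveOne());
    return new Promise((resolve) => {
      const wake = () => {
        clearTimeout(timer);
        resolve(this.moveOne());
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== wake);
        resolve(null);
      }, timeoutMs);
      this.waiters.push(wake);
    });
  }
}

// Worker loop: a timeout is not an error, the worker simply blocks again.
async function runWorker(lists, processJob, { timeoutMs, shouldStop }) {
  let idleTimeouts = 0;
  while (!shouldStop()) {
    const job = await lists.blockingPop(timeoutMs);
    if (job === null) {
      idleTimeouts += 1;
      continue;
    }
    await processJob(job);
  }
  return idleTimeouts;
}
```

In this model a queue that sits idle for days only accumulates timed-out waits; the loop keeps going, which is why an idle queue alone should not leave a worker asleep, and why a broken connection underneath the blocking call is the more likely suspect.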

Using redis-cli monitor sounds really good; I will try this.
Should I close the issue for now, or keep it open until I have an update?
Thanks again, I really appreciate you and your package.

adame21 on 15 Aug 2019

You can keep it open until we sort it out.

manast on 15 Aug 2019

Hi,
I have somewhat of an update.
I set up a script that sends a job every hour and simply logs it to the console and to a .txt file.
While that was running, I also ran redis-cli monitor.
I don't know exactly what happened: after 1 hour it did everything as expected, but then this showed up in my Visual Studio console and I'm not sure why.

From the terminal (the same error repeated 4 times):

job being processed with ID: 1
BRPOPLPUSH MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
at Socket. (C:\Users\adame\Desktop\bull queue test\node_modules\ioredis\built\redis\event_handler.js:108:37)
at Object.onceWrapper (events.js:273:13)
at Socket.emit (events.js:182:13)
at TCP._handle.close (net.js:606:12)

Does this have anything to do with the queue itself, or maybe with redis-cli monitor?
I couldn't get much information from redis-cli monitor, since a lot of departments in my organization use this Redis server and it's basically flooded with output, matrix-style.

adame21 on 15 Aug 2019
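On a shared, busy Redis server, the MONITOR stream can be narrowed down to one queue's keys. A minimal sketch, assuming Bull's default key prefix 'bull' (keys such as bull:emails:wait) and ioredis v4 or later, whose monitor() method opens a dedicated MONITOR connection and emits a 'monitor' event per command; the helper names and the queue name are hypothetical. The two filters are plain functions; watchQueue loads ioredis lazily, so the filters run without it.

```javascript
// True when a MONITOR entry touches any key of this queue. Lua-script calls
// (EVALSHA) pass their keys as arguments too, so they are matched as well.
function isQueueCommand(args, queueName, prefix = 'bull') {
  const keyPrefix = `${prefix}:${queueName}:`;
  return args.slice(1).some((arg) => String(arg).startsWith(keyPrefix));
}

// True for the worker's blocking fetch from this queue's wait list.
function isWorkerPoll(args, queueName, prefix = 'bull') {
  return String(args[0]).toLowerCase() === 'brpoplpush'
    && args[1] === `${prefix}:${queueName}:wait`;
}

// Prints only this queue's traffic. Requires the ioredis package and a server.
async function watchQueue(redisUrl, queueName) {
  const Redis = require('ioredis'); // loaded lazily; the filters above do not need it
  const monitor = await new Redis(redisUrl).monitor();
  monitor.on('monitor', (time, args) => {
    if (isWorkerPoll(args, queueName)) {
      console.log(`${time} worker is polling ${queueName}`);
    } else if (isQueueCommand(args, queueName)) {
      console.log(time, args.join(' '));
    }
  });
  return monitor;
}
```

Regular "worker is polling" lines mean the worker is alive and waiting; if they stop while jobs keep arriving, the worker's connection is the place to look. MONITOR itself adds load, so on a server shared across departments it is best kept to short sessions.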

Could you turn on ioredis debug mode with DEBUG=ioredis:* node yourapp.js and post the logs here? (This command runs node with the environment variable DEBUG set to ioredis:*.)

stansv on 15 Aug 2019
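The command above uses POSIX shell syntax. Since the stack trace comes from a Windows machine, the cmd.exe equivalent is set DEBUG=ioredis:* followed by node yourapp.js, and in PowerShell it is $env:DEBUG="ioredis:*" before running node. Alternatively, the variable can be set at the very top of the entry file. A minimal sketch, assuming ioredis logs through the debug package, which reads DEBUG when it is first loaded; the assignment therefore has to run before bull or ioredis is required.

```javascript
// Must run before ioredis (or bull, which loads it) is first required:
// the debug package reads DEBUG once, when it is loaded.
// An existing DEBUG value from the environment is kept as-is.
process.env.DEBUG = process.env.DEBUG || 'ioredis:*';

// Then load the application as usual, for example:
//   const Queue = require('bull');
```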

@adame21 It seems the client disconnected from Redis; ioredis then kept retrying and gave up after 20 retries. There is a setting in ioredis, maxRetriesPerRequest, that you could tweak: https://github.com/luin/ioredis/blob/master/API.md#new_Redis
From ioredis documentation:
By default, all pending commands will be flushed with an error every 20 retry attempts. That makes sure commands won't wait forever when the connection is down. You can change this behavior by setting maxRetriesPerRequest:

const Redis = require('ioredis');
const redis = new Redis({
  maxRetriesPerRequest: 1
});

Set maxRetriesPerRequest to null to disable this behavior, and every command will wait forever until the connection is alive again (which is the default behavior before ioredis v4).

manast on 15 Aug 2019
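Applied to Bull, the option goes into the Redis settings that Bull forwards to ioredis. A minimal sketch, assuming Bull 3.x, which accepts an ioredis options object under redis; the queue name, host and port are placeholders. The back-off function mirrors the retryStrategy example from the ioredis README: it receives the attempt number and returns the delay in milliseconds before the next reconnection attempt.

```javascript
// Reconnect back-off: 50 ms per attempt, capped at 2 seconds.
function retryStrategy(times) {
  return Math.min(times * 50, 2000);
}

// Redis options to hand to Bull; host and port are placeholders.
const redisOptions = {
  host: '127.0.0.1',
  port: 6379,
  // null: pending commands (including the worker's BRPOPLPUSH) wait for the
  // connection to come back instead of failing after 20 retries.
  maxRetriesPerRequest: null,
  retryStrategy,
};

// Usage (requires the bull package and a Redis server):
//   const Queue = require('bull');
//   const queue = new Queue('emails', { redis: redisOptions });
```

The trade-off: with maxRetriesPerRequest set to null, a dropped connection no longer surfaces as a MaxRetriesPerRequestError like the one in the log above; commands simply wait, so an error listener on the queue becomes the main signal that Redis is unreachable.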

Hi @stansv,
I will do this on Sunday when I'm back in the office. Thanks for the suggestion.
@manast
It seems I made a mistake: I wasn't connected to the correct Redis server when running this test.
I've set it up to run on my integration environment over the weekend and will report back with my findings on Sunday.
Thanks

adame21 on 15 Aug 2019
