Rq: API for job retries

Created on 13 Apr 2013 · 20 Comments · Source: rq/rq

We'd likely need to merge scheduler-integration into master before we can implement job retries because we'll need a scheduler component to deal with delays.

Here are a few API ideas:

Alternative 1

q.enqueue(
    func,
    max_retries=3, # Retry up to 3 times
    retry_delay=60, # In seconds 
)

# job decorator
@job(q, max_retries=3, retry_delay=60)
def func():
    pass

Alternative 2

from datetime import timedelta

retry = (3, timedelta(minutes=1)) # Retry 3 times with 1 minute delay
q.enqueue(
    func,
    retry=retry,
)

# Here's how it looks if we pass retry directly to enqueue
q.enqueue(
    func,
    retry=(3, timedelta(minutes=1)),
)

# We can also pass an integer number of seconds as the second element of retry
q.enqueue(
    func,
    retry=(3, 60),
)

@job(q, retry=(3, timedelta(minutes=1)))
def func():
    pass
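
To make the relationship between the two alternatives concrete, here is a minimal sketch of how a (count, delay) tuple from Alternative 2 could be reduced to the two plain values used in Alternative 1. The helper name `normalize_retry` is hypothetical and is not part of RQ; it only assumes that the delay is either a number of seconds or a `timedelta`, as in the examples above.

```python
from datetime import timedelta


def normalize_retry(retry):
    """Turn a (count, delay) tuple into (max_retries, delay_in_seconds).

    Hypothetical helper, not part of RQ. `delay` may be a number of
    seconds or a timedelta.
    """
    max_retries, delay = retry
    if isinstance(delay, timedelta):
        delay = delay.total_seconds()
    if max_retries < 0 or delay < 0:
        raise ValueError("retry count and delay must be non-negative")
    return int(max_retries), float(delay)


print(normalize_retry((3, timedelta(minutes=1))))  # (3, 60.0)
print(normalize_retry((3, 60)))                    # (3, 60.0)
```

Under this reading, both alternatives carry the same information, so the choice is mostly about which call signature reads better.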

Thoughts?

selwin

👍5

Most helpful comment

Hi, I'm chiming in here with a heavy heart. I've spent the past few days pulling together task processing for our system over at CourtListener.com. For the past five years or so, we've used Celery, but I've never been particularly enamored of it — it's big, complex, hard to debug, and kind of a pain.

But — But it has retries with exponential backoff. Baked in. With ease.

It took me a minute to figure out that rq lacked that feature, but once I did it was my hope that with some effort I could pull together rq-scheduler, rq, rq-retry, and some baling wire into a nice system for doing our task processing. I set up exception_handlers, experimented with rewriting some of our more complex tasks so they'd work in the rq way, and, well, I've concluded any solution I develop like this is kind of terrible:

- In theory rq-scheduler could be used to schedule retries, but is it even supported? It does some weird things with extra queues to get things scheduled. It's not exactly elegant. Does it work with 1.0? Seems like it hasn't been updated in a while.
- The job.requeue method isn't documented (see: #1067). This doesn't inspire faith.
- Doing retries involves using the exception handlers, which is fine. So maybe I can trigger a retry using custom exceptions. I set up a custom exception, RetryException that I'd throw for tasks that needed to be requeued. That seemed to kind of work, but, well, it still printed the exception to the logs. Did it save the exception to Redis too? I didn't get so far as figuring that out. Maybe? In the meantime, how do I set a timer for it? How do we do an exponential backoff on that timer?
- Let's say I've got a chain of tasks I want to run and the first one triggers a retry, then fails. Do I need to handle that failure? What's up with the deferred tasks while the retry is happening? What about after it fails? I don't even know, but I do know I'm in a messy place of cleaning up tasks, dealing with retries, etc. This is what an API is supposed to save me from implementing myself, right? Will I wind up with tasks in queues I don't know exist and therefore slowly fill up redis? I give up.
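
For readers wondering what the home-built approach described in these points might look like, here is a rough sketch, not a tested RQ integration. It assumes the handler convention from RQ's exception-handling docs: a handler takes `(job, exc_type, exc_value, traceback)`, returning `False` stops the handler chain, and returning `True` falls through to the next handler. `RetryException` is the custom exception named above. `backoff_delay`, `make_retry_handler`, and the `schedule(job, delay)` callable are hypothetical names; `schedule` stands in for whatever re-enqueues the job later (rq-scheduler, for instance). The sketch does not address the chained-task or cleanup questions raised above.

```python
from datetime import timedelta


class RetryException(Exception):
    """Raised by a task to request another attempt."""


def backoff_delay(attempt, base=timedelta(seconds=60), factor=2,
                  cap=timedelta(hours=1)):
    """Delay before retry number `attempt` (0-based): base * factor**attempt, capped."""
    return min(base * (factor ** attempt), cap)


def make_retry_handler(schedule, max_retries=3):
    """Build an exception handler that retries jobs raising RetryException.

    `schedule(job, delay)` is a hypothetical callable that re-enqueues
    `job` after `delay`; it is an assumption of this sketch.
    """
    def handler(job, exc_type, exc_value, traceback):
        if not issubclass(exc_type, RetryException):
            return True  # not a retry request: fall through to the next handler
        attempt = job.meta.get("retries", 0)
        if attempt >= max_retries:
            return True  # retries exhausted: let normal failure handling run
        job.meta["retries"] = attempt + 1  # count attempts on the job itself
        schedule(job, backoff_delay(attempt))
        return False  # handled here: stop the handler chain
    return handler
```

With the defaults, the delays grow as 60 s, 120 s, 240 s, and so on, up to the one-hour cap. A real integration would also need to persist `job.meta` and decide what happens to dependent jobs, which is the messy part the points above describe.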

I'm putting all of the above in here for two reasons:

1. I think rq should really document the absence of built in retries or at least document the heck out of a way to do them yourself, including any limitations that come along with that. (See above for concerns I'd want addressed in any documentation of a home built solution.)
2. I'm hopeful that my experience will help rq prioritize this. I feel like this is an essential feature for a task queue. Maybe I'm wrong, judging by the popularity of rq.

I hope this helps, and I'll certainly keep an eye on rq. I like it in general, but without retry support built in, I'm sure it's not something we can use for our purposes. Sorry it has to end this way. :)

mlissner on 10 Apr 2019

👍7 ❤2

All 20 comments

I like alternative 1 most.

nvie on 19 Apr 2013

+1 for the alternative 1.

ecarreras on 25 May 2013

+1 for 1

sylvinus on 13 Jun 2013

If I was going to try implementing retries, would it make more sense to start from the scheduler-integration branch, or to add it to rq-scheduler? Is the scheduler-integration branch likely to be merged into master any time soon?

erussell on 19 Mar 2014

@erussell realistically, I don't think the scheduler-integration branch would be merged into master anytime soon.

If you'd like to add it on rq-scheduler, I'd be happy to review your PRs.

selwin on 22 Mar 2014

@erussell sorry, but I didn't think thoroughly before answering your question. I don't think implementing job retries can be cleanly done in rq-scheduler.

selwin on 22 Mar 2014

Sounds like it will have to wait until scheduler-integration is merged into master. How stable is that branch? Would it make sense to start work from there, or should I just wait?

erussell on 24 Mar 2014

That branch is stable in the sense that all tests passed when I last worked on it. The scheduler itself is very well tested so I'm confident that it works properly.

I'd appreciate it if you could merge the current master into the scheduler branch and start working on it :).

Does that sound good @nvie ?

selwin on 26 Mar 2014

+1, would really like this feature. I'd also like to run one daemon rather than both the scheduler and the job processor.

jtushman on 30 Jul 2014

Guys, we need it sooo bad :cry:

iorlas on 22 Apr 2015

For my project I implemented retries with a custom worker class. It hooks clean_registries() and applies retry logic to jobs in the failed queue. I looked into doing this in RQ core and submitting a PR, but felt that these might/would need to be addressed first:

and a standalone worker was pretty straightforward.

I'd be interested in incorporating this into the core if there is interest.

mgk on 4 Aug 2015

The latest stable version of RQ automatically calls clean_registries() periodically so you don't need to manually call it.

Would welcome contribution to both implementing FailedJobRegistry and integrating scheduler :)

selwin on 4 Aug 2015

> The latest stable version of RQ automatically calls clean_registries() periodically so you don't need to manually call it.

I said hook when I should have said overrode. In my Worker subclass I overrode clean_registries() to do extra maintenance. It might make more sense to have this:

def run_maintenance_tasks(self):
    self.clean_registries()

for subclass safety, and it's consistent with should_run_maintenance_tasks. Can make PR for that if you think it's worth it.
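
As an illustration of why a dedicated hook is safer to override, here is a small self-contained sketch. `BaseWorker` is a stand-in written for this example and is not rq's actual `Worker`; `RetryingWorker` and `retry_failed_jobs` are hypothetical names. The subclass extends the maintenance step without touching `clean_registries()` itself.

```python
class BaseWorker:
    """Stand-in for a worker class; NOT rq's actual Worker."""

    def __init__(self):
        self.log = []

    def clean_registries(self):
        self.log.append("clean_registries")

    def run_maintenance_tasks(self):
        # Single overridable entry point for periodic maintenance.
        self.clean_registries()


class RetryingWorker(BaseWorker):
    """Hypothetical subclass that adds retry handling to maintenance."""

    def retry_failed_jobs(self):
        # Placeholder for project-specific retry logic.
        self.log.append("retry_failed_jobs")

    def run_maintenance_tasks(self):
        super().run_maintenance_tasks()  # keep the base behaviour intact
        self.retry_failed_jobs()


worker = RetryingWorker()
worker.run_maintenance_tasks()
print(worker.log)  # ['clean_registries', 'retry_failed_jobs']
```

Because the subclass only overrides the hook, later changes to what `clean_registries()` does in the base class would not be lost.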

> Would welcome contribution to both implementing FailedJobRegistry and integrating scheduler :)

:) I could see doing the former at some point, but not likely the latter. I have some thoughts and questions on registries before getting started, maybe I'll open separate issue for that discussion.

mgk on 4 Aug 2015

is this feature available?

rizplate on 15 May 2017

Hi Mike,

This is great feedback. Thanks!

Yes, RQ’s documentation sucks, but I plan on improving it as I continue developing RQ’s feature set.
I’m working on bringing RQ scheduler into RQ proper so we can have basic scheduling built in.
After basic scheduling is done and implemented, retries should be next.

I can’t promise any timeline though as I only get to work on RQ during my spare time :)

selwin on 11 Apr 2019

> I can’t promise any timeline though as I only get to work on RQ during my spare time :)

I totally understand. rq is already amazing. This is a really tough problem space. Glad the notes help.

mlissner on 11 Apr 2019

any updates on this retry w/exp. backoff feature request?

taewookim on 4 May 2020

Now that job scheduling is live, I plan on tackling job retries next. No promises on when this would be live though, as this is my free time project.
