Rq: API for job retries

Created on 13 Apr 2013 · 20 Comments · Source: rq/rq

We'd likely need to merge scheduler-integration into master before we can implement job retries because we'll need a scheduler component to deal with delays.

Here are a few API ideas:

Alternative 1

q.enqueue(
    func,
    max_retries=3,  # Retry up to 3 times
    retry_delay=60,  # In seconds
)

# job decorator
@job(q, max_retries=3, retry_delay=60)
def func():
    pass

Alternative 2

from datetime import timedelta

retry = (3, timedelta(minutes=1))  # Retry 3 times with a 1-minute delay
q.enqueue(
    func,
    retry=retry,
)

# Here's what it looks like if we pass retry directly to enqueue
q.enqueue(
    func,
    retry=(3, timedelta(minutes=1)),
)

# We can also pass an integer as the second retry argument
q.enqueue(
    func,
    retry=(3, 60),
)

# job decorator
@job(q, retry=(3, timedelta(minutes=1)))
def func():
    pass
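
Both alternatives carry the same two values: a retry count and a delay. Here is a minimal sketch of how the tuple form in Alternative 2 could accept either an integer or a timedelta. The helper name `normalize_retry` is hypothetical and not part of RQ, and treating a bare integer as seconds is an assumption borrowed from Alternative 1.

```python
from datetime import timedelta


def normalize_retry(retry):
    """Hypothetical helper, not an RQ API: turn (count, delay) into (count, seconds)."""
    max_retries, delay = retry
    # Assumption: a bare integer delay is already expressed in seconds.
    if isinstance(delay, timedelta):
        delay = int(delay.total_seconds())
    if max_retries < 0 or delay < 0:
        raise ValueError("retry count and delay must be non-negative")
    return max_retries, delay


# Both spellings from Alternative 2 reduce to the same pair.
print(normalize_retry((3, timedelta(minutes=1))))
print(normalize_retry((3, 60)))
```

Under that assumption, the tuple in Alternative 2 and the two keyword arguments in Alternative 1 hold the same information, so the choice between them is mostly about readability at the call site.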

Thoughts?

All 20 comments

I like alternative 1 most.

+1 for the alternative 1.

+1 for 1

If I was going to try implementing retries, would it make more sense to start from the scheduler-integration branch, or to add it to rq-scheduler? Is the scheduler-integration branch likely to be merged into master any time soon?

@erussell realistically, I don't think the scheduler-integration branch would be merged into master anytime soon.

If you'd like to add it on rq-scheduler, I'd be happy to review your PRs.

@erussell sorry, but I didn't think thoroughly before answering your question. I don't think implementing job retries can be cleanly done in rq-scheduler.

Sounds like it will have to wait until scheduler-integration is merged into master. How stable is that branch? Would it make sense to start work from there, or should I just wait?

That branch is stable in the sense that all tests passed when I last worked on it. The scheduler itself is very well tested so I'm confident that it works properly.

I'd appreciate it if you could merge the current master into the scheduler branch and start working on it :).

Does that sound good @nvie ?

+1, would really like this feature. I would also like to run one daemon rather than both the scheduler and the job processor.

Guys, we need it sooo bad :cry:

For my project I implemented retries with a custom worker class. It hooks clean_registries() and applies retry logic to jobs in the failed queue. I looked into doing this in RQ core and submitting a PR, but felt that these might/would need to be addressed first:

and a standalone worker was pretty straightforward.

I'd be interested in incorporating this into the core if there is interest.

The latest stable version of RQ automatically calls clean_registries() periodically so you don't need to manually call it.

Would welcome contribution to both implementing FailedJobRegistry and integrating scheduler :)

> The latest stable version of RQ automatically calls clean_registries() periodically so you don't need to manually call it.

I said hook when I should have said overrode. In my Worker subclass I overrode clean_registries() to do extra maintenance. It might make more sense to have this:

    def run_maintenance_tasks(self):
        self.clean_registries()

for subclass safety, and it's consistent with should_run_maintenance_tasks. Can make PR for that if you think it's worth it.
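
To illustrate the subclass-safety point, here is a rough sketch that uses a stand-in base class rather than the real rq.Worker. `BaseWorker`, `RetryingWorker`, and `retry_failed_jobs` are hypothetical names used for illustration only; just `clean_registries` and the proposed `run_maintenance_tasks` come from the discussion above.

```python
class BaseWorker:
    """Stand-in for rq.Worker, reduced to the two methods discussed here."""

    def clean_registries(self):
        print("cleaning registries")

    def run_maintenance_tasks(self):
        # Proposed hook: the base class only cleans registries.
        self.clean_registries()


class RetryingWorker(BaseWorker):
    """Hypothetical subclass that adds a retry pass to maintenance."""

    def __init__(self):
        self.retried = []

    def retry_failed_jobs(self):
        # Placeholder for the retry logic applied to failed jobs.
        self.retried.append("retry pass")

    def run_maintenance_tasks(self):
        # Keep the base behaviour, then add the extra maintenance step.
        super().run_maintenance_tasks()
        self.retry_failed_jobs()


worker = RetryingWorker()
worker.run_maintenance_tasks()
```

With a dedicated hook, the subclass extends maintenance without redefining what clean_registries() itself means.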

> Would welcome contribution to both implementing FailedJobRegistry and integrating scheduler :)

:) I could see doing the former at some point, but not likely the latter. I have some thoughts and questions on registries before getting started, so maybe I'll open a separate issue for that discussion.

is this feature available?

Hi, I'm chiming in here with a heavy heart. I've spent the past few days pulling together task processing for our system over at CourtListener.com. For the past five years or so, we've used Celery, but I've never been particularly enamored of it — it's big, complex, hard to debug, and kind of a pain.

But — But it has retries with exponential backoff. Baked in. With ease.

It took me a minute to figure out that rq lacked that feature, but once I did it was my hope that with some effort I could pull together rq-scheduler, rq, rq-retry, and some baling wire into a nice system for doing our task processing. I set up exception_handlers, experimented with rewriting some of our more complex tasks so they'd work in the rq way, and, well, I've concluded any solution I develop like this is kind of terrible:

  • In theory rq-scheduler could be used to schedule retries, but is it even supported? It does some weird things with extra queues to get things scheduled. It's not exactly elegant. Does it work with 1.0? Seems like it hasn't been updated in a while.

  • The job.requeue method isn't documented (see: #1067). This doesn't inspire faith.

  • Doing retries involves using the exception handlers, which is fine. So maybe I can trigger a retry using custom exceptions. I set up a custom exception, RetryException that I'd throw for tasks that needed to be requeued. That seemed to kind of work, but, well, it still printed the exception to the logs. Did it save the exception to Redis too? I didn't get so far as figuring that out. Maybe? In the meantime, how do I set a timer for it? How do we do an exponential backoff on that timer?

  • Let's say I've got a chain of tasks I want to run and the first one triggers a retry, then fails. Do I need to handle that failure? What's up with the deferred tasks while the retry is happening? What about after it fails? I don't even know, but I do know I'm in a messy place of cleaning up tasks, dealing with retries, etc. This is what an API is supposed to save me from implementing myself, right? Will I wind up with tasks in queues I don't know exist and therefore slowly fill up redis? I give up.
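
On the exponential backoff question specifically, the timer arithmetic can be sketched independently of rq. This is only an illustration: the function name `backoff_delay` and its default values (60-second base, doubling factor, one-hour cap) are assumptions, not anything rq or rq-scheduler provides.

```python
def backoff_delay(attempt, base=60, factor=2, max_delay=3600):
    """Seconds to wait before retry number `attempt` (1-based).

    All defaults are illustrative assumptions, not library settings.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Grow the delay geometrically, but never beyond max_delay.
    return min(base * factor ** (attempt - 1), max_delay)


delays = [backoff_delay(n) for n in range(1, 8)]
print(delays)  # [60, 120, 240, 480, 960, 1920, 3600]
```

Whatever schedules the retry would still need to track the attempt count per job, which is one of the bookkeeping problems listed above.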

I'm putting all of the above in here for two reasons:

  1. I think rq should really document the absence of built-in retries or at least document the heck out of a way to do them yourself, including any limitations that come along with that. (See above for concerns I'd want addressed in any documentation of a home-built solution.)

  2. I'm hopeful that my experience will help rq prioritize this. I feel like this is an essential feature for a task queue. Maybe I'm wrong, judging by the popularity of rq.

I hope this helps, and I'll certainly keep an eye on rq. I like it in general, but without retry support built in, I'm sure it's not something we can use for our purposes. Sorry it has to end this way. :)

Hi Mike,

This is great feedback. Thanks!

  1. Yes, RQ's documentation sucks, but I plan on improving it as I continue developing RQ's feature set.
  2. I’m working on bringing RQ scheduler into RQ proper so we can have basic scheduling built in.
  3. After basic scheduling is done and implemented, retries should be next.

I can’t promise any timeline though as I only get to work on RQ during my spare time :)

> I can’t promise any timeline though as I only get to work on RQ during my spare time :)

I totally understand. rq is already amazing. This is a really tough problem space. Glad the notes help.

any updates on this retry w/exp. backoff feature request?

Now that job scheduling is live, I plan on tackling job retries next. No promises on when this would be live though as this is my free time project.

Good luck @selwin! Just know that there's at least one team waiting on retry functionality built into RQ :)
