Sidekiq: Force retry a Sidekiq job

Created on 14 May 2014 · 17 comments · Source: mperham/sidekiq

I have a Sidekiq job that may at times fail to complete successfully because the external API it consumes won't reply in a timely fashion. It's not an exception (in Ruby terms) but a pretty standard mishap in life :).

Thus I want to make Sidekiq put the job into the retry queue _without_ having to throw a "proper" ruby exception (which would have a number of side effects, most prominently it'd start bugging me for a fix via my error handler service).

How should I go about it?

All 17 comments

When we have an error case where we don't want to throw an exception, we just reschedule the job in the future. It doesn't go into the retry set, but it accomplishes the goal.


Thanks, it sure is viable and simple. Still, I don't really like it, mostly because the API I'm calling may fail for other reasons that I'm not getting notified of, and thus rescheduling would go on forever. Then I'd need to build much of the plumbing (retry counting and such) that is already built into Sidekiq. (I may propose factoring the reschedule logic out into its own method so that it can be invoked independently.)

I'll experiment both with your solution and raising an improper true ruby exception and see which works out better.

I stumbled upon sidekiq-retries (https://github.com/govdelivery/sidekiq-retries) that looks very promising. This thread seems to have stalled, so I'm closing it.

I posted a reasonable workaround here: http://stackoverflow.com/questions/19682594/sidekiq-airbrake-only-post-exception-when-retries-extinguished

That said, it would be nice to be able to return RETRY_LATER or something for a cleaner exit.
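The Stack Overflow workaround above boils down to letting the job raise normally, so Sidekiq's retry/backoff machinery runs, while notifying the error service only from the retries-exhausted hook. A minimal sketch, where `FlakyApi` and `ErrorService` are illustrative stand-ins (a real app would `require 'sidekiq'`; the tiny stub here just lets the sketch load standalone):

```ruby
module SidekiqWorkerStub
  # Minimal stand-in for Sidekiq::Worker so this sketch loads without the gem;
  # a real app would `include Sidekiq::Worker` instead.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :exhausted_hook

    def sidekiq_options(opts); end

    def sidekiq_retries_exhausted(&block)
      @exhausted_hook = block
    end
  end
end

class FlakyApiWorker
  include SidekiqWorkerStub # include Sidekiq::Worker in a real app
  sidekiq_options retry: 5

  # Sidekiq runs this block after the final retry fails; msg is the job hash.
  sidekiq_retries_exhausted do |msg, ex|
    ErrorService.notify(ex, job: msg['class'], args: msg['args'])
  end

  def perform(id)
    FlakyApi.fetch(id) # any raise here goes through Sidekiq's retry set
  end
end
```

The job still shows up in the retry set with exponential backoff, but the error tracker only hears about failures that survived every retry.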

@jonhyman Just so I understand correctly, when you say "reschedule" do you mean:

# app/controller/some_controller.rb
class SomeController < ApplicationController
  def index
    # ...
    SomeWorker.perform_async(remote_resource: RemoteClient.get(params[:id]))
    # ...
  end
end
# app/workers/some_worker.rb
class SomeWorker
  include Sidekiq::Worker

  def perform(remote_resource:)
    return self.class.perform_in(5.minutes, remote_resource: remote_resource) unless remote_resource.ready?
    # do the work once remote resource is ready
  end
end

I somewhat wish I could get the exponential backoff that retries afford with a Ruby exception, without the actual raising of exceptions that, as @fastcatch said, causes unwanted side-effects.

@f3ndot yeah, we basically do just this, but only for exceptions that we know can happen and don't want to be alerted about. That is, we wouldn't want this to retry indefinitely, so we only do it for exceptions that won't recur forever. For example, we do it with Net::OpenTimeout in jobs that upload to S3, or AWS::CloudWatch::Errors::Throttling in monitoring jobs.

def perform(*args)
  do_work
rescue KnownException => e
  logger.info { "#{self.class} caught #{e.inspect}, retrying" }
  # perform_in is a class method; 2.minutes needs ActiveSupport (use 120 in plain Ruby)
  self.class.perform_in(2.minutes, *args)
end

This is something that I would love to be able to do. We're reaching out to websites, so we want to use the existing retry functionality which triggers when there's an HTTP exception. But the HTTP exceptions are expected, so we want to be able to trigger a retry without the exception. The exceptions hurt our New Relic error rate and dirty Rollbar.

For the record: if I remember correctly, I ended up defining a new exception and monkey-patching Sidekiq to handle this exception differently (i.e. catch and force a retry, but do not re-raise).

It was a long time ago and I have moved on since, but I can dig up the code for you if you need it. (It'd take some time, though. And it's more than likely that Sidekiq has changed enough that it cannot be used without modification.)

I'd prefer not to manage blacklists for New Relic and Rollbar, especially since that could silence real errors. I think I'm going to do something like @fastcatch, but with Sidekiq middleware: recast the known exceptions and catch them in the middleware. I'll let y'all know if this works.
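The middleware idea can be sketched in plain Ruby, since Sidekiq server middleware is just an object with a `call(worker, job, queue)` method that yields to run the job. All names here are illustrative, with `Timeout::Error` standing in for a known transient error:

```ruby
require 'timeout' # Timeout::Error as a stand-in "known" transient error

# Sketch: server middleware wraps every job, so it can catch known transient
# errors, reschedule the job itself, and swallow the exception before it
# reaches an error reporter such as Rollbar or New Relic.
class KnownErrorRetryMiddleware
  KNOWN = [Timeout::Error].freeze

  # Sidekiq invokes middleware with the worker instance, the raw job hash,
  # and the queue name; `yield` executes the job itself.
  def call(worker, job, queue)
    yield
  rescue *KNOWN
    # Re-enqueue with a fixed delay instead of letting the error propagate.
    worker.class.perform_in(120, *job['args'])
  end
end

# Wiring, assuming Sidekiq is loaded:
# Sidekiq.configure_server do |config|
#   config.server_middleware { |chain| chain.add KnownErrorRetryMiddleware }
# end
```

Note that, like the `perform_in` workarounds above, this enqueues a fresh job rather than using Sidekiq's retry set, so there is no exponential backoff or retry counting unless you add it yourself.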

@jonhyman thanks for the "reschedule" workaround suggestion.

If I understand correctly, using perform_in schedules a brand-new job, so the retry counter starts from zero on the new job.

In my case, I would like to also increment the retry counter when "rescheduling" the job, so the sidekiq_retries_exhausted hook would be called when max retries is reached.

Do you guys have an idea about how to do it, without needing to raise an error?
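One way to approximate this without raising is to thread a reschedule counter through the job arguments and fire your own "exhausted" handler when the cap is hit. In this sketch, `resource_ready?`, `process`, and `on_reschedules_exhausted` are hypothetical hooks you would implement, not Sidekiq APIs:

```ruby
# Sketch: carry a reschedule counter in the job arguments so manual
# rescheduling is capped, mirroring what sidekiq_retries_exhausted does
# for real retries. The resource_ready?, process, and
# on_reschedules_exhausted methods are hypothetical hooks.
class PollingWorker
  # include Sidekiq::Worker   # in a real app
  MAX_RESCHEDULES = 5

  def perform(resource_id, reschedules = 0)
    if resource_ready?(resource_id)
      process(resource_id)
    elsif reschedules >= MAX_RESCHEDULES
      on_reschedules_exhausted(resource_id) # our stand-in for the exhausted hook
    else
      # 300 seconds; with ActiveSupport you could write 5.minutes
      self.class.perform_in(300, resource_id, reschedules + 1)
    end
  end
end
```

Unlike the real retry counter, this count survives only because it rides along in the arguments, so it is invisible to the Sidekiq retry UI.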

@lucasdavila
hey, did you find a solution to this?

Basically I want the same thing: since perform_in schedules a new job, the retry counter starts from zero on the new job. I also want to end the current job cleanly so it doesn't throw any exceptions.

Same here. My case is that I want to catch a known exception and then send the job back to the retry queue (my function relies on a third-party service that has intermittent problems) without it showing up in Sentry. So yes, it's an error and I want it retried, but no, I don't want it going to Sentry.

As it stands, I can't catch the error, because then the job won't go to the retry queue, so my Sentry is littered with these known 500s and I have to clean it out every week or so.

@rbucks configure Raven to not send the exception.
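With the raven-ruby client this is a one-line configuration, sketched here with a hypothetical exception class name:

```ruby
# Sketch, assuming the sentry-raven gem: excluded_exceptions drops matching
# errors client-side, so the job can still raise (and Sidekiq can still
# retry with backoff) without the exception ever reaching Sentry.
Raven.configure do |config|
  config.excluded_exceptions += ['RemoteApi::IntermittentError']
end
```

The trade-off, as noted above, is that the exclusion list silences that exception class everywhere, not just in this worker.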

I just ran into this problem too and temporarily fixed it with a counter, so a worker that randomly fails on a specific API call can't loop forever:

def perform(*args, counter = 0)
  ...
  perform_in(10.seconds, *args, counter + 1) if error && counter < 3
end

Hi @user7788, while trying your approach, I am getting this error:

NoMethodError: undefined method `perform_in' for Worker

perform_in is defined on the worker class, not the instance, so call it via self.class:

self.class.perform_in(10.seconds)

