Hello Mike,
What do you think of having a declarative class method for saying "do not retry if the error class is this and that"? A typical use case is something like this:
```ruby
class ThumbnailWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5
  # do_not_retry_if Sequel::NoMatchingRow,
  #                 RemoteFetcher::ResourceNotFoundError

  def perform(image_id)
    # Find the DB record with URLs.
    image = Image.with_pk!(image_id)
    # Download the image, resize it, do whatever with it.
    Thumbnailer.new(image).generate
  rescue Sequel::NoMatchingRow
    # There is no point in retrying since the record does not exist.
  rescue RemoteFetcher::ResourceNotFoundError
    # Amazon S3 said 404, no point in retrying.
    # However, we DO want to try again if it was a timeout.
  end
end
```
The point is to not retry if it doesn't make sense.
Thanks,
Ollie
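A standalone sketch of how such a do_not_retry_if declaration could be built in plain Ruby (hypothetical; none of these names are Sidekiq APIs, and the Sidekiq::Worker include is commented out so the example is self-contained). Swallowing a listed error makes Sidekiq see a successful job, so no retry is scheduled:

```ruby
# Hypothetical do_not_retry_if implementation; NOT a Sidekiq API.
module DoNotRetry
  def self.included(base)
    base.extend(ClassMethods)
    base.prepend(Guard)
  end

  module ClassMethods
    def do_not_retry_if(*error_classes)
      @non_retriable = error_classes
    end

    def non_retriable
      @non_retriable || []
    end
  end

  module Guard
    def perform(*args)
      super
    rescue StandardError => e
      # Re-raise anything not declared non-retriable.
      raise unless self.class.non_retriable.any? { |klass| e.is_a?(klass) }
    end
  end
end

class ThumbnailWorker
  # include Sidekiq::Worker   # in a real app
  include DoNotRetry
  do_not_retry_if KeyError    # stand-in for Sequel::NoMatchingRow, etc.

  def perform(image_id)
    raise KeyError, "no such image" if image_id.nil?
    :generated
  end
end
```

Because the module prepends its Guard into the worker's ancestry, the rescue wraps perform without the worker having to repeat the rescue block itself.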
As you show, your worker should explicitly catch and ignore any errors. I'm not taking feature requests for the retry logic anymore.
So much for politeness. :-1:
How is my reply impolite?
@ollie check out https://github.com/govdelivery/sidekiq-retries
That response, and closing the issue, actually feels like shoving something off the table. It comes across as arrogant.
Thanks @substars, I'll take a look.
How about the ActiveJob::DeserializationError with Rails 4.2? Does anyone know how we can set it so it is not retried? It never seems to enter the perform method, so I am not sure how to deal with this.
@pomartel What is the actual problem? Why is that error raised? Sidekiq does not have any knowledge about Rails and I really don't want to add Rails-specific code paths if possible.
I could see how this could be helpful depending on the application. There may be application logic that wants to suppress retries based on one issue or another. For example, a worker accessing an API might be throttled and receive a throughput-exceeded exception, or hit a record-not-found exception.
The code may look like:
```ruby
def initialize(options = {})
  @max_retries = options.fetch(:max_retries, DEFAULT_MAX_RETRY_ATTEMPTS)
  @ignored_exceptions = options.fetch(:ignored_exceptions, [])
end

def call(worker, msg, queue)
  yield
rescue Sidekiq::Shutdown
  # ignore, will be pushed back onto queue during hard_shutdown
  raise
rescue Exception => e
  raise Sidekiq::Shutdown if exception_caused_by_shutdown?(e)
  raise e unless msg['retry']
  raise e if @ignored_exceptions.include?(e.class)
  attempt_retry(worker, msg, queue, e)
end
```
@mperham The DeserializationError happens because of a record-not-found exception. This typically happens when I queue a job to run at a later time and meanwhile the object the job was set to run on is deleted. An example scenario:
1. A new user registers with the application and a welcome email is set to be delivered 1 day from now.
2. The user deletes his account.
3. A deserialization error is raised when the job runs because the record cannot be found (it was deleted).
I just saw that Active Job provides a rescue_from method. I will try to see if I can catch the exception with that method: http://edgeguides.rubyonrails.org/active_job_basics.html#exceptions
On retries, I don't think I would want such a failure to prevent retries. I would want it to continue to error out to force manual intervention. If someone is deleting records that were expected to be used by a job, it's possible that it was still required to process the job successfully. In other cases, it may be that the job can be safely deleted. There should definitely be an easy way to control that in the application though.
@pomartel That is exactly why I told the Rails team that GlobalID is a bad idea and why Sidekiq doesn't do it already. Looking up records means dealing with exceptions and various edge cases (e.g. transaction boundaries) that the application really should be responsible for. I was overruled in the matter.
Just so you know, this works:
```ruby
rescue_from(ActiveJob::DeserializationError) do |exception|
  # The user record has been deleted. Ignore this.
end
```
It's not ideal since I might be catching deserialization errors that are not related to records not found but it will work for my use case. Thanks for your help!
I didn't really see anyone explaining this the way I see it, so I'm trying now: some exceptions just don't make sense to retry at all; the cliché example is ActiveRecord::RecordNotFound. Applications that interact with third-party services on behalf of users might face exceptions like SomeService::ProfileNotFound that likewise make no sense to retry.
So, for cases like this, it feels error-prone to reimplement the same rescue block in multiple workers.
Also, in many cases where I could simply do SomeClass.delay.method and want to skip a common exception from being retried, I'll have to wrap it in a custom worker class.
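The custom worker class wrapper mentioned above could look roughly like this (a sketch; SafeInvoker and its error list are illustrative names of mine, and the Sidekiq::Worker include is commented out so the example stands alone):

```ruby
class SafeInvoker
  # include Sidekiq::Worker   # in a real app
  # sidekiq_options retry: 5

  # Stand-ins for SomeService::ProfileNotFound, ActiveRecord::RecordNotFound, etc.
  NON_RETRIABLE = [KeyError].freeze

  def perform(class_name, method_name, *args)
    Object.const_get(class_name).public_send(method_name, *args)
  rescue *NON_RETRIABLE
    # Swallow known non-retriable errors so the job is not retried.
    nil
  end
end
```

Instead of SomeClass.delay.method(args), you would enqueue SafeInvoker with ("SomeClass", "method", args).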
This behavior is used in gems such as Airbrake and Rollbar:

_I don't think we should/need to have a default list of skipped exceptions like Airbrake. I like that Sidekiq won't be having a lot of Rails-specific code_
@mperham It would be a pleasure to write a PR if you think you can accept that. Thanks.
I'm still struggling to see the use case.
I tend to actually care about ActiveRecord::RecordNotFound. I almost always want an issue with a missing record to be anything but skipped over. Instead I want it in a list of retry jobs, so I can identify the problem, because it could be a sign of an underlying bug. For the cases where I expect something to potentially not exist, using methods that return nil instead of raising lets you return early.
For something like an intermittent third party API that claims there's a missing record, I want the retry because I've seen it correct itself. But again, it could be a legitimate issue with an incorrect unique ID or something. I wouldn't want to paint over problems by skipping retries.
Is this something that a custom exception handler can implement along with a declarative interface at the job level?
@aprescott Thanks for your attention here.
Adding a global non-retry list with the scenarios I described doesn't really mean we would silently fail. I still implement the Sidekiq global exception handler to notify me on Rollbar properly when I know it's something I should be aware of.
My point is: I'm currently about to refactor a fair amount of code to stop using Model.delay.method, because when we get a RecordNotFound there, it's safe to say I shouldn't retry. E.g.:

As you can see in this image, the record was persisted (it had an id) but it couldn't be found, so there's really no reason to retry this. It's quite awkward that it happened, though, so the simplest way I see to handle this is to not retry even a single time and notify the team via Rollbar (with the global exception handler).
Retrying RecordNotFound is critical to deal with this issue:
https://github.com/mperham/sidekiq/wiki/FAQ#why-am-i-seeing-a-lot-of-cant-find-modelname-with-id12345-errors-with-sidekiq
For every error you want to ignore, there's almost always a case where you want to retry it.
Lots of issues are caused by bad practices and wrong usage. Retrying AR::RecordNotFound encourages developers to continue enqueueing jobs even in before_save: Sidekiq will silently retry them and they will work at some point. In one of the projects I worked on recently, I ignored AR::RecordNotFound manually in every worker that did Model.find(), because we were certain we wouldn't schedule a job for a record that wasn't persisted yet. When this exception is thrown in that product, it's a sign that someone deleted the record before the job got executed by Sidekiq, which is perfectly acceptable.
So I really don't see a reason not to add this feature out of concern for developers who are using bad practices, practices that are even advised in gems' documentation.
_When this exception is thrown in this product, it's a sign that someone deleted the record before the job got executed by Sidekiq - which is perfectly acceptable._
If that's an expected use case, you should use find_by_id so it returns nil in that case, not find. Raising an error should indicate there's a bug in the system.
The whole point is to have this flexibility while doing ARModel.delay.operation.
What really are the disadvantages of allowing a developer to specify a global list of exceptions that shouldn't be retried?
@rafaelsales ah, the case is where ARModel.operation should ordinarily raise, but where adding .delay should not retry if it does raise? It does seem fairly niche. Does the custom exception handler idea in a third party gem work?
@aprescott Yes. There really are some cases where ar_model.operation would work but ar_model.delay.operation won't; you can see the image I posted above, e.g. when the job takes something like 30 minutes to get executed and the record is removed in the meantime. There are other scenarios and exceptions that I've ignored in nearly every job and would put in a global retry-ignore list if possible.
I'm seeing a lot of Net::SMTPServerBusy: 401 4.1.3 Bad recipient address syntax errors because the email address doesn't exist, so it would be really useful if I could ignore these.
Just to throw my 2c in, I'm currently building a system on top of Sidekiq which uses Capybara to scrape a partner's website (don't ask, it's a harrowing tale). If they change the markup of their product pages then Capybara is going to throw an exception - that's definitely something I want to be notified of, and I'd like the job in the dead job queue so I can retry it later, but there's no point in retrying until the code has been updated to do the right thing.
@jellybob that sounds different. Is it not handled by a configuration of retry: 0 on the worker?
From the wiki:
Only jobs configured with 0 or greater retries will go to the Dead Job Queue.
Interesting. I'm sure I read it as zero retry jobs not going to the DJQ. Anyway, the particular job can fail in a variety of ways, some of which do warrant retrying.
I guess I could break it into two jobs, but it would complicate matters quite a bit for the sake of doing this.
Mostly thinking out loud since I'm not at a computer, but can I explicitly push a job to the DJQ myself? That would seem like a decent compromise, I'll have to look into it tomorrow.
Just throwing in my 2 cents here. Why don't you just write a simple custom middleware that handles the exceptions you want to prevent retries for?
Basic example:
```ruby
def call(worker, msg, queue)
  yield
rescue ActiveRecord::RecordNotFound
  msg['retry'] = false
  raise
end
```
Maybe even do this:
```ruby
def call(worker, msg, queue)
  yield
rescue ActiveRecord::RecordNotFound
  msg['retry'] = false
  raise
rescue Exception => e
  if worker.respond_to?(:handle_error)
    worker.handle_error(e)
  else
    raise
  end
end
```
Because I hadn't realised it was possible, that looks perfect to me!
And if you don't want it to end up on the dead list either, you can set msg['dead'] = false.
```ruby
module MyApp
  class NonRetriableJobs
    def call(worker, msg, queue)
      yield
    rescue ActiveRecord::RecordNotFound
      msg['retry'] = false
      msg['dead'] = false
      raise
    end
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add MyApp::NonRetriableJobs
  end
end
```
In case people land here trying to achieve the same thing: the technique mentioned in https://github.com/mperham/sidekiq/issues/2072#issuecomment-160832087 no longer works in Sidekiq 5+. It used to work when Sidekiq had a RetryJobs middleware that looked at the job hash passed to it, but that is no longer the case in Sidekiq 5+.
To stop a job from retrying, simply return within the rescue block. You can include any error reporting in the same rescue block before returning, or if your error reporting is handled by another middleware (like Bugsnag::Sidekiq), then nothing needs to be done as long as that middleware runs first.
Please correct me if I'm wrong though.
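A sketch of the Sidekiq 5+ behavior described in that comment: in a server middleware, swallowing the exception (returning normally from the rescue) means the job counts as done and no retry is scheduled. The error list and the reporting call here are placeholders of mine, not Sidekiq APIs:

```ruby
class SwallowNonRetriable
  NON_RETRIABLE = [KeyError].freeze # e.g. ActiveRecord::RecordNotFound

  def call(_worker, msg, _queue)
    yield
  rescue *NON_RETRIABLE => e
    # Report first (or let an earlier middleware such as Bugsnag::Sidekiq
    # handle it), then return normally so Sidekiq does not retry.
    warn "dropping job #{msg['jid']}: #{e.class}"
  end
end
```

Errors not in the list still propagate out of the middleware, so Sidekiq's normal retry handling applies to them.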
@d9su Thanks so much for bringing this up! We just upgraded to Sidekiq 5 and I did not realize that this behavior changed.
@rafaelsales Note that those of us using databases with read-replicas in real time have to deal with eventual consistency, where RecordNotFound may be related to replication delay.
We can just set sidekiq_options retry: 0 in the rescue block for that specific exception and then reset it to the default in the normal case:
```ruby
def perform(*args)
  self.class.sidekiq_options retry: default_value
  # ... do the work ...
rescue SomeClass::Error => error
  self.class.sidekiq_options retry: 0
  raise
end
```
Hey @d9su, thanks for pointing out that 5+ does not support this middleware approach. Do you know any workaround in 5+ to achieve the following?
Let's say we have this job:
```ruby
class TestRetryJob
  class RetryException < StandardError; end

  include Sidekiq::Worker
  sidekiq_options retry: 0

  def perform
    puts "Working..."
    sleep(5)
    raise RetryException
  end
end
```
With the "old" solution I would add the following middleware to retry when RetryException happens and simply mark the job as failed (with no retry) when other errors occur:
```ruby
class NonRetriableJobs
  def call(_worker, msg, _queue)
    yield
  rescue TestRetryJob::RetryException
    msg["retry"] = (msg["retry"] || 0) + 1
    msg["dead"] = false
    raise
  end
end
```
Do you have any idea how to do the same thing in Sidekiq 5+? I'd like to avoid re-enqueueing the job manually, because it'll mess with the batching mechanism a bit.
@Bajena If you look at my comment above:
To stop a job from retrying simply return within the rescue block. You can include any error reporting in the same rescue block before returning, or if your error reporting is handled by another middleware (like
Bugsnag::Sidekiq), then nothing needs to be done as long as that middleware runs first.
Have you tried that?
Hey @d9su, thanks for the quick response! The problem with this approach, I think, is that the job won't be marked as failed in case of a non-retriable error:
```ruby
class TestRetryJob
  class RetryException < StandardError; end

  include Sidekiq::Worker
  sidekiq_options retry: 999999

  def perform
    do_work
  rescue RetryException
    puts "Retrying..."
    raise
  rescue StandardError
    # If we do nothing here, the job will be marked as successful
  end
end
```
@Bajena IIRC, jobs marked as failed are meant to be automatically retried, the two concepts are inseparable. I think the issue is the sidekiq job lifecycle is pretty opinionated and not very flexible for manually sending things around the pipeline. It's understandable since sidekiq relies on these automatic behaviors to function as it is.
So if you want anything custom, it's better to build your own error reporting mechanism.
For example:
```ruby
class NonRetriableJobs
  def call(_worker, msg, _queue)
    yield
  rescue TestRetryJob::RetryException
    MyLogger.info('Non retriable error occurred.')
  end
end
```
Thank you again @d9su! So it seems it won't be possible to achieve the same thing as in Sidekiq 4. I'll have to come up with another solution then. Thanks anyway for the super quick answers :)
You don't need middleware to achieve what you want. You can write a module with a perform method in it and then include that module instead of Sidekiq::Worker. When included, the module would wrap the perform method by prepending itself into the inheritance hierarchy.
Once there, you could grab the job instance and modify its singleton class with custom retry options after the call. You might need to override the retry_options class method too, and call it on the job's singleton class, to override retry logic as you see fit.
My 2c.
Thanks for the idea @kigster! Isn't such an approach thread-unsafe, though? Won't changing class options affect other workers that may be processing jobs at the same time?
@bajena I’m glad someone is thinking of thread safety!
When Sidekiq runs a job it instantiates a worker and disposes of the instance after the job completes. If you are able to intercept the job instance in the module (or in your middleware), you can modify its private "class", called the singleton class, which is attached only to that one instance and shares its lifecycle. What I meant is that you'd need to apply your retry option changes to this class and not to the shared worker class. Whether this works or not unfortunately depends on how Sidekiq looks up retry options, but in Ruby you can overwrite almost anything by reopening it and modifying it.
Read this for more info:
https://www.google.com/amp/leohetsch.com/demystifying-ruby-singleton-classes/amp/
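A minimal plain-Ruby illustration of the singleton-class idea from that link (retry_limit here is a made-up method, not a Sidekiq API): defining a method on one object's singleton class changes that single instance without touching the shared class other threads see.

```ruby
class Worker
  def retry_limit
    5
  end
end

a = Worker.new
b = Worker.new

# Reopen only `a`'s singleton class:
a.singleton_class.define_method(:retry_limit) { 0 }

a.retry_limit # => 0, overridden for this instance only
b.retry_limit # => 5, other instances keep the class definition
```

This is why the trick is thread-safe in principle: the shared Worker class is never mutated, only the one instance Sidekiq hands to the current job.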
@kigster that's a pretty smart use of duck typing and monkey patching 😛! A bit contrived, but it may be what's needed if you just want to take matters into your own hands.
Thanks for the explanation guys! It seems that singleton class trick should do the job here :)
@Bajena would you be able to post a concrete example of what this looks like? Would help a lot!