I've noticed that 8 seconds after a TERM signal, every worker is forcefully killed and its job is pushed back to Redis. In my code I have a mutex backed by Redis; if the worker is forcefully killed, the mutex stays held until it times out.
It would be nice to be able to know, from inside the worker, whether the current worker is about to be killed (i.e. we're within those 8 seconds). That way I could safely unlock the mutex, which would then be re-acquired when another worker picks up the job.
You can increase the timeout https://github.com/mperham/sidekiq/wiki/Signals
@jonhyman that won't work; Heroku still kills the dyno after 10 seconds. If a job has to run for more than 10 minutes, you can't just increase the timeout.
Oh, I didn't realize you were on Heroku. How do you want to find out that you're in those 8 seconds? Any blocking I/O (like a database call) will probably prevent you from reading a value reliably within 8 seconds.
Tune -t
I don't have any blocking I/O. I'm doing a lot of work, but it's mostly a loop over a big array. Something like Sidekiq.current_worker.is_terminating? would be nice, so the loop can be interrupted to release the lock and re-enqueue the job.
@mperham increasing -t won't help; Heroku will SIGKILL the process, which may also cause the job not to be re-enqueued.
Sidekiq.current_worker.is_terminating? seems like a reasonable thing to ask for IMO. But, I'll offer another possible work-around instead of addressing that directly:
... I'm doing a lot of work but it's mostly a loop of a big array, ...
How about splitting up that work into many sub-jobs? Something like:
class ParentWorker
  include Sidekiq::Worker

  def perform
    @big_array.each do |record|
      ChildWorker.perform_async(record)
    end
  end
end
@mikegee the problem in my case is that there is a global lock (based on the data I'm currently processing) that needs to be released after the last loop item is processed, which is harder to manage when each item is handled in a separate job. On top of that, I also need the items processed in order, because the items at the bottom have prerequisites.
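One way to keep both the ordering and the single release point while still splitting the work is to chain the sub-jobs: each job processes one item and enqueues the next, and only the last one releases the lock. Below is a minimal sketch of that idea with the queue faked in-process so it runs standalone; the names (run_chained, lock) are illustrative, and with real Sidekiq each step would call something like ChainedWorker.perform_async(ids, next_index) instead of pushing onto the local queue.

```ruby
# Sketch: process items one at a time, each "job" enqueueing the next index.
# Ordering is preserved because only one link of the chain runs at a time,
# and the lock is released exactly once, by the final link.
def run_chained(ids, lock)
  processed = []
  queue = [0]                # index of the first item; stands in for Sidekiq's queue
  until queue.empty?
    index = queue.shift
    processed << ids[index]  # per-item work goes here
    nxt = index + 1
    if nxt < ids.size
      queue << nxt           # enqueue the next link in the chain
    else
      lock[:held] = false    # last item done: release the global lock
    end
  end
  processed
end
```

The trade-off is throughput: the chain is strictly serial, so it only helps if each link finishes well inside the shutdown window.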
@alex88 You need to signal to your Heroku processes that they will be shutting down soon, using Sidekiq::Process#quiet!.
@mperham how? From inside the worker code I can't even know when the TERM signal is received.
Heroku already sends the TERM signal; if I knew inside my loop that it had been received, I could gracefully exit and re-enqueue without having to wait for the SIGKILL.
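The Signal.trap example referenced later in the thread isn't quoted here; a minimal sketch of the general idea, trapping TERM and flipping a flag that the work loop polls between items, might look like the following. Note that Sidekiq installs its own TERM handler, so trapping inside worker code can interfere with its shutdown; the handler chaining below is an assumption about how to coexist with it, not documented Sidekiq behavior.

```ruby
# Sketch: record that TERM arrived, then defer to any previously installed
# handler so the process's normal shutdown still happens.
$terminating = false

previous = Signal.trap("TERM") do
  $terminating = true
  previous.call if previous.respond_to?(:call)
end

# A loop that stops between items once TERM has arrived, returning the
# unprocessed remainder so the caller can release its lock and re-enqueue.
def interruptible_each(items)
  remaining = items.dup
  until remaining.empty?
    break if $terminating
    item = remaining.shift
    # ... per-item work on `item` goes here ...
  end
  remaining
end
```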
Awesome, I'll report back if that works inside the worker too.
@mperham I've just tried to put that code in the worker, but it seems no signal is being sent to the worker.
Really? Was that so offensive it needed to be deleted? All I did was say I don't understand why a Sidekiq.current_worker.is_terminating? method wouldn't be useful; the worker having access to its own process doesn't seem out of scope.
The worker does have access to its own process. I literally showed code how to do it. It does not need to be baked into 100% of workers.
You "literally" showed code that was reported back as not working. If a worker has access to its own process, what is the method to retrieve the process for a running job from inside perform?
self.worker_process would simplify this by orders of magnitude
this thread is just asking for a built-in way of querying process.stopping? from a running job
I've filled Sidekiq and related gems with as many features as I feel are safe and reasonable to expect. I avoid features and APIs that are unsafe or tricky to use properly. That Signal.trap example is one such misstep on my part.
This API is not made available because it's racy data: your code and logic will be full of race conditions. For instance, the quiet property below is set every 5 seconds so you can't guarantee you will see it until 5 seconds after the signal. Instead I tell people to use transactions and idempotency. This works:
# run as: bundle exec sidekiq -t 6 -r ./this.rb
require 'sidekiq/util'

class MyWorker
  include Sidekiq::Worker
  include Sidekiq::Util

  def perform
    loop do
      puts("Terminating", Sidekiq.redis { |c| c.hget(identity, "quiet") } == "true")
      sleep 1
    end
  end
end

MyWorker.perform_async
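Building on that check, the quiet flag can be polled between loop items so the job stops cleanly, releases its lock, and re-enqueues whatever is left, bearing in mind the caveat above that the flag can lag the signal by up to 5 seconds. The sketch below injects the check as a lambda so it runs without Redis; inside a worker it would be something like -> { Sidekiq.redis { |c| c.hget(identity, "quiet") } == "true" }, and drain is a hypothetical helper name.

```ruby
# Sketch: process items until the injected `stopping` check flips, then hand
# back both what was done and what remains. The caller releases the lock and
# re-enqueues `remaining` (e.g. via perform_async) before shutdown completes.
def drain(items, stopping)
  remaining = items.dup
  done = []
  until remaining.empty?
    break if stopping.call      # shutdown started: stop between items
    done << remaining.shift     # per-item work goes here
  end
  [done, remaining]
end
```

Because the flag is racy, the per-item work still needs to be idempotent: a job can be SIGKILLed before it ever sees the flag, and the re-run must tolerate that.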