Sidekiq: Signals handlers may be redefined within the job

Created on 26 Mar 2018  Â·  8Comments  Â·  Source: mperham/sidekiq

Ruby version: 2.4.3p205
Sidekiq version(s): 5.1.2

In a large codebase, eventually some developer may introduce a gem that messes with signal handlers. If any part of that gem is ever called from a job worker, original signal handlers will be overridden and Sidekiq (or any other job processing library) would become unresponsive.

Here is an example:

require 'sidekiq'

module NastyGem
  def self.do_smth
    trap("TERM") { }
    trap("INT") { }
  end
end

class MyJob
  include Sidekiq::Worker

  def perform
    NastyGem.do_smth
    puts "Workin' hard"
  end
end

I know that this case is pretty unlikely, but it terrifies me because often developers add new gems without studying their source, and I see how re-defining Sidekiq's signal handlers may become unnoticed. Graceful worker termination would be broke as a result, and it might take weeks before an SRE finds out about that.

In my case, I could "finalize" signal handlers by re-defining Signal.trap:

def run
  # Sidekiq defines signal handlers
  finalize_signal_trap
end

module FinalizedTrap
  def self.included(mod)
    mod.instance_eval do
      define_method(:trap) do |*_args|
        raise "omg omg omg! you are not allowed to use signal handlers after they've been set by Sidekiq"
      end
    end
  end
end

def finalize_signal_trap
  [Kernel, Signal].each do |klass|
    klass.__send__(:include, FinalizedTrap)
  end
end

1) It's unfortunate that Ruby allows to re-define signal handlers after they've already been set. I'm thinking about back porting trap("TERM", finalize: true) API to Ruby upstream.
2) Do you think Sidekiq could benefit from finalizing signal handlers?

Most helpful comment

Agreed. I'm happy to consider ways to make Sidekiq's signal handling more robust if it doesn't involve monkeypatching. I would open issues in those projects to ask how they can better integrate their signal handling; Sidekiq is the process "owner" and an app library should not be breaking its signal handling.

All 8 comments

Global monkey patching is bad.

@AlexWayfer sure, but like with any "X is bad" case you have to consider trade-offs and alternatives.

You may say "global state is bad", but then Signal.trap is exactly that piece of global state that Ruby (unfortunately) makes very easy to mutate.

If you had to choose between keeping the code free from monkey patches and between occasionally breaking graceful shutdown and lost jobs, which one would you choose? If an engineer in your team had to spend few days to find out that a random gem was causing an issue with worker shutdown, would you justify lost time and money with avoiding monkey patches?

Please also note that I'm not suggesting to introduce any monkey patches to Sidekiq. I've created the issue to discuss a problem. One way to solve it is to add a finalize: true option to Signal in Ruby standard library. Another (ad-hoc) way is to monkey patch Sidekiq or Resque in your app.

I'm looking forward to hear your opinions, maybe there's a better solution.

I've never heard of anyone else having this problem, can you give a more specific real example? Sometimes, if a problem is rare enough, the correct thing to do is nothing as it's not worth the trouble to fix.

Or rather, the correct thing to do is tell NastyGem to fix their bad practices.

@AlexWayfer sure, but like with any "X is bad" case you have to consider trade-offs and alternatives.

Sure. But I didn't come up with a good alternative. For example, refinements will not work for this case.

You may say "global state is bad", but then Signal.trap is exactly that piece of global state that Ruby (unfortunately) makes very easy to mutate.

Yes. Mutating Array#each is easy too. But… just don't do this, please.

Do you have a real case? Some open-source gem with this trap, which can be used in Sidekiq.

If you had to choose between keeping the code free from monkey patches and between occasionally breaking graceful shutdown and lost jobs, which one would you choose?

It depends on the case: sometimes global monkey patching may hurt more, than rewriting code with trap.

parallel gem is one example. When someone is lazy to spin a pool of threads, they would find a gem and do something like this (an example from gem's readme):

results = Parallel.map(['a','b','c']) do |one_letter|
  expensive_calculation(one_letter)
end

When surprisingly this piece of code would trap SIGINT.

There's also a legit trap in google/google-api-ruby-client and I've seen other cases in private gems.

I understand if you don't want to be too defensive in what developers may do from a Sidekiq job, but when you have hundreds of developers working in the same codebase, and someone uses Parallel.map that is _occasionally_ called from a job that _occasionally_ breaks graceful termination, you may become more paranoid in sandboxing what developers can do inside #perform.

Another side of them problem is that it's _very_ hard to find it without auditing all of your bundle. It may take months before we find about that NastyGem overwriting Sidekiq's (or Unicorn's, or Resque's) signal handlers.

Feel free to close is as it's not really a Sidekiq's issue. I opened it primarily to discuss the problem.

Agreed. I'm happy to consider ways to make Sidekiq's signal handling more robust if it doesn't involve monkeypatching. I would open issues in those projects to ask how they can better integrate their signal handling; Sidekiq is the process "owner" and an app library should not be breaking its signal handling.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mperham picture mperham  Â·  3Comments

fatcatt316 picture fatcatt316  Â·  4Comments

rajcybage picture rajcybage  Â·  3Comments

paul-ylz picture paul-ylz  Â·  4Comments

homanchou picture homanchou  Â·  3Comments