Ruby version:
Sidekiq / Pro / Enterprise version(s):
5.0.5
I'd like to have/implement a kubernetes liveness probe for a sidekiq worker.
The liveness probe will GET :8080/ and look for an HTTP 200 response. Effectively, I need to know that the worker is alive and successfully pinging redis, but I need to know this from outside.
My threading is rusty, but I was considering threading a simple http server that responds with a 200 ok.
If the main worker thread dies, I want the http thread to die as well.
Something like:
require 'socket'

LIVENESS_PORT = 8080

Sidekiq.configure_server do |config|
  config.on(:startup) do
    Sidekiq.logger.info "Starting liveness server on #{LIVENESS_PORT}"
    Thread.start do
      server = TCPServer.new('localhost', LIVENESS_PORT)
      loop do
        Thread.start(server.accept) do |socket|
          socket.gets # Read the first line of the request (the Request-Line)
          # Capture the block's return value; a variable first assigned
          # inside the block would not be visible out here.
          redis_response = Sidekiq.redis { |r| r.ping }
          if redis_response.eql?('PONG')
            status = '200 OK'
            response = "Live!\n"
          else
            # Fail the probe with a non-200 status so kubernetes restarts us.
            status = '503 Service Unavailable'
            response = "Sidekiq is not ready: Sidekiq.redis.ping returned #{redis_response.inspect} instead of PONG\n"
            Sidekiq.logger.error response
          end
          socket.print "HTTP/1.1 #{status}\r\n" \
                       "Content-Type: text/plain\r\n" \
                       "Content-Length: #{response.bytesize}\r\n" \
                       "Connection: close\r\n"
          socket.print "\r\n" # blank line separates the header from the body, as required by the protocol
          socket.print response
          socket.close
        end
      end
    end
  end
end
Thoughts?
Hi Kevin, that's a simple impl but it looks like it covers your needs. It would die in the case of a network error, so you might need some rescuing in order to recover.
However: a user might have 20 Sidekiq processes on a single machine. Would you expect them to take ports 8080-8100? Or is that not the way k8s works - more like one process per container/IP?
My worker is a docker container, technically a kubernetes pod. A pod can have multiple containers - it's discouraged, but sometimes necessary, and that pattern is usually called a sidecar.
So, I can run 20 worker pods (likely just 1 with horizontal autoscale on). All of them can listen on 8080; kubernetes maps this, and 8080 itself is only exposed for kubernetes liveness probing. I think in the case of a network error here (local network within the kube cluster), I might as well restart the worker. I'll have to think about that, but thanks for looking it over. I'll give it a shot.
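For reference, the pod-spec side of the probe being described would look something like the following sketch; the container name, delay, and timings are assumptions that would need tuning:

```yaml
containers:
  - name: sidekiq-worker
    # ... image, command, etc.
    livenessProbe:
      httpGet:
        path: /
        port: 8080            # matches LIVENESS_PORT above
      initialDelaySeconds: 10 # give Sidekiq time to boot
      periodSeconds: 15
      failureThreshold: 3     # restart after ~45s of failed probes
```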
@rosskevin did this work for you? Currently I don't check to see if the worker is up.
Last I checked I believe it did. We brought down that infrastructure and are about to bring it up again, so I can't give you a definite answer. It is certainly in the neighborhood of being correct.
Hi,
I just released a gem using the approach discussed here.
This approach worked really well for me.
Hope that helps.
https://github.com/arturictus/sidekiq_alive
Folks, what do you think about an alternative solution (Sidekiq 6 only, though)? Would it do the same job?
#!/usr/bin/env ruby
require_relative '../config/boot'
require 'sidekiq/api'
processes = Sidekiq::ProcessSet.new.size
$stdout.puts "Sidekiq processes: #{processes}"
exit 1 if processes.zero?
It returns exit code 1 if there are no running processes.
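If you go this route, the script can back an exec-style probe instead of an httpGet one. A sketch, where `bin/sidekiq_check` is a hypothetical path to the script above:

```yaml
livenessProbe:
  exec:
    command: ["bin/sidekiq_check"]
  initialDelaySeconds: 10
  periodSeconds: 30
```

One caveat: `Sidekiq::ProcessSet` reflects the heartbeat data of every Sidekiq process registered in Redis, not just the one in this pod, so the probe could stay green while this particular worker is dead.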