Apologies for the lack of hard information. I'm seeing this on several (in the dozens) boxes on our infrastructure, so I've included as much information as possible from the box that has most recently exhibited this behavior.
So, I basically run sidekiq (via monit) with the following:
bundle exec sidekiq -C /path/to/sidekiq/config -e production -r /path/to/my/app/root -P /path/to/pids/dir -L /path/to/logfile
The user that runs this has full access to all applicable paths, all paths exist, and the pid file does, usually, get written. What I've found, though, is that the pid file is occasionally deleted almost immediately after creation. That is, when I add a one-second sleep to Sidekiq::CLI#write_pid and raise if the pid file no longer exists, the process dies (as I would expect). If I remove the one-second sleep (or the raise), I occasionally end up with a missing pid file, which causes monit to start a second worker process (so two workers now exist in this scenario).
Here is the relevant change that I made to my installed lib/sidekiq/cli.rb for debugging purposes: https://gist.github.com/ess/a1f8a7c21f34e5bf7bfc
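For anyone who can't open the gist, the check described above amounts to roughly the following. This is a minimal standalone sketch, not the contents of the gist and not Sidekiq's own code: the method name write_pid_with_check, the PidFileVanished error and the delay: parameter are all hypothetical names made up for illustration.

```ruby
require "fileutils"

# Hypothetical error class, used only in this sketch.
class PidFileVanished < StandardError; end

# Write the current process id to `path`, wait `delay` seconds, then
# raise if the file is no longer there. This mirrors the debugging idea
# described above (sleep, then check); it is not Sidekiq::CLI#write_pid.
def write_pid_with_check(path, delay: 1.0)
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "w") { |f| f.puts(Process.pid) }

  # Give whatever is deleting the file a chance to do so.
  sleep(delay)

  unless File.exist?(path)
    raise PidFileVanished, "pid file #{path} disappeared within #{delay}s"
  end

  path
end
```

Raising here only makes the symptom loud (the process dies instead of silently running without a pid file); it does not say what removed the file.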
Any insight on this would be appreciated, given that the change that I just mentioned almost certainly isn't up to the contribution requirements :)
Have you considered using a modern init system like Upstart or Systemd to control Sidekiq so that you don't need pidfiles at all?
https://github.com/mperham/sidekiq/tree/master/examples/upstart
@mperham Thanks for having a look. Unfortunately, we're pretty much locked into our current init (and monit) for the foreseeable future and would really appreciate this behavior being corrected.
It's particularly odd in that I don't see any feasible reason that the pid file would disappear within such a short time frame, given that both the CLI and the spawned worker remain.
Sidekiq never removes a PID file. I'd guess something else in your system is deleting it.
I'll do some further testing to more strongly rule that possibility out.
So, I believe I know what's happening, and I'll leave this here for anybody that might be running into a similar issue. Effectively, Sidekiq is just plain too fast for its own good in this respect.
So as to ensure that monit doesn't try to watch an old PID, before we run the canonical sidekiq executable, we rm -f the pid file that it uses. The very next step is to run sidekiq. The problem, however, is that we're hitting a race condition: the results of the rm are not necessarily synced before sidekiq writes to the still quasi-existing file location. So, what we end up with looks like this:
rm -f /path/to/pid
sidekiq -P /path/to/pid
/path/to/pid

This was detected in the least intuitive way possible: I added strace -ff -o /sidekiq_strace to our sidekiq run to determine if something odd was happening in the interpreter. This slowed execution to the point that I was no longer able to reproduce the issue. I both love and hate serendipity.
@mperham You're welcome to re-close this, but it might be worth adding a pre-write existence check to Sidekiq::CLI#write_pid to either handle or work around this excessively frustrating edge case. Barring that, it might be worth a mention in the documentation ... something to the effect of "if you rely on pid files and remove them before doing a restart, be sure to sync before starting sidekiq as well."
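The "remove, sync, then start" workaround could be sketched in a small Ruby wrapper like the one below. This is an illustration under assumptions, not something Sidekiq provides: clear_stale_pidfile is a hypothetical helper name, and whether flushing the parent directory actually closes the race described above is itself an assumption based on the diagnosis in this thread.

```ruby
require "fileutils"

# Hypothetical helper: remove a stale pid file, ask the OS to flush the
# containing directory, and report whether the path is really gone.
def clear_stale_pidfile(path)
  FileUtils.rm_f(path)

  begin
    # fsync on a directory handle flushes the directory entry change on
    # platforms that support it (e.g. Linux).
    File.open(File.dirname(path)) { |dir| dir.fsync }
  rescue SystemCallError, IOError, NotImplementedError
    # Some platforms do not allow opening or fsyncing a directory;
    # in that case this sketch just falls through to the check below.
  end

  !File.exist?(path)
end

# Possible use from a start script, before launching the worker:
#   abort "stale pid file still present" unless clear_stale_pidfile("/path/to/pid")
#   exec("bundle", "exec", "sidekiq", "-P", "/path/to/pid")
```

The return value lets the start script refuse to launch sidekiq while the old pid file is still visible, which is the start-script equivalent of the pre-write existence check suggested above.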
I recommend people not use PID files at all. They are 20th century legacy and a big kluge. Learn and use the modern init system in your distro of choice.
My Inspeqtor tool can be used to replace monit when using a modern init.
That’s cool. I recommend people use the tools that they’re able to use in their environments. I also recommend explicitly listing “modern” kludges as dependencies if they are, indeed, dependencies. I further recommend behaving with humility and grace, but that’s a particularly antiquated idea.
I'm not trying to diss you, merely explaining that I want the documentation to reflect best practices. The pid file feature is there for people to use if they can't follow those best practices.
I totally get that, and I'm sure the others that have reported issues regarding the pid file feature understand that desire as well.
That being the case, I'd honestly suggest removing the pid file functionality from Sidekiq core altogether, as it's a feature that you don't really want. With absolutely no ironic intention, that would definitely dissuade folks from trying to use sidekiq on systems that are not compatible with systemd/upstart/inspeqtor, and those that still insist on using it in those environments could effectively recreate that pid file functionality on their own.
Many people I see here are "StackOverflow programmers", cutting and pasting until it works. Often they don't know there is a better alternative at all. In retrospect, your voice here sounds like you're experienced/senior so my commentary was overkill.
Realistically, tons of people use these features (anyone using capistrano, for instance), so backwards compatibility precludes me from removing -d, -P and -L, but I would in a perfect world. I'm going to be introducing multi-process "swarm" support soon and that will not allow daemonization. https://github.com/mperham/sidekiq/wiki/Ent-Multi-Process
No worries, @mperham. I was a bit of a jerk in my response, and I regret having done so. To be fair, I do occasionally miss cargo culting until everything magically works ;)
At any rate, I'm playing around with an alternative solution. I'll hit you up privately if it works out well enough that you could, indeed, remove the pid file feature from core.