Ruby version: 2.5.1
Sidekiq / Pro / Enterprise version(s): 6.0.1/5.0.1/2.0.0
I believe the issue can be reproduced by running something like this: SIDEKIQ_COUNT=1 SIDEKIQ_MAXMEM_MB=100 bundle exec sidekiqswarm -q test. The SIDEKIQ_MAXMEM_MB value should be small enough to trigger the restart immediately. I don't have any jobs in the test queue.
I can definitely see that memory monitoring is activated:
[swarm] Starting memory monitoring with max RSS of 102400KB
However, the child process does not appear to be responding to the restart sequence:
[swarm] Process 97837 too large at 312720KB, stopping it...
[swarm] Process 97837 too large at 313156KB, stopping it...
[swarm] Process 97837 too large at 313388KB, stopping it...
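For context, the monitoring behavior in the logs above suggests a loop along these lines: read each child's RSS, and signal the child when it exceeds the limit. This is a hypothetical sketch for illustration, not sidekiqswarm's actual code; the rss_kb and check_children helpers are my own names.

```ruby
# Hypothetical sketch of swarm-style memory monitoring (not the real
# sidekiqswarm implementation): check each child's RSS periodically and
# ask oversized children to shut down gracefully via USR2.
MAX_RSS_KB = Integer(ENV.fetch("SIDEKIQ_MAXMEM_MB", "100")) * 1024

# Read a process's resident set size in KB via ps (works on Linux and macOS).
def rss_kb(pid)
  `ps -o rss= -p #{pid}`.strip.to_i
end

def check_children(pids, max_kb)
  pids.each do |pid|
    kb = rss_kb(pid)
    next unless kb > max_kb
    puts "[swarm] Process #{pid} too large at #{kb}KB, stopping it..."
    Process.kill("USR2", pid) # request a graceful shutdown
  end
end

# With a limit far above our own RSS, no signal is sent.
check_children([Process.pid], 10_000_000)
```

Seeing the "too large" line repeat for the same PID, as above, would mean the USR2 is being sent every cycle but the child never actually exits.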
I tried this on Linux and macOS with the same results.
We were previously on these versions: 5.2.7/4.0.5/1.8.1
The same scenario clearly works there, so it must be an issue in a newer version of one of the gems?
[swarm] Process 99408 too large at 336892KB, stopping it...
2019-10-11T09:32:23.128Z 99408 TID-ow0kpify8 INFO: Got USR2, starting graceful shutdown
2019-10-11T09:32:23.129Z 99408 TID-ow0kpify8 INFO: Terminating quiet workers
...
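The "Got USR2" line above is the child acknowledging the signal. A minimal sketch of that child-side handshake, assuming only Ruby's standard Signal.trap (not Sidekiq's actual handler, which does more bookkeeping):

```ruby
# Hypothetical child-side USR2 handler: set a flag in the trap (real work
# isn't allowed inside a trap context) and act on it in the main loop.
shutting_down = false
Signal.trap("USR2") do
  shutting_down = true
end

# Simulate swarm signalling this process.
Process.kill("USR2", Process.pid)
sleep 0.1 # give Ruby a chance to run the handler

puts "graceful shutdown requested" if shutting_down
```

If this flag never flips in the 6.0.1 child, either the signal isn't being delivered or a newer handler is no longer listening for USR2.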
It's working fine for me with 10MB, which triggers a restart almost immediately:
$ SIDEKIQ_COUNT=1 SIDEKIQ_MAXMEM_MB=10 bundle exec sidekiqswarm -q test
[swarm] Preloading Bundler groups ["default"]
[swarm] Starting memory monitoring with max RSS of 10240KB
2019-10-11T15:54:17.242Z pid=17555 tid=owl47z3q3 INFO: Booting Sidekiq 6.0.1 with redis options {:id=>"Sidekiq-server-PID-17555", :url=>nil}
2019-10-11T15:54:17.395Z pid=17555 tid=owl47z3q3 INFO: Running in ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
2019-10-11T15:54:17.395Z pid=17555 tid=owl47z3q3 INFO: Sidekiq Pro 5.0.1 / Sidekiq Enterprise 2.0.1, commercially licensed.
2019-10-11T15:54:17.400Z pid=17555 tid=owl47z3q3 INFO: Starting processing, hit Ctrl-C to stop
2019-10-11T15:54:17.401Z pid=17555 tid=owl4brczr INFO: SuperFetch activated
2019-10-11T15:54:17.404Z pid=17555 tid=owl4bpqpb INFO: Managing 2 periodic jobs
2019-10-11T15:54:17.406Z pid=17555 tid=owl4bpqpb INFO: Gained leadership of the cluster
[swarm] Process 17555 too large at 69544KB, stopping it...
[swarm] Child exited, PID 17555, code , restarting...
2019-10-11T15:54:52.317Z pid=17572 tid=owl47z3qk INFO: Booting Sidekiq 6.0.1 with redis options {:id=>"Sidekiq-server-PID-17572", :url=>nil}
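One detail in the working log above: "code ," is blank after the child exits. A guess as to why, sketched with plain Process.wait2 (this is illustrative, not swarm's code): Process::Status#exitstatus is nil when the child died from a signal rather than exiting normally, and nil interpolates as an empty string.

```ruby
# Clean exit: exitstatus is the numeric exit code.
pid = fork { exit 0 }
_, status = Process.wait2(pid)
puts "[swarm] Child exited, PID #{pid}, code #{status.exitstatus}, restarting..."

# Killed by a signal: exitstatus is nil, which interpolates as an empty
# string -- possibly why the log shows "code ," with nothing after it.
pid = fork { sleep 10 }
Process.kill("KILL", pid)
_, sig_status = Process.wait2(pid)
puts "[swarm] Child exited, PID #{pid}, code #{sig_status.exitstatus}, restarting..."
```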
Did you update any other gems as part of the upgrade?
@mperham Thanks for the reply. We also updated the redis gem but this does not seem related.
In your logs, I don't see a message about "Got USR2, starting graceful shutdown". Could that be a different shutdown mode perhaps?
Yeah, I am seeing some weirdness. The child process seems to stop logging; that's the first clue, but I'm still unsure what's wrong.
We're seeing the same thing! We didn't run swarm before 6.0.1, but logs like this repeat while the processes stick around:
2019-10-11T20:00:43.796322+00:00 app[worker.1]: [swarm] Process 813 too large at 420668KB, stopping it...
2019-10-11T20:00:43.796695+00:00 app[worker.1]: [swarm] Process 1387 too large at 648400KB, stopping it...
"Stop" from the Web UI seems to work fine, though: I see "quiet" right away and they clean up a bit later. It almost looks like the processes pick up more jobs and decide not to shut down.
Took me 5 minutes with git bisect to track this down. Thanks for the command line to reproduce the issue!