Ruby version: 2.5.1
Sidekiq / Pro / Enterprise version(s): 6.0.1/5.0.1/2.0.0
I believe the issue can be reproduced by running something like this: SIDEKIQ_COUNT=1 SIDEKIQ_MAXMEM_MB=100 bundle exec sidekiqswarm -q test. The SIDEKIQ_MAXMEM_MB value should be small enough to trigger the restart immediately. I don't have any jobs in the test queue.
I can definitely see that memory monitoring is activated:
[swarm] Starting memory monitoring with max RSS of 102400KB
However, the child process does not appear to be responding to the restart sequence:
[swarm] Process 97837 too large at 312720KB, stopping it...
[swarm] Process 97837 too large at 313156KB, stopping it...
[swarm] Process 97837 too large at 313388KB, stopping it...
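For context, the monitoring behavior in the logs above suggests a loop along these lines: read each child's RSS, and signal the child when it exceeds the limit. This is a hypothetical sketch for illustration, not sidekiqswarm's actual code; the rss_kb and check_children helpers are my own names.

```ruby
# Hypothetical sketch of swarm-style memory monitoring (not the real
# sidekiqswarm implementation): check each child's RSS periodically and
# ask oversized children to shut down gracefully via USR2.
MAX_RSS_KB = Integer(ENV.fetch("SIDEKIQ_MAXMEM_MB", "100")) * 1024

# Read a process's resident set size in KB via ps (works on Linux and macOS).
def rss_kb(pid)
  `ps -o rss= -p #{pid}`.strip.to_i
end

def check_children(pids, max_kb)
  pids.each do |pid|
    kb = rss_kb(pid)
    next unless kb > max_kb
    puts "[swarm] Process #{pid} too large at #{kb}KB, stopping it..."
    Process.kill("USR2", pid) # request a graceful shutdown
  end
end

# With a limit far above our own RSS, no signal is sent.
check_children([Process.pid], 10_000_000)
```

Seeing the "too large" line repeat for the same PID, as above, would mean the USR2 is being sent every cycle but the child never actually exits.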
I tried this on Linux and macOS with the same results.
We were previously on these versions: 5.2.7/4.0.5/1.8.1
The same scenario clearly works there, so it must be an issue in a newer version of one of the gems?
[swarm] Process 99408 too large at 336892KB, stopping it...
2019-10-11T09:32:23.128Z 99408 TID-ow0kpify8 INFO: Got USR2, starting graceful shutdown
2019-10-11T09:32:23.129Z 99408 TID-ow0kpify8 INFO: Terminating quiet workers
...
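The "Got USR2" line above is the child acknowledging the signal. A minimal sketch of that child-side handshake, assuming only Ruby's standard Signal.trap (not Sidekiq's actual handler, which does more bookkeeping):

```ruby
# Hypothetical child-side USR2 handler: set a flag in the trap (real work
# isn't allowed inside a trap context) and act on it in the main loop.
shutting_down = false
Signal.trap("USR2") do
  shutting_down = true
end

# Simulate swarm signalling this process.
Process.kill("USR2", Process.pid)
sleep 0.1 # give Ruby a chance to run the handler

puts "graceful shutdown requested" if shutting_down
```

If this flag never flips in the 6.0.1 child, either the signal isn't being delivered or a newer handler is no longer listening for USR2.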
It's working fine for me with 10MB, which triggers a restart almost immediately:
$ SIDEKIQ_COUNT=1 SIDEKIQ_MAXMEM_MB=10 bundle exec sidekiqswarm -q test
[swarm] Preloading Bundler groups ["default"]
[swarm] Starting memory monitoring with max RSS of 10240KB
2019-10-11T15:54:17.242Z pid=17555 tid=owl47z3q3 INFO: Booting Sidekiq 6.0.1 with redis options {:id=>"Sidekiq-server-PID-17555", :url=>nil}
2019-10-11T15:54:17.395Z pid=17555 tid=owl47z3q3 INFO: Running in ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
2019-10-11T15:54:17.395Z pid=17555 tid=owl47z3q3 INFO: Sidekiq Pro 5.0.1 / Sidekiq Enterprise 2.0.1, commercially licensed.
2019-10-11T15:54:17.400Z pid=17555 tid=owl47z3q3 INFO: Starting processing, hit Ctrl-C to stop
2019-10-11T15:54:17.401Z pid=17555 tid=owl4brczr INFO: SuperFetch activated
2019-10-11T15:54:17.404Z pid=17555 tid=owl4bpqpb INFO: Managing 2 periodic jobs
2019-10-11T15:54:17.406Z pid=17555 tid=owl4bpqpb INFO: Gained leadership of the cluster
[swarm] Process 17555 too large at 69544KB, stopping it...
[swarm] Child exited, PID 17555, code , restarting...
2019-10-11T15:54:52.317Z pid=17572 tid=owl47z3qk INFO: Booting Sidekiq 6.0.1 with redis options {:id=>"Sidekiq-server-PID-17572", :url=>nil}
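One detail in the working log above: "code ," is blank after the child exits. A guess as to why, sketched with plain Process.wait2 (this is illustrative, not swarm's code): Process::Status#exitstatus is nil when the child died from a signal rather than exiting normally, and nil interpolates as an empty string.

```ruby
# Clean exit: exitstatus is the numeric exit code.
pid = fork { exit 0 }
_, status = Process.wait2(pid)
puts "[swarm] Child exited, PID #{pid}, code #{status.exitstatus}, restarting..."

# Killed by a signal: exitstatus is nil, which interpolates as an empty
# string -- possibly why the log shows "code ," with nothing after it.
pid = fork { sleep 10 }
Process.kill("KILL", pid)
_, sig_status = Process.wait2(pid)
puts "[swarm] Child exited, PID #{pid}, code #{sig_status.exitstatus}, restarting..."
```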
Did you update any other gems as part of the upgrade?
@mperham Thanks for the reply. We also updated the redis gem but this does not seem related.
In your logs, I don't see a message about "Got USR2, starting graceful shutdown". Could that be a different shutdown mode perhaps?
Yeah, I am seeing some weirdness. The child process seems to stop logging; that's the first clue, but I'm still unsure what's wrong.
We're seeing the same thing! We didn't run swarm before 6.0.1, but logs like this repeat while the processes stick around:
2019-10-11T20:00:43.796322+00:00 app[worker.1]: [swarm] Process 813 too large at 420668KB, stopping it...
2019-10-11T20:00:43.796695+00:00 app[worker.1]: [swarm] Process 1387 too large at 648400KB, stopping it...
"Stop" from the Web UI seems to work fine, though: I see "quiet" right away and they clean up a bit later. It almost looks like the processes pick up more jobs and decide not to shut down.
Took me 5 minutes with git bisect to track this down. Thanks for the command line to reproduce the issue!