sidekiqswarm: Add support for preloading Rails app

Created on 20 Jul 2020  ยท  8Comments  ยท  Source: mperham/sidekiq

Ruby version: 2.5.5
Sidekiq / Pro / Enterprise version(s): Sidekiq 5.1.3 / Pro 4.0.3 / Ent 1.7.2

Context

In my project, with a fresh irb process

require 'bundler/setup'
Bundler.require('default', 'development')

consumes 272MB memory.

Repro

Start a sidekiq swarm with 30 processes. The processes consume 10GB (> 30 * 272 MB) memory.

Investigation

I looked into /proc/<pid>/smaps and aggregated the memory segments:

{"Shared_Clean:"=>15876, "Shared_Dirty:"=>42376, "Private_Clean:"=>0, "Private_Dirty:"=>279384, "Shared_Hugetlb:"=>0, "Private_Hugetlb:"=>0}

About ~9GB (30 * 272 MB) of the segments are under Private Dirty.

I patched it so it calls boot_system before forking the child processes. With the patch,

{"Shared_Clean:"=>6312, "Shared_Dirty:"=>299552, "Private_Clean:"=>0, "Private_Dirty:"=>21836, "Shared_Hugetlb:"=>0, "Private_Hugetlb:"=>0}

The private mem segs only take ~0.62GB, and the overall memory usage dropped to ~1.5GB.

Thoughts on this approach?

All 8 comments

I'm unclear what you are suggesting.

  1. Are you suggesting including the environment group in the Bundler.require?
  2. Are you suggesting booting Rails before forking?

1 shouldn't be too hard; you can already do it today with SIDEKIQ_PRELOAD=default,development. 2 is a huge change and is fraught with peril, that peril is why every daemon except Sidekiq includes after_fork callbacks.

Are you suggesting booting Rails before forking?

Yes, that's what I'm trying to suggest. For my setting at least, the gems preloaded by Bundler.require won't stick in the shared memory. All the forked processes ended up having a copy of pretty much all gems.

I'm not sure how to evaluate the risks of this approach -- each process should have their own copy of all objects (copy-on-write), what might go wrong if Rails was booted before forking?

The issue is that any locks, threads or sockets created in Rails initializers will be shared by all child processes. That's the purpose of after_fork blocks: to cleanup or re-initialize anything needed by each child. Unfortunately it's also a major source of support issues, people frequently don't read the docs and they would think it's a Sidekiq bug breaking their app. I could make it opt-in but that's why I've never implemented it.

I could make it opt-in

That'd be great! Thank you for explaining, I see how this can be tricky now. For apps that do not require re-initializing, this can be a big win.

Maybe there can be an after_fork hook in sidekiq too?

No guarantees but I'll investigate.

I'm implementing this right now. Here's the before and after memory data I see on OSX.

Before

โฏ ps aux | grep sidekiq
mikeperham       38022   0.0  1.0  4436192  81776 s000  S+    2:54PM   0:01.77 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       38021   0.0  1.0  4441336  82664 s000  S+    2:54PM   0:02.00 sidekiq 6.1.1 myapp [0 of 2 busy] leader   
mikeperham       38020   0.0  1.0  4435668  81688 s000  S+    2:54PM   0:01.77 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       38019   0.0  1.0  4435668  81600 s000  S+    2:54PM   0:01.77 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       38013   0.0  0.5  4383092  40528 s000  S+    2:54PM   0:00.94 sidekiqswarm, managing 4 processes      

After

โฏ ps aux | grep sidekiq
mikeperham       14192   0.0  0.5  4432088  42012 s000  S+   10:24AM   0:00.29 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       14191   0.0  0.5  4435636  43440 s000  S+   10:24AM   0:00.31 sidekiq 6.1.1 myapp [0 of 2 busy] leader      
mikeperham       14190   0.0  0.5  4431564  41980 s000  S+   10:24AM   0:00.29 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       14189   0.0  0.5  4431564  41940 s000  S+   10:24AM   0:00.29 sidekiq 6.1.1 myapp [0 of 2 busy]      
mikeperham       14183   0.0  1.0  4413596  80344 s000  S+   10:24AM   0:02.06 sidekiqswarm, managing 4 processes

Notice the multiple children drop from 80MB to 40MB, the single parent swarm process grows from 40MB to 80MB. This is on a trivial default Rails app; that's massive savings.

As for an after_fork callback, here's my proposed API:

Sidekiq.configure_server do |config|
  config.on(:fork) do
    # ActiveRecord::Base.clear_active_connections! or whatever
  end
end

Awesome! This is great!

Was this page helpful?
0 / 5 - 0 ratings