Sidekiq: Reduce default concurrency

Created on 17 Jul 2018  ·  17 comments  ·  Source: mperham/sidekiq

Today Sidekiq uses a default concurrency of 25. This means Sidekiq will spawn 25 worker threads and execute up to 25 jobs concurrently in a process.

glibc has a major memory fragmentation issue which gets worse with more threads, causing many people to move to jemalloc.

I also happen to think, based on time and experience but no hard data, that 25 is pretty aggressive and most apps can peg a CPU with a lower number of threads. Developers testing locally on macOS rarely need such large concurrency.

I'd suggest we reduce the default concurrency from 25 to 15 in Sidekiq 5.2.0. This will save memory and reduce fragmentation and bloat on Linux. Anyone who wants to retain the old value can add -c 25 to their command line.
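For anyone who wants to pin the value rather than track the default, a minimal sketch of the config-file equivalent (assuming the standard config/sidekiq.yml location):

```yaml
# config/sidekiq.yml -- pin concurrency explicitly rather than
# relying on the (changing) default
:concurrency: 25
```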

WDYT?

Most helpful comment

I could be talked down to 10. I think 5 is too low; most business apps are I/O heavy, allowing pretty decent concurrency even with GIL.

All 17 comments

I'd be curious to see the hard data. I don't think it would hurt to lower it, and my guess is that anyone who needs more than 15 will knowingly tune it to a much higher number in an environment where they have more processing power (i.e. not most Heroku dynos).

Usually using between 5 and 10 myself. For hardcore CPU usage we once had to run 20x1 (20 processes with 1 worker thread each).

My experience with Rails (maybe similar to your workload, maybe not) is that you need about 6 threads per hyperthreaded CPU core to approximately fully saturate the CPU. Background jobs might trend toward fewer threads if CPU-heavy, or even more if very I/O-heavy. But 5 or so is probably about right for many/most Ruby tasks.

So 25 would be enough to fully saturate a medium AWS instance. 15 would still run decently, but probably leave a bit of CPU idle - around 5%-10% with the workloads I run.

Obviously depends on the task. If you're just calculating giant Mandelbrot sets or late digits of Pi, 5-6 threads would saturate it just fine :-)

Don't think I've ever seen a need for >10. 4-6 workers per process, with 1 process per core, is far more common in my experience.

Heroku has plenty of processing power, thank you very much. The "performance" and "private" large dyno has 14 GB of RAM and 8 dedicated vCPUs (8 hyperthreads backed by 4 real cores on top of a hypervisor). Though I do realize you said "most".

FWIW I think 25 is pretty high. Puma's default is 16 threads, and even then most people tune it down on the web side. It would be helpful to get some kind of a standardized metric for when it is helpful to add extra Sidekiq workers on a box.

I can't tell here if you mean per process, or total. For total across multiple processes, 25 is probably great. Per process, yeah, 5 is reasonable, 10 is high and 25 is very high. Given the GIL, it's very hard to get CRuby to productively use more than 10-ish threads for real tasks -- and in cases where you can, it's because something like EventMachine or Node.js would have been a better choice than Ruby threads.
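To illustrate the GIL point above, here's a minimal sketch of CPU-bound work split across threads. Under CRuby's GVL only one thread executes Ruby code at a time, so this is no faster than a single-threaded loop; the threads just take turns:

```ruby
# Pure-Ruby, CPU-bound work: summing 1..n keeps the GVL held the
# whole time, so 4 threads provide concurrency but no parallelism.
def busy(n)
  (1..n).reduce(:+)
end

threads = 4.times.map { Thread.new { busy(1_000_000) } }
results = threads.map(&:value)
puts results.sum  # => 2000002000000, computed with no thread-level speedup
```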

I could be talked down to 10. I think 5 is too low; most business apps are I/O heavy, allowing pretty decent concurrency even with GIL.

10-15 is also a reasonable insurance policy against pathological cases that really wish they were evented, but got written with threads anyway.

10 sounds about right to me, and even 15 would be better than 25.

Is there a nice way to use Etc.nprocessors to determine a happy number for most people? I think Celery uses that sort of logic to determine default concurrency FWIW. I do agree with those above that in optimized cases you're going to want to look at the nature of your workload and tweak accordingly.
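A sketch of what such a heuristic might look like (the scaling factor and cap here are my own assumptions, and whether core count should matter at all for MRI is debated just below):

```ruby
require "etc"

# Hypothetical heuristic: scale thread count with core count, capped
# to keep per-process concurrency modest under the GVL.
suggested = [Etc.nprocessors * 2, 10].min
puts suggested
```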

@zachmccormick number of processors is irrelevant for MRI (i.e. what the vast majority of the community uses AFAICT); each Sidekiq process can be handled by only 1 processor due to the GIL, since Sidekiq achieves concurrency via threads. You need to run multiple Sidekiq processes to achieve true parallel processing.

Ah I see - didn't realize that! Thanks!

Sidekiq doesn't fork or scale processes, only threads. You need to start multiple Sidekiqs yourself, using the tool/init of your choice. Sidekiq Enterprise has a multi-process sidekiqswarm binary which scales Sidekiq processes according to CPU count.

https://github.com/mperham/sidekiq/wiki/Ent-Multi-Process
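Without Enterprise, spawning one process per core is left to you; a rough sketch of the idea (in practice you'd use systemd, foreman, or similar rather than a shell loop like this):

```shell
# Start one Sidekiq process per CPU core, each with modest
# thread concurrency. Illustrative only -- use a real process
# supervisor in production.
for i in $(seq 1 "$(nproc)"); do
  bundle exec sidekiq -c 10 &
done
wait
```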

@amcaplan @zachmccormick The number of processors matters if you add processes. But as @mperham says, you'd have to do that yourself - Sidekiq won't do that automatically.

Also, while the GIL means each Ruby thread blocks all others when running Ruby, there are some non-Ruby operations (e.g. network or disk I/O, database, some parts of garbage collection, many things done with native extensions) which can happen on a background thread and don't block your Ruby process. Those things can happen in parallel if you have more than one processor, but not if you don't. That's a lot of what I was talking about above with "saturating" a processor with 6+ threads per process - that makes sure that even when most of your Ruby code is blocked, something is running and making forward progress.
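That overlap is easy to see in a minimal sketch: sleep releases the GVL just like real blocking I/O, so five "I/O waits" on five threads take roughly the time of one, not five:

```ruby
# Five threads each simulate blocking I/O with sleep (which releases
# the GVL). Joined wall time is ~0.2s, not 1.0s, because the waits
# overlap.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 5.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("elapsed: %.2fs", elapsed)  # ~0.2s
```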

Hehe, I gave a talk about this once (https://speakerdeck.com/amcaplan/threads-and-processes-lightning-talk-given-at-rails-israel-2015), maybe the slides will be useful to future issue watchers...

[slide screenshots from the talk]

Obviously it's a bit oversimplified, but pretty good as a round estimate. Most Rails jobs I've seen hover around that figure.

Also worth noting the first 10 minutes of this 2015 talk by @schneems (a personal favorite - first time we were in the same room!) where playing with the setting led them to change concurrency from 30 to 4.

Yup! That's a great summary. But since the CPU percentage for a given task can vary, there's a bit of an asymptote as far as how many threads are necessary to saturate...

It's obvious that glibc's memory bloat, as discussed on my blog, gets worse as concurrency increases. I think reducing concurrency from 25 to 10 will reduce memory usage AND bloat, giving us a double win in memory.
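For those staying on glibc, one commonly used mitigation (my assumption here, not something prescribed in this thread) is capping malloc arenas or preloading jemalloc; the path below is illustrative and varies by distro:

```shell
# Cap glibc malloc arenas (fewer arenas = less fragmentation, at
# some throughput cost):
MALLOC_ARENA_MAX=2 bundle exec sidekiq -c 10

# ...or preload jemalloc instead (path varies by system):
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 bundle exec sidekiq -c 10
```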

Pro tip: you can get the old behavior by adding -c 25 to the command line.

