Is there a guide to how many worker_threads you should be running and how to calculate that number? I've found the starting value of 5 to be deceptively small, and I haven't been able to find many documents that explain how to pick a good number. I've bumped my salt master up to an m4.10xlarge at this point, with 256 threads, and I'm still hitting points where I have to kill everything and start over because of latency between some of the threads and some of my sites.
Salt Version:
Salt: 2016.3.3
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 1.5
gitdb: 0.5.4
gitpython: 0.3.2 RC1
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 0.9.1
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.1
RAET: Not Installed
smmap: 0.8.2
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-92-generic
system: Linux
version: Ubuntu 14.04 trusty
Wow! m4.10xlarge! How many minions are we talking about here?
Not many, but there was some weirdness in some of the configs, and one was literally in an infinite highstate loop; I noticed it by watching active jobs. That being said, I still see a lot of lag in ZeroMQ, and I still haven't found a good guide on choosing the number of threads or sizing the salt master.
We have anecdotal accounts of quad-core machines with 8GB of RAM being able to host 5000+ minions with worker_threads set to 20, but this is also dependent on lots of other factors (jobs in salt-scheduler, size of pillar, external pillars being present, custom grains, etc). I'll ping @jacobhammons, who is in charge of our docs, and see if he can put something together that's a little more comprehensive. What do you think @jacobhammons ?
Probably not super helpful, but we manage about 27k minions on a single 80-core/256GB bare-metal machine, with about 75 worker threads.
75 worker threads
It would be great to get some docs on this, or at least some guidance on how to choose this number. Should you go over your CPU count? By how much? Is there a script we can write to load test up until message delivery failure?
Any update on this? I'm also looking for the optimal relationship between worker threads and CPUs. I would also like a guide on which metrics to set up and monitor to know when CPU or worker threads are insufficient.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
ping
Can this be re-opened please? Some guidance on setting worker_threads would be really helpful.
Hello Anita,
What I've learned is:
CPU cores * (3 or 4) => max batch size
worker_threads => max batch size + 25 (spare)
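As a rough sketch, the arithmetic looks like this in Python. This is only the rule of thumb above, not official Salt guidance; the function name `suggest_worker_threads` and its parameters are made up for illustration, and the result is only a starting point to tune against your own load.

```python
import os


def suggest_worker_threads(cpu_cores, per_core=3, spare=25):
    """Apply the rule of thumb: batch = cores * (3 or 4), threads = batch + spare.

    Hypothetical helper, not part of Salt. Returns (max_batch_size, worker_threads).
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    max_batch_size = cpu_cores * per_core
    worker_threads = max_batch_size + spare
    return max_batch_size, worker_threads


if __name__ == "__main__":
    # Use the local core count; fall back to 1 if it cannot be determined.
    cores = os.cpu_count() or 1
    for per_core in (3, 4):
        batch, threads = suggest_worker_threads(cores, per_core)
        print(f"cores={cores} x {per_core}: max batch size {batch}, worker_threads {threads}")
```

The resulting number would go into worker_threads in the master config, and the batch size is what you would pass when running jobs in batches.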
@disaster123 Thank you. I will give that a try!