It would be lovely if I could set, in a config file, a batch size that is used when a command will go out to too many minions.
I /should/ be doing: salt -v -b 8 'minion.*' state.highstate. However, I sometimes forget to add -b 8. When I forget that, the command goes out to ~200 minions. This causes enough load on the server that half of the boxes just plain don't finish.
What I'd love to see in the master config would look something like this...
batch_size: 8
batch_trigger: 20
... if the command will go out to >= 20 minions, then run it in batches of 8.
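To make the intended behaviour concrete, here is a minimal sketch of that decision in Python. The function name `pick_batch_size` is hypothetical, the defaults just mirror the example values above, and none of this is existing Salt code.

```python
def pick_batch_size(matched_count, batch_trigger=20, batch_size=8):
    """Return the batch size to use, or None to run unbatched.

    Hypothetical helper: batch_trigger and batch_size stand in for the
    proposed master config options of the same names.
    """
    if matched_count >= batch_trigger:
        return batch_size
    return None


if __name__ == "__main__":
    for count in (5, 20, 200):
        print(count, "->", pick_batch_size(count))
```

Returning None for "don't batch" keeps the normal, unbatched path untouched whenever the target is small.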
That seems like a reasonable addition to me. What do you think @basepi?
I like this idea. I also think it would be good to add a 'confirm' option, that would ask the user 'You are about to run on
Love it.
+1
Man, it's amazing how bugs just fall silent when you address other concerns that make things like this much less painful. Still valid, though. I think this will become my weekend project.
+1 any news on this?
+1
It was a weekend project that didn't get finished that weekend... or ever. I got sidetracked with another project that has been chewing up all of my free time for the past few months. I'm getting closer, but still probably have another 2-3 months left. :(
Still, I want to find some time to make this happen. It seems like it should be easy; it's just a matter of finding the right place to stop and switch the user's command to a batch job. I'm a little bit nervous that the two execution options function quite differently from each other, but I haven't looked into it far enough to know. The worst case will likely be that when this option is enabled, there will be an extra delay before minions start executing commands.
I couldn't really sleep so I started playing around with an idea. (completely UNtested)
https://github.com/saltstack/salt/compare/develop...MTecknology:develop
I haven't even checked that it clears the syntax checker, but I don't see why this shouldn't work.
alias salt='salt --batch-safe-limit=50 --batch-safe-size=10'
If salt '*' test.ping were to match more than 50 minions, then switch to a batch job in increments of 10.
It could be possible to just use batch-safe-limit for both limits and call it batch-trigger. I don't know if the increased flexibility makes sense or not.
Side note! Should we also be creating an issue to support /etc/salt/{saltrc,saltrc.d/*.conf}? It would be nice to skip the need for an alias and just have config management drop in /etc/salt/saltrc.d/{limits,returners}.conf.
This solution has the downside that, when --batch-safe-limit is set and the target is a glob, the number of minions matching that glob will always be checked. I feel like this is a pretty sane set of "ifs", but to err on the side of caution, I started looking through what's going on.
TL;DR -- seems like there's a reasonable attempt to cache and keep things snappy.
class CkMinions(object):
    '''
    Used to check what minions should respond from a target

    Note: This is a best-effort set of the minions that would match a target.
    Depending on master configuration (grains caching, etc.) and topology (syndics)
    the list may be a subset-- but we err on the side of too-many minions in this
    class.
    '''
....
minions = check_func(expr, delimiter, greedy)
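For a rough feel of what the extra glob check amounts to, here is a stand-in written with only the standard library. It is not how CkMinions does it; `count_glob_matches` and the ID list are hypothetical, and the real check also deals with caching and other target types.

```python
from fnmatch import fnmatchcase


def count_glob_matches(pattern, minion_ids):
    """Count how many known minion IDs match a shell-style glob.

    Hypothetical illustration only; Salt's own matching lives in CkMinions.
    """
    return sum(1 for minion_id in minion_ids if fnmatchcase(minion_id, pattern))


if __name__ == "__main__":
    known = ["minion.web1", "minion.web2", "minion.db1", "buildbox"]
    print(count_glob_matches("minion.*", known))
    print(count_glob_matches("*", known))
```

The point is only that the count is a single pass over the known minion IDs, so the added cost should be small next to the job itself.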
I can't guarantee the PR is bug free, but it seemed to work in my lab. I decided I'd rather make something available than continue with it not existing at all.
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=2 --batch-safe-limit=5 '*' cmd.run 'sleep 20'
Executing run on ['test-saltminion03', 'test-saltminion08']
Executing run on ['test-saltminion02', 'test-saltminion05']
Executing run on ['test-saltminion04', 'test-saltminion07']
Executing run on ['test-saltminion06']
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=2 --batch-safe-limit=8 '*' cmd.run 'sleep 20'
# no batch
(venv)root@test-saltmaster:/opt/venv# tail -n 2 etc/salt/master
batch_safe_limit: 5
batch_safe_size: 2
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt '*' cmd.run 'sleep 20'
Executing run on ['test-saltminion03', 'test-saltminion08']
Executing run on ['test-saltminion02', 'test-saltminion05']
Executing run on ['test-saltminion04', 'test-saltminion07']
Here are a few more...
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=5 --batch-safe-limit=5 '*' test.ping
Executing run on ['test-saltminion02', 'test-saltminion03', 'test-saltminion04', 'test-saltminion05', 'test-saltminion08']
Executing run on ['test-saltminion06', 'test-saltminion07']
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=50 --batch-safe-limit=5 '*' test.ping
Executing run on ['test-saltminion02', 'test-saltminion03', 'test-saltminion04', 'test-saltminion05', 'test-saltminion06', 'test-saltminion07', 'test-saltminion08']
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=5 --batch-safe-limit=50 '*' test.ping
# no batch
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=50% --batch-safe-limit=4 '*' test.ping
Executing run on ['test-saltminion02', 'test-saltminion03', 'test-saltminion08']
Executing run on ['test-saltminion04', 'test-saltminion05', 'test-saltminion07']
(venv)root@test-saltmaster:/opt/venv# salt -c ./etc/salt --batch-safe-size=1 --batch-safe-limit=2 '*' test.ping
Executing run on ['test-saltminion08']
Executing run on ['test-saltminion03']
Executing run on ['test-saltminion02']
Executing run on ['test-saltminion05']
Executing run on ['test-saltminion04']
Executing run on ['test-saltminion07']
Executing run on ['test-saltminion06']
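The runs above can be modelled with a small sketch. It makes several assumptions: batching kicks in only when the match count is strictly greater than the limit, a percentage size is rounded down (which is what the 50% run suggests), and batches are static chunks. Salt's batch mode may instead start new minions as earlier ones return. `resolve_batch_size` and `plan_batches` are hypothetical names, not functions from the PR.

```python
def resolve_batch_size(size, total):
    """Turn a size such as 2, "10" or "50%" into a minion count (at least 1)."""
    text = str(size).strip()
    if text.endswith("%"):
        # Assumption: percentages are rounded down.
        count = int(total * float(text[:-1]) / 100)
    else:
        count = int(text)
    return max(1, count)


def plan_batches(minions, limit, size):
    """Return a list of batches, or None when no batching is needed."""
    # Assumption: batch only when strictly more than `limit` minions match.
    if len(minions) <= limit:
        return None
    step = resolve_batch_size(size, len(minions))
    return [minions[i:i + step] for i in range(0, len(minions), step)]


if __name__ == "__main__":
    lab = ["test-saltminion%02d" % n for n in range(2, 9)]  # 7 minions
    for limit, size in ((5, 2), (8, 2), (4, "50%"), (5, 50)):
        print(limit, size, plan_batches(lab, limit, size))
```

With seven minions this gives groups of 2, 2, 2, 1 for a limit of 5 and a size of 2, and no batching at all for a limit of 8, matching the first two runs. The ordering inside each group will differ from real output.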
Anyone up for testing and providing feedback?