This might be trickier than just that; at the very least, Salt should reload the modules on the minion.
+1 on this issue. I have plans on modifying the salt master's configuration very, very often, and would prefer to reload the config rather than needing to restart the daemon (which would cause salt outages).
I will slate this for the next release then and investigate how viable it will be
I have added some code to help mitigate some issues when restarting the salt master. I am still looking into how to manage this one
I am running into this issue with upstart. When upstart signals a reload, it sends SIGHUP to the parent salt-master, which then promptly dies without any error or debug output, orphaning all of the child salt-master processes. Upstart then thinks the process is dead, leaving lots of orphaned children.
A subsequent "start saltmaster" immediately dies as well, since the port is still in use by the orphaned children.
At the very least, the salt-master should catch the signal and kill off its children, e.g. in salt/master.py, in Master.start() (line 343):
signal.signal(signal.SIGHUP, sigterm_clean)
Ooh, thanks for the report -- even if we don't properly reload, we need to handle SIGHUP more gracefully than that.
I don't want to be just another +1, but I too am planning to modify the configuration files. Mainly, I already have an /etc/salt/master.d/nodegroups.conf file where I define my nodegroups (I am still not convinced by the available external node classifiers). As such, any modification of said file must trigger a salt-master restart which, if I'm in the middle of a highstate call, causes not-nice things to happen.
I tried to read salt's code to see if it was relatively easy to add this, but alas I did not understand much :)
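For reference, such a nodegroups drop-in is just plain master config; a minimal sketch with made-up group names and targets:
# /etc/salt/master.d/nodegroups.conf (illustrative values only)
nodegroups:
  webservers: 'L@web1.example.com,web2.example.com'
  dbservers: 'G@roles:db and G@os:Debian'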
Actually I've been digging around the daemons quite a bit recently and although I don't think this will be _easy_, it should be possible. I'll add this to my list of things to hack on ;)
The reason this does not work is that catching signals in Python interrupts the ZMQ threads and we get crashes or lost packets. We hope to get this working with Salt RAET once we have implemented more of the major components.
We already catch quite a few signals in the daemons and IIRC we don't have stuff exploding all over... I think the larger problem is going to be the varying classes that keep copies of the opts dict all over creation ;)
Good point, but just keep that in mind. Generally we are only catching SIGINT and SIGTERM, which do not care if there are ZMQ issues. Just making sure you know the issues we have already seen here. This is also one reason why we daemonize child procs instead of using SIGCHLD, which will also cause issues with propagating the SIGHUP.
Yea, I'm thinking we'll want the parent to catch the sighup and coordinate the reload amongst the children. I have some ideas, but I'll need to make some time to mess with it :)
+1: I'm looking for a way to reload the nodegroups configuration only, without restarting the whole salt-master
@bbinet Pretty sure that currently works. At least, last time I tried. Nodegroups don't require master restart.
It does not work for me:
$ salt -c /config -N a test.ping
No minions matched the target. No command was sent, no jid was assigned.
The strange thing is that in the master logs, I can see that the "a" nodegroup had actually matched the correct list of minions:
2015-07-03 12:16:48,382 [salt.master ][INFO ][178] Clear payload received with command publish
2015-07-03 12:16:48,384 [salt.utils.minions][DEBUG ][178] Evaluating final compound matching expr: ( ( set(['hl-lxc-1-dev', 'hl-mc-9999-dev', 'hl-mc-3-dev', 'hl-mc-8888-dev', 'hl-mc-4-dev', 'hl-mc-5-dev']) ) & ( set(['cm-mc-1-dev']) ) )
But I don't know why we then get the "No minions matched the target" message.
Note that it works correctly when I target a nodegroup that was already existing when the salt-master was started.
Should I create a new issue specifically for that?
If you look at that log line, that expression will evaluate to an empty set. It's &-ing two sets which have no strings in common. This is why it says no minions were matched. Are you certain your new nodegroup actually does match minions?
You're right, so this is working as expected.
Thanks and sorry for the noise.
No problem, just wanted to make sure we fixed a bug if there was one! ;)
Is it possible today to refresh just the gitfs_remotes or the file roots without restarting the salt-master?
@sametimesolutions No, not yet.
Is there any update for this, to reload salt-minion without restarting?
Unfortunately no, this is a very hard problem to solve that would require intensive re-architecting for the minion/master. It's hard to propagate reloaded configs across the various processes that are a part of the master, for one.
When it comes to the minion, a restart should not be so onerous. Minion restarts are fairly lightweight and minimize downtime. The master is where the problems are, since all the minions need to re-auth once the master comes back up, which can cause some load issues.
For us a restart is very problematic. We need to reload the minion after setting up a new VM to enable reporting data with the mine mechanism. This has to be executed as one of the first steps, so we can't use the "atd" scenario with a delayed restart of salt-minion as a last step. When we execute a highstate, it first fails (because the minion was restarted), so we have to run it twice on a fresh machine.
Have you considered configuring the mine in pillar rather than in the minion config? It solves this very problem.
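A minimal sketch of what that looks like on the pillar side (the mined function and interface here are just examples):
# Pillar assigned to the minion; picked up on the next mine update after a
# pillar refresh, no minion restart needed
mine_functions:
  network.ip_addrs:
    - eth0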
Sounds like a great suggestion; I did not know about this feature, but it looks very promising. Is it also possible to configure mine_interval with pillar, or are we limited to mine_functions? Even if so, it would be possible to work around a longer mine interval with some sleep command. Your help is greatly appreciated!
I cannot remember off the top of my head whether mine_interval works configured through pillar. It might, I just can't remember.
@marek-obuchowicz We work around this problem by using a set of startup_states. Within those is a simple service watch which restarts the minion when something important changes, works a treat here.
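Roughly, the shape of such a startup state (the watched file and source paths are assumptions, not their actual setup):
# Restart the minion whenever a config drop-in it depends on changes
/etc/salt/minion.d/extra.conf:
  file.managed:
    - source: salt://minion/files/extra.conf

salt-minion:
  service.running:
    - enable: True
    - watch:
      - file: /etc/salt/minion.d/extra.conf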
Also, I recently made #6691 happen, so starting in Boron, cmd.run_bg can quite likely also be used in some capacity.
Subscribing because this also affects #13558. Although the issue is now >4 years old so I'm guessing not much is happening.
If catching signals is problematic, how about a workaround where we use something else (maybe a salt event) to trigger the config reload?
To give some clarity: the reason this is "difficult" is not so much catching the signal (which has some gotchas to deal with), but primarily that the various classes and subsystems are initialized with certain config values at start, and we'd need to make it so that those subsystems (the minion, reactor, etc.) would know how to "reload" their config. So, it's not impossible, but it will require a decent amount of work to do correctly.
+1
This is a long thread. Is the feature documented somewhere? What should I do when I add or update configurations/modules? This is a very basic problem for daily tasks. Thanks.
I solved my minion-restart problems with a little salt script that copies (using file.managed) a small Python program to the minion to do the restart, then runs it using "cmd.run" with "bg: true".
See the gist.
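Not the actual gist, but the general shape of such a state, with a made-up script name and paths:
# Push a small restart helper to the minion, then launch it in the background
# so the restart does not kill the state run that triggered it
/usr/local/bin/restart-salt-minion.py:
  file.managed:
    - source: salt://scripts/restart-salt-minion.py
    - mode: '0755'

restart-salt-minion:
  cmd.run:
    - name: /usr/local/bin/restart-salt-minion.py
    - bg: true
    - onchanges:
      - file: /usr/local/bin/restart-salt-minion.py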
ZD-1632
Is there any work going into this, or has it been postponed to a future release?
+1
We at Juniper also have many schedulers configured and use the influxdb returner to post data to the DB. When we change the DB config in /srv/salt/proxy, we have to restart the proxy to pick up the config, and with that, all the schedulers are gone. Any way to avoid a proxy restart?
@cachedout @cr0hn
+1
+1
+1
+1
I know this quote is nearly 5 years old, but this is not correct:
@bbinet Pretty sure that currently works. At least, last time I tried. Nodegroups don't require master restart.
Docs:
When adding or modifying nodegroups to a master configuration file, the master must be restarted for those changes to be fully recognized.
A limited amount of functionality, such as targeting with -N from the command-line may be available without a restart.
https://docs.saltstack.com/en/latest/topics/targeting/nodegroups.html
If you are testing on the CLI (salt -N group1 test.ping), then you are good, but according to the docs not all functionality is available without a restart.
I only bring this up because we abandoned nodegroups a long time ago due to the fact that we needed a restart. When I read this comment and tested it, I thought maybe we didn't need a restart after all, but after re-reading the nodegroup docs I realized my initial thought was correct and that a restart _is_ in fact required.
The docs don't mention which functionality is _not_ available, but I assume top file targeting would not be, as that is where I need this.
This is all a moot point anyway because currently there is a bug in 2019.2 that prevents compound matching of NodeGroups.
https://github.com/saltstack/salt/issues/52678
So +1 for a way to reload master/minion configs. I want a way to programmatically get a list of minions and assign them roles. This would mean nodegroups would be constantly changing, and the master would need to be restarted multiple times per day.
+1
Personally, I am working on my custom reactors. The master has about 200 minions, so it is quite a pain in the @ss to restart the whole master just to apply those changes.
What if we create a patch as a workaround until this is implemented?
+1
+1