This issue was something we've run into in the past (#38371) and was fixed in #38923.
On Linux, this works fine, and the "workaround returner" I created to work around this in the past was no longer necessary.
However, we recently installed Salt on a few Windows machines and noticed that they show the very same behavior as the previous issue:
- salt-call state.highstate on the Windows minion properly returns the salt/job/* event to the master
- a scheduled state.highstate does not return the salt/job/* event to the master
- watching the bus with salt-run state.event pretty=True, the minion/refresh/* and salt/auth events are being returned, but no salt/job/* events appear for the scheduled highstates

Salt Version:
Salt: 2018.3.2
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.4.2
docker-py: Not Installed
gitdb: 0.6.4
gitpython: 1.0.1
ioflo: Not Installed
Jinja2: 2.8
libgit2: Not Installed
libnacl: 1.6.1
M2Crypto: Not Installed
Mako: 1.0.3
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.3.7
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.12 (default, Dec 4 2017, 14:50:18)
python-gnupg: 0.3.8
PyYAML: 3.11
PyZMQ: 15.2.0
RAET: Not Installed
smmap: 0.9.0
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4
System Versions:
dist: Ubuntu 16.04 xenial
locale: UTF-8
machine: x86_64
release: 4.4.0-1031-aws
system: Linux
version: Ubuntu 16.04 xenial
Salt Version:
Salt: 2018.3.2
Dependency Versions:
cffi: 1.10.0
cherrypy: 10.2.1
dateutil: 2.6.1
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.3
ioflo: Not Installed
Jinja2: 2.9.6
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 1.0.6
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: 2.17
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)]
python-gnupg: 0.4.1
PyYAML: 3.12
PyZMQ: 16.0.3
RAET: Not Installed
smmap: 2.0.3
timelib: 0.2.4
Tornado: 4.5.1
ZMQ: 4.1.6
System Versions:
dist:
locale: cp1252
machine: AMD64
release: 2016Server
system: Windows
version: 2016Server 10.0.14393 Multiprocessor Free
I am unable to replicate this.
I have assigned the following states.
[root@salt ~]# tail -c +0 /srv/salt/*.sls
==> /srv/salt/sched.sls <==
highstate:
  schedule.present:
    - function: state.apply
    - seconds: 30

==> /srv/salt/test.sls <==
test highstate:
  test.succeed_with_changes: []

==> /srv/salt/top.sls <==
base:
  '*':
    - test
I used schedule.present to schedule the highstate function in the minion process, and I am seeing the job returns.
[root@salt ~]# salt-run state.event tagmatch='salt/job/*' pretty=true
salt/job/20180807152950409718/ret/salt.c.gtmanfred-1263.internal {
    "_stamp": "2018-08-07T15:29:50.411386",
    "arg": [],
    "cmd": "_return",
    "fun": "state.apply",
    "fun_args": [],
    "id": "salt.c.gtmanfred-1263.internal",
    "jid": "20180807152950409718",
    "out": "highstate",
    "pid": 7212,
    "retcode": 0,
    "return": {
        "test_|-test highstate_|-test highstate_|-succeed_with_changes": {
            "__id__": "test highstate",
            "__run_num__": 0,
            "__sls__": "test",
            "changes": {
                "testing": {
                    "new": "Something pretended to change",
                    "old": "Unchanged"
                }
            },
            "comment": "Success!",
            "duration": 0.482,
            "name": "test highstate",
            "result": true,
            "start_time": "15:29:50.036213"
        }
    },
    "schedule": "highstate",
    "success": true,
    "tgt": "salt.c.gtmanfred-1263.internal",
    "tgt_type": "glob"
}
salt/job/20180807152956772370/ret/windows.c.gtmanfred-1263.internal {
    "_stamp": "2018-08-07T15:29:56.774412",
    "arg": [],
    "cmd": "_return",
    "fun": "state.apply",
    "fun_args": [],
    "id": "windows.c.gtmanfred-1263.internal",
    "jid": "20180807152956772370",
    "out": "highstate",
    "pid": 3556,
    "retcode": 0,
    "return": {
        "test_|-test highstate_|-test highstate_|-succeed_with_changes": {
            "__id__": "test highstate",
            "__run_num__": 0,
            "__sls__": "test",
            "changes": {
                "testing": {
                    "new": "Something pretended to change",
                    "old": "Unchanged"
                }
            },
            "comment": "Success!",
            "duration": 0.0,
            "name": "test highstate",
            "result": true,
            "start_time": "15:29:56.094000"
        }
    },
    "schedule": "highstate",
    "success": true,
    "tgt": "windows.c.gtmanfred-1263.internal",
    "tgt_type": "glob"
}
salt/job/20180807153019489257/ret/salt.c.gtmanfred-1263.internal {
    "_stamp": "2018-08-07T15:30:19.491089",
    "arg": [],
    "cmd": "_return",
    "fun": "state.apply",
    "fun_args": [],
    "id": "salt.c.gtmanfred-1263.internal",
    "jid": "20180807153019489257",
    "out": "highstate",
    "pid": 7341,
    "retcode": 0,
    "return": {
        "test_|-test highstate_|-test highstate_|-succeed_with_changes": {
            "__id__": "test highstate",
            "__run_num__": 0,
            "__sls__": "test",
            "changes": {
                "testing": {
                    "new": "Something pretended to change",
                    "old": "Unchanged"
                }
            },
            "comment": "Success!",
            "duration": 0.568,
            "name": "test highstate",
            "result": true,
            "start_time": "15:30:19.083520"
        }
    },
    "schedule": "highstate",
    "success": true,
    "tgt": "salt.c.gtmanfred-1263.internal",
    "tgt_type": "glob"
}
salt/job/20180807153026754965/ret/windows.c.gtmanfred-1263.internal {
    "_stamp": "2018-08-07T15:30:26.756872",
    "arg": [],
    "cmd": "_return",
    "fun": "state.apply",
    "fun_args": [],
    "id": "windows.c.gtmanfred-1263.internal",
    "jid": "20180807153026754965",
    "out": "highstate",
    "pid": 1336,
    "retcode": 0,
    "return": {
        "test_|-test highstate_|-test highstate_|-succeed_with_changes": {
            "__id__": "test highstate",
            "__run_num__": 0,
            "__sls__": "test",
            "changes": {
                "testing": {
                    "new": "Something pretended to change",
                    "old": "Unchanged"
                }
            },
            "comment": "Success!",
            "duration": 0.0,
            "name": "test highstate",
            "result": true,
            "start_time": "15:30:26.094000"
        }
    },
    "schedule": "highstate",
    "success": true,
    "tgt": "windows.c.gtmanfred-1263.internal",
    "tgt_type": "glob"
}
Can you provide me with more information so I can replicate this?
Thanks,
Daniel
Very odd. I have just tested it again by adding a schedule:
salt 'web01.local' schedule.add test_highstate function='state.apply' seconds=30
Which, according to schedule.list, gets added as:
test_highstate:
  enabled: true
  function: state.apply
  jid_include: true
  maxrunning: 1
  name: test_highstate
  seconds: 30
I do not see any highstate returns using your state.event parameters. I do, however, see the skipped event tags salt/auth and minion/refresh/<ID> coming past every 30 seconds.
The minion logs do show that a highstate is being executed, it's just not being returned.
What are the differences between your setup and mine? A different Salt, Python, or Windows version/edition, by any chance?
@syphernl We are also seeing this in 2018.3.3. Were you able to find any other information?
There are no return events from Windows minions (we execute highstate every 3 hours via the scheduler). Events are being returned from the Linux minions.
That being said, the highstate itself is being executed correctly.
What's kind of odd is that the highstate emails from the highstate_email return config are occurring even when the event isn't emitted.
schedule:
  highstate:
    function: state.highstate
    minutes: 180
    splay: 600
    returner: highstate
    return_config: highstate_email

highstate_email.highstate:
  report_failures: True
  report_changes: True
  report_everything: False
  report_format: html
  report_delivery: smtp
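One way to exercise that returner path outside the scheduler is the CLI's --return/--return_config flags; a sketch, assuming they are available in your Salt version (check salt --help):

# Fire a trivial job through the same returner and alternative config
salt 'windows*' test.ping --return highstate --return_config highstate_email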
@saltstack/team-triage can y'all take a look at verifying this?
Looks like someone else is experiencing the same problem.
Thanks,
Daniel
So out of three deployments I have access to, only one of them has this problem with the Windows minions. The thing is, all of these deployments are practically identical: minions are installed the same way and the master is the same Docker image. The only difference I can think of is that the problematic deployment is the oldest (it was upgraded from 2017 versions of SaltStack).
This is reproducible in newly installed Windows minions.
Are there any logs I could look through?
Looks like someone else had this issue from December of 2017 - #44779.
Ah, @gtmanfred found an unfortunate workaround...
On a hunch I disabled multiprocessing on a Windows minion:
multiprocessing: false
Using this to verify (originally no return made it to the master):
salt windows schedule.add test_ping function='test.ping' seconds=30
And now scheduled functions are returning to the master. This doesn't appear to be a good fix, as I believe there are several outstanding issues with running Windows minions with multiprocessing disabled.
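For anyone trying the same experiment, this is the whole change. A minimal sketch of the minion config (the default path on Windows installs of this vintage is c:\salt\conf\minion; restart the salt-minion service afterwards):

# c:\salt\conf\minion
# Run scheduled jobs in a thread of the minion process instead of a
# spawned subprocess, so the __schedule_return event is published on the
# bus the minion's event handler is actually watching. Known to have
# side effects on Windows, per the comments below.
multiprocessing: False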
Log of good minion (Linux):
2019-02-02 19:02:27,212 [salt.utils.event :739 ][DEBUG ][126612] Sending event: tag = __schedule_return; data = {u'fun_args': [], u'jid': u'req', u'return': True, u'retcode': 0, u'success': True, u'schedule': u'test_ping', u'cmd': u'_return', u'pid': 126612, u'_stamp': '2019-02-03T01:02:27.211963', u'fun': u'test.ping', u'id': u'linux'}
2019-02-02 19:02:27,212 [salt.utils.schedule:782 ][DEBUG ][126612] schedule.handle_func: Removing /var/cache/salt/minion/proc/20190202190227188208
2019-02-02 19:02:27,213 [salt.minion :2261][DEBUG ][124488] Minion of 'salt.service' is handling event tag '__schedule_return'
2019-02-02 19:02:27,214 [salt.minion :1863][INFO ][124488] Returning information for job: req
Log of bad minion (Windows):
2019-02-02 18:49:38,221 [salt.utils.event :739 ][DEBUG ][2780] Sending event: tag = __schedule_return; data = {u'fun_args': [], u'jid': u'req', u'return': True, u'retcode': 0, u'success': True, u'schedule': u'test_ping', u'cmd': u'_return', u'pid': 2780, u'_stamp': '2019-02-03T00:49:38.222000', u'fun': u'test.ping', u'id': u'windows'}
2019-02-02 18:49:38,221 [salt.utils.schedule:782 ][DEBUG ][2780] schedule.handle_func: Removing c:\salt\var\cache\salt\minion\proc\20190202184938112000
Log of a bad minion with multiprocessing disabled (Windows):
2019-02-02 19:20:24,964 [salt.utils.event :739 ][DEBUG ][2984] Sending event: tag = __schedule_return; data = {u'fun_args': [], u'jid': u'req', u'return': True, u'retcode': 0, u'success': True, u'schedule': u'test_ping', u'cmd': u'_return', u'pid': 2984, u'_stamp': '2019-02-03T01:20:24.965000', u'fun': u'test.ping', u'id': u'windows'}
2019-02-02 19:20:24,964 [salt.utils.schedule:782 ][DEBUG ][2984] schedule.handle_func: Removing c:\salt\var\cache\salt\minion\proc\20190202192024808000
2019-02-02 19:20:24,964 [salt.minion :2261][DEBUG ][2984] Minion of 'salt.service' is handling event tag '__schedule_return'
2019-02-02 19:20:24,964 [salt.minion :1863][INFO ][2984] Returning information for job: req
So it looks like the event isn't being handled at all by the minion:
https://github.com/saltstack/salt/blob/4e1788e57b475102aa1e753d20a37a1ca3ad6076/salt/minion.py#L2638-L2672
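To illustrate the dispatch those lines implement, here is a simplified sketch (hypothetical names, not Salt's actual code): the minion routes events by tag prefix, so an event that never reaches the parent process's bus never hits its handler, matching the Windows log above where no "handling event tag '__schedule_return'" line appears.

# Simplified illustration of the minion's tag-based event dispatch.
# Class and method names here are hypothetical.
class MiniDispatcher(object):
    def _handle_schedule_return(self, data):
        # In real Salt this forwards the scheduled job's return to the master.
        print("returning information for job:", data.get("jid"))

    def handle_event(self, tag, data):
        handlers = {"__schedule_return": self._handle_schedule_return}
        for prefix, handler in handlers.items():
            if tag.startswith(prefix):
                return handler(data)
        # Unmatched (or never-delivered) tags fall through silently, which
        # is exactly what the bad minion's log looks like.

dispatcher = MiniDispatcher()
dispatcher.handle_event("__schedule_return", {"jid": "req", "fun": "test.ping"})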
Unfortunately, I no longer work for SaltStack, so someone from @saltstack/team-triage will need to take over managing this issue.
Multiprocessing makes things a little slower on Windows. I've had to increase the timeout when working with Windows minions. I think the default is something like 5 seconds; I set it to something like 30 for my Windows boxes, and even longer (300) if I'm working on something that I know takes a while to run, LGPO for example.
From the command line you can add -t 30 to the salt command.
There's also a config option for it that is just timeout. If you set it there you wouldn't have to do it with every CLI command.
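A sketch of both forms, using values from this thread:

# CLI: wait up to 30 seconds for returns from slow Windows minions
salt -t 30 'windows*' test.ping

# /etc/salt/master: make 30 seconds the default for every command
timeout: 30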
Is there still a need for multiprocessing: true under Windows, @twangboy? (Basically, is multiprocessing: false a reliable workaround?)
multiprocessing=True is the default for Windows. There are problems with using multiprocessing=False. It's been years, so I can't remember what the problems are. I think there's an issue with how jobs are handled/cleared.
This is what needs to be fixed: https://github.com/saltstack/salt/issues/48882#issuecomment-441038972
i.e. find_job needs to be an internal function which does not start a new thread. I had a go at it, but could not work out how to stop it from creating a new thread. It's a simple change which would make a large difference, i.e. you would not need to use the -t option.
It would be nice if someone in @saltstack/team-core could implement find_job within minion.py without starting a thread; it would save 10 seconds and make a heap of difference on Windows.
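To make the proposal concrete, a rough sketch of the idea (find_job_inline is a hypothetical helper, not Salt's actual code): answer saltutil.find_job directly from the minion's proc directory instead of dispatching it like a normal job in a new thread.

import os

def find_job_inline(opts, jid):
    # Hypothetical helper: the minion writes a proc file for every
    # running job, so "is this jid still running?" can be answered with
    # a stat call instead of a spawned saltutil.find_job execution.
    proc_file = os.path.join(opts["cachedir"], "proc", str(jid))
    return {"jid": jid, "running": os.path.isfile(proc_file)}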
@damon-atkins traveling at the moment but once I'm home I can look at this one.
@garethgreenaway Is this issue somewhere on your radar?
I would be happy to provide more information with current versions of SaltStack on Windows 2016/2019. I do believe it's still happening.
I'm testing SaltStack, and unfortunately its performance on Windows is much worse than on Linux. :( grains_cache=True helped a little bit (a simple test.ping went from 15s to 7s on my workstation Vagrant setup), but it's still slow; on Linux I get a response after 2s. And we're talking about test.ping! That's quite a problem for our SaltStack evaluation.
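For reference, a minimal sketch of that tweak in the minion config (grains_cache_expiration is the companion option; names per the 2018.3-era minion config docs):

# minion config
grains_cache: True
grains_cache_expiration: 300  # seconds before the cached grains are refreshed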
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Up.
Thank you for updating this issue. It is no longer marked as stale.
Kicking this back into triage to get movement on it. @DmitryKuzmenko, can you please re-triage this issue this week?
I suspect this is a duplicate of, or part of, the problem in #51621: the master calls find_job every 10 seconds, and Windows takes 10 seconds to execute it, depending on how the schedule works.
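If the find_job round-trip is the bottleneck, its polling window is tunable. A sketch, assuming the standard gather_job_timeout master option (also exposed as --gather-job-timeout on the CLI):

# /etc/salt/master
# How long the master-side client waits for saltutil.find_job responses
# before polling again; raising it reduces find_job churn on slow
# Windows minions at the cost of slower detection of dead jobs.
gather_job_timeout: 30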