I have found the 2014.7 salt-minion very unstable; it often does not return.
I am using 2014.7 at commit 3e2b366cd6f8c8bcb37eb55f59853178a0420072.
#salt 'console1*' test.ping -v
Executing job with jid 20150130165611866138
-------------------------------------------
console1.xxx:
Minion did not return. [No response]
The command returns nothing; the minion debug log shows:
2015-01-30 16:56:12,695 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:927 User root Executing command test.ping with jid 20150130165611866138
2015-01-30 16:56:12,696 [salt.minion ][DEBUG ] /usr/local/salt/packages/salt/minion.py:933 Command details {'tgt_type': 'glob', 'jid': '20150130165611866138', 'tgt': 'console1*', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
2015-01-30 16:56:12,707 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:1006 Starting a new job with PID 15879
2015-01-30 16:56:12,708 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:1174 Returning information for job: 20150130165611866138
2015-01-30 16:56:12,841 [salt.crypt ][DEBUG ] /usr/local/salt/packages/salt/crypt.py:399 Decrypting the current master AES key
2015-01-30 16:56:12,842 [salt.crypt ][DEBUG ] /usr/local/salt/packages/salt/crypt.py:324 Loaded minion key: /etc/salt/pki/minion/minion.pem
2015-01-30 16:56:17,715 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:927 User root Executing command saltutil.find_job with jid 20150130165616879623
2015-01-30 16:56:17,716 [salt.minion ][DEBUG ] /usr/local/salt/packages/salt/minion.py:933 Command details {'tgt_type': 'glob', 'jid': '20150130165616879623', 'tgt': 'console1*', 'ret': '', 'user': 'root', 'arg': ['20150130165611866138'], 'fun': 'saltutil.find_job'}
2015-01-30 16:56:17,726 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:1006 Starting a new job with PID 15961
2015-01-30 16:56:17,730 [salt.minion ][INFO ] /usr/local/salt/packages/salt/minion.py:1174 Returning information for job: 20150130165616879623
2015-01-30 16:56:17,860 [salt.crypt ][DEBUG ] /usr/local/salt/packages/salt/crypt.py:399 Decrypting the current master AES key
2015-01-30 16:56:17,861 [salt.crypt ][DEBUG ] /usr/local/salt/packages/salt/crypt.py:324 Loaded minion key: /etc/salt/pki/minion/minion.pem
#less /var/log/salt/minion
2015-01-30 14:39:28,359 [salt.minion ][CRITICAL] /usr/local/salt/packages/salt/minion.py:1734 An exception occurred while polling the minion
Traceback (most recent call last):
File "/usr/local/salt/packages/salt/minion.py", line 1726, in tune_in_no_block
self._do_socket_recv(socks)
File "/usr/local/salt/packages/salt/minion.py", line 1760, in _do_socket_recv
self._handle_payload(payload)
File "/usr/local/salt/packages/salt/minion.py", line 866, in _handle_payload
payload['sig'] if 'sig' in payload else None)
File "/usr/local/salt/packages/salt/minion.py", line 897, in _handle_aes
data = self.crypticle.loads(load)
File "/usr/local/salt/packages/salt/crypt.py", line 791, in loads
data = self.decrypt(data)
File "/usr/local/salt/packages/salt/crypt.py", line 774, in decrypt
raise AuthenticationError('message authentication failed')
AuthenticationError: message authentication failed
Salt: 2014.7.0
Python: 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
Jinja2: 2.8-dev
M2Crypto: 0.21.1
msgpack-python: 0.4.0
msgpack-pure: Not Installed
pycrypto: 2.6.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 13.0.2
RAET: Not Installed
ZMQ: 3.2.2
Mako: 0.9.0
For comparison, the same test against our 0.17.5 setup returns immediately:
#time salt '*' test.ping -v
Executing job with jid 20150130161206249724
-------------------------------------------
lvs.xxx:
True
cache.xxx:
True
web.xxx:
True
log.xxx:
True
real 0m0.666s
user 0m0.422s
sys 0m0.044s
Salt: 0.17.5
Python: 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
Jinja2: 2.8-dev
M2Crypto: 0.21.1
msgpack-python: 0.4.0
msgpack-pure: Not Installed
pycrypto: 2.6.1
PyYAML: 3.10
PyZMQ: 13.0.2
ZMQ: 3.2.2
I can confirm I am also seeing this behavior using the v2014.7.1 tag. The worst part is that the minion 'acts' alive but is definitely not responding, so I cannot do anything with it. This appears to have happened after a saltutil.sync_modules.
2015-01-30 22:54:38,872 [salt.minion][CRITICAL] An exception occurred while polling the minion
Traceback (most recent call last):
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1747, in tune_in_no_block
self._do_socket_recv(socks)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1781, in _do_socket_recv
self._handle_payload(payload)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 867, in _handle_payload
payload['sig'] if 'sig' in payload else None)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 898, in _handle_aes
data = self.crypticle.loads(load)
File "/opt/salt/lib/python2.6/site-packages/salt/crypt.py", line 796, in loads
data = self.decrypt(data)
File "/opt/salt/lib/python2.6/site-packages/salt/crypt.py", line 779, in decrypt
raise AuthenticationError('message authentication failed')
AuthenticationError: message authentication failed
Salt: 2014.7.1
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.7.3
M2Crypto: 0.22
msgpack-python: 0.4.4
msgpack-pure: Not Installed
pycrypto: 2.6.1
libnacl: Not Installed
PyYAML: 3.11
ioflo: Not Installed
PyZMQ: 14.5.0
RAET: Not Installed
ZMQ: 4.0.5
Mako: Not Installed
Does restarting the minion do anything? Or forcing regeneration of keys for that minion (rm /etc/salt/pki/minion/*, rm /etc/salt/pki/master/minions/...)?
Yes, restarting the minion has been the fix for me. Here is what I see in the log: first the 'message authentication failed' error, then, after the restart, the 'Could not deserialize msgpack message' errors.
2015-01-31 00:29:47,760 [salt.minion][CRITICAL] An exception occurred while polling the minion
Traceback (most recent call last):
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1747, in tune_in_no_block
self._do_socket_recv(socks)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1781, in _do_socket_recv
self._handle_payload(payload)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 867, in _handle_payload
payload['sig'] if 'sig' in payload else None)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 898, in _handle_aes
data = self.crypticle.loads(load)
File "/opt/salt/lib/python2.6/site-packages/salt/crypt.py", line 796, in loads
data = self.decrypt(data)
File "/opt/salt/lib/python2.6/site-packages/salt/crypt.py", line 779, in decrypt
raise AuthenticationError('message authentication failed')
AuthenticationError: message authentication failed
2015-02-01 00:45:34,862 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
2015-02-01 00:45:34,862 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
2015-02-01 00:45:34,863 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
2015-02-01 00:45:36,211 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
2015-02-01 00:45:36,212 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
2015-02-01 00:45:36,212 [salt.payload][CRITICAL] Could not deserialize msgpack message: In an attempt to keep Salt running, returning an empty dict.This often happens when trying to read a file not in binary mode.Please open an issue and include the following error: Unpack failed: error = 0
I tried restarting the minion, and it behaved normally after the restart. But over time, 'Minion did not return' keeps reappearing.
salt-call, however, works normally.
@pitatus @terminalmage @thatch45
[*******lvs2 ~]# salt-call state.sls reactor -l debug
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Guessing ID. The id can be explicitly in set /etc/salt/minion
[INFO ] Found minion id from getfqdn(): lvs2.***
[DEBUG ] loading log_handlers in ['/var/cache/salt/minion/extmods/log_handlers', '/usr/local/salt/packages/salt/log/handlers']
[DEBUG ] Skipping /var/cache/salt/minion/extmods/log_handlers, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/utils/parsers.py:171:parse_args Configuration file path: /etc/salt/minion
[DEBUG ] /usr/local/salt/packages/salt/config.py:427:_read_conf_file Reading configuration from /etc/salt/minion
[DEBUG ] /usr/local/salt/packages/salt/loader.py:561:gen_functions loading grain in ['/var/cache/salt/minion/extmods/grains', '/usr/local/salt/packages/salt/grains']
[DEBUG ] /usr/local/salt/packages/salt/loader.py:587:gen_functions Skipping /var/cache/salt/minion/extmods/grains, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/crypt.py:216:get_keys Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] /usr/local/salt/packages/salt/crypt.py:268:decrypt_aes Decrypting the current master AES key
[DEBUG ] /usr/local/salt/packages/salt/crypt.py:216:get_keys Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] /usr/local/salt/packages/salt/crypt.py:216:get_keys Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] /usr/local/salt/packages/salt/loader.py:561:gen_functions loading module in ['/opt/lib/cdn/py/salt_module', '/var/cache/salt/minion/extmods/modules', '/usr/local/salt/packages/salt/modules']
[DEBUG ] /usr/local/salt/packages/salt/loader.py:617:gen_functions Skipping .init, it does not end with an expected extension
[DEBUG ] /usr/local/salt/packages/salt/loader.py:587:gen_functions Skipping /var/cache/salt/minion/extmods/modules, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/loader.py:617:gen_functions Skipping cytest.pyx, it does not end with an expected extension
[DEBUG ] /usr/local/salt/packages/salt/loader.py:617:gen_functions Skipping .grains.py.swp, it does not end with an expected extension
[DEBUG ] /usr/local/salt/packages/salt/loader.py:617:gen_functions Skipping .cp.py.swp, it does not end with an expected extension
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded localemod as virtual locale
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded groupadd as virtual group
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded rh_service as virtual service
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded yumpkg as virtual pkg
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded parted as virtual partition
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded linux_sysctl as virtual sysctl
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded mdadm as virtual raid
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded linux_acl as virtual acl
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded sysmod as virtual sys
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded rpm as virtual lowpkg
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded useradd as virtual user
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded grub_legacy as virtual grub
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded rh_ip as virtual ip
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded cmdmod as virtual cmd
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded virtualenv_mod as virtual virtualenv
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded linux_lvm as virtual lvm
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded djangomod as virtual django
[DEBUG ] /usr/local/salt/packages/salt/loader.py:561:gen_functions loading returner in ['/var/cache/salt/minion/extmods/returners', '/usr/local/salt/packages/salt/returners']
[DEBUG ] /usr/local/salt/packages/salt/loader.py:587:gen_functions Skipping /var/cache/salt/minion/extmods/returners, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded couchdb_return as virtual couchdb
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded syslog_return as virtual syslog
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded carbon_return as virtual carbon
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded sqlite3_return as virtual sqlite3
[DEBUG ] /usr/local/salt/packages/salt/loader.py:561:gen_functions loading states in ['/var/cache/salt/minion/extmods/states', '/usr/local/salt/packages/salt/states']
[DEBUG ] /usr/local/salt/packages/salt/loader.py:587:gen_functions Skipping /var/cache/salt/minion/extmods/states, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded saltmod as virtual salt
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded mdadm as virtual raid
[DEBUG ] /usr/local/salt/packages/salt/loader.py:764:gen_functions Loaded virtualenv_mod as virtual virtualenv
[DEBUG ] /usr/local/salt/packages/salt/loader.py:561:gen_functions loading render in ['/var/cache/salt/minion/extmods/renderers', '/usr/local/salt/packages/salt/renderers']
[DEBUG ] /usr/local/salt/packages/salt/loader.py:587:gen_functions Skipping /var/cache/salt/minion/extmods/renderers, it is not a directory
[DEBUG ] /usr/local/salt/packages/salt/loader.py:617:gen_functions Skipping .py.py.swp, it does not end with an expected extension
[DEBUG ] /usr/local/salt/packages/salt/minion.py:166:parse_args_and_kwargs Parsed args: ['reactor']
[DEBUG ] /usr/local/salt/packages/salt/minion.py:167:parse_args_and_kwargs Parsed kwargs: {'__pub_fun': 'state.sls', '__pub_jid': '20150202153415445097', '__pub_pid': 30269, '__pub_tgt': 'salt-call'}
I tried rolling back to tag v2014.7.0; the problem persists.
I have the same issue with 2014.7.0 and 2014.7.1.
Master and minions are located in different datacenters, if that should matter.
$ sudo salt -v '*log*' state.highstate
Executing job with jid 20150202160533182941
-------------------------------------------
yr-log-1:
Minion did not return. [No response]
There is no entry in the minion log.
Restarting salt-minion seems to reinstate contact between master and minion for a short period.
Once the minion and master have established contact, keeping it alive with test.ping seems to work.
Running salt-call from the minion also works.
With the upgrading and downgrading that has been happening, can those who are affected try removing and re-adding keys (for all minions) and deleting the cache (but not while ANY jobs are active), then restarting? Specifically: shut down master and minion, remove keys and cache, start up and re-accept keys. Also check that the master and all minions are running the same version (is there any possibility that, when upgrading, a minion was upgraded before the master?).
I tried stopping master, deleting all keys and cache, restart all minions, start up master and then re-add keys.
Minions respond right away, but after about 5 minutes of inactivity, minions stop returning.
It's possible the routers/switches between the datacenters are timing out the connection (busy switches). When that happens, do you know whether the minion still thinks it has a connection?
netstat -an | grep 4505
I suspect the TCP connection status will show ESTABLISHED
FYI, I can simulate this with two boxes, master on one, minion on the other: establish the connection, accept keys, etc.
Pull the network cable on the minion and run a 'test.ping' from the master: the master will time out waiting for the minion to respond, and the network socket to the minion will be discarded. Plug the minion's network cable back in and check 'netstat -an' on the minion: it still shows an 'ESTABLISHED' connection to the master. The minion's TCP stack doesn't know the connection is invalid, and won't until it tries to use the socket.
In my case the minion will eventually reconnect; however, if the network is busy enough that the routers/switches are removing idle connections, there may be another complicating component in the mix.
@pitatus Thanks for investigating. It would seem you're right (and I wonder why I didn't check the connections myself):
$ netstat -an | grep '4506'
tcp 0 0 10.0.1.10:44578 salt-master:4506 ESTABLISHED
$ netstat -an | grep '4505'
tcp 0 0 10.0.1.10:54115 salt-master:4505 ESTABLISHED
On the master: Minion did not return.
I'm really not sure how to go about this. I guess I could set up a cron that runs salt-call on the minions to make sure the connection stays open.
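Another thought: instead of cron on every minion, the minion's built-in scheduler might do the same job from the minion config. A minimal sketch (the job name is made up, and I haven't verified this keeps the publish connection warm the way a periodic salt-call does):
schedule:
  keep_connection_warm:    # arbitrary job name
    function: test.ping    # any cheap execution function
    minutes: 5             # run every 5 minutes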
I've been running things in Azure (both master and minions) and had these problems as well. At first I put up a cron that pinged every minute (Azure seemed to drop connections idle for over 60s) and it worked at first. But when the number of minions grew, this solution just spammed us.
I've got a test setup running with one master and one minion now, and activated the keepalive settings with intervals less than 60s on the minion. This seems to work (2014.7.1 minion, master from dev branch).
So far this seems to work! However, after a long idle period the first ping takes 10s and the rest take 1s, but that's another issue I think. The connection seems to be there!
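For reference, by "keepalive settings" I mean the tcp_keepalive_* minion options; a minimal sketch (the values are only illustrative, anything that probes well under the ~60s idle window should do):
tcp_keepalive: True
tcp_keepalive_idle: 30      # seconds of idle time before the first probe
tcp_keepalive_intvl: 10     # seconds between probes
tcp_keepalive_cnt: 3        # failed probes before the connection is considered dead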
I also added keepalive settings on one of my minions with intervals on 60s.
So far, it seems like it's working to keep minions responding.
Like @andrejohansson, I use Azure as a provider.
Hi :
@andrejohansson @pitatus
My minion configuration is as follows. On the master, the minion did not return.
On the minion, netstat shows an 'ESTABLISHED' connection to the master.
I don't think this error occurs at the network layer; I use my own private network.
[**** ~]# netstat -antp | grep '4506'
tcp 0 0 ****.194:41260 ****.157:4506 ESTABLISHED 12480/python2.6
tcp 0 0 ****.194:39668 ****.156:4506 ESTABLISHED 12480/python2.6
[**** ~]#
#more /etc/salt/minion
master:
- master1
- master2
pidfile: /var/run/salt-minion.pid
log_fmt_console: '[%(levelname)-8s] %(pathname)s:%(lineno)d:%(funcName)s %(message)s'
log_fmt_logfile: '%(asctime)s,%(msecs)03.0f [%(name)-17s][%(levelname)-8s] %(pathname)s:%(lineno)d %(message)s'
log_level_logfile: debug
module_dirs: ['/opt/lib/py/salt_module']
minion_id_caching: False
tcp_keepalive: True
tcp_keepalive_idle: 80
tcp_keepalive_cnt: 3
tcp_keepalive_intvl: 20
Hi :
After the minion has been running for a few days, the problem reappears.
I think this is a serious problem.
@pitatus @terminalmage @thatch45
@Jiaion what do you see from netstat for port 4505 on the master and minion when you are seeing this problem (does the master show an established connection with the minion experiencing the problem)? Also what are the minion config settings for recon_* ?
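For reference, by recon_* I mean the minion's ZeroMQ reconnect options, along these lines (the values are only illustrative, not necessarily your defaults):
recon_default: 1000      # initial reconnect wait, in milliseconds
recon_max: 10000         # maximum reconnect wait, in milliseconds
recon_randomize: True    # randomize the wait between attempts to avoid reconnect storms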
I can confirm the same issue after searching for it.
This is not a matter of the config settings used: mine are the defaults, since there is no need to change reconnection settings or regenerate keys except for troubleshooting purposes.
@pitatus
Yes, master and minion show an established connection with the minion experiencing the problem.
[root@ca** ~]# netstat -antp | awk '$5~/:(4505|4506)/ && $5~/23.(156|157)/'
tcp 0 0 ****.23.166:47568 ****.23.157:4506 ESTABLISHED 7773/python2.6
tcp 0 0 ****.23.166:47822 ****.23.156:4505 ESTABLISHED 7773/python2.6
tcp 0 0 ****.23.166:37487 ****.23.156:4506 ESTABLISHED 7773/python2.6
tcp 0 0 ****.23.166:39379 ****.23.157:4505 ESTABLISHED 7773/python2.6
[root@ca** ~]#
[root@console** /root]
#netstat -antp | awk '$4~/:(4505|4506)/ && $5~/166/'
tcp 0 0 0.0.0.0:4505 0.0.0.0:* LISTEN 107104/python2.6
tcp 0 0 0.0.0.0:4506 0.0.0.0:* LISTEN 107350/python2.6
tcp 0 0 ****.23.156:4506 ****.23.166:37487 ESTABLISHED 107350/python2.6
tcp 0 0 ****.23.156:4505 ****.23.166:47822 ESTABLISHED 107104/python2.6
I'm not able to reproduce so if there is any other data (logs, other evidence, etc) please add to this issue to help find a solution..
Hi
@pitatus
The minion raises the exception at crypt.py line 779. Under what circumstances would result end up non-zero? (From the code below, result is non-zero when the computed MAC bytes do not match the signature attached to the message.)
How long has your minion been running?
I suspect my multi-master configuration is the cause; I am testing that now.
763 def decrypt(self, data):
774 result = 0
775 for zipped_x, zipped_y in zip(mac_bytes, sig):
776 result |= ord(zipped_x) ^ ord(zipped_y)
777 if result != 0:
778 log.debug('Failed to authenticate message')
779 raise AuthenticationError('message authentication failed')
(excerpt from /usr/local/salt/packages/salt/crypt.py)
I'm having the same issue. My servers are located in different datacenters of the Hetzner hosting network.
All minions and the master are 2014.7.1
It looks like after some period of inactivity the first command to a minion times out, while subsequent commands succeed. For example, when I run 'salt * test.ping' on the master for the first time, some minions do not return; the next time I ping, more minions return; and the third time all of them return.
I have logs at INFO level and don't see anything related to this problem.
I tried to find an optimal configuration of minion and master parameters in the config files (reconnects, timeouts, etc.), but that didn't change anything.
The only thing that solves the problem is running this command in crontab every 5 min:
*/5 * * * * /usr/bin/salt '*' test.ping > /dev/null
Having this in crontab makes minions always return and not time out on commands.
Hi :
@pitatus This problem occurs when multiple masters are configured.
/etc/salt/minion
master:
- master1
- master2
@pitatus ?
Not sure if this is directly relevant, but I have a couple of (separate) masters, each with its own swarm of minions. Each master is running Helium (2014.7.1), and the minions in each system are a mix of 2014.1.11, 2014.7.0, and 2014.7.1. I am consistently seeing that all servers running anything 2014.7.x respond more slowly to the test module and are less often able to respond to my first test.ping. All 2014.1.11 servers respond, and in a fairly timely manner.
A Salt-User has reported the following to SaltStack:
Effectively this kills our ability to use salt-api and the Cloudify salt plugin or the salt step in Rundeck, because it's not reliable.
ZD-218
I'm using the salt-api (in a deploy script) despite the problem with "sleepy minions".
The test.ping crontab workaround (see above) works great :)
I'm also running into this.
Hey guys! I found a solution for this! Now moving to ansible :)
I also encountered this issue. It turns out that at some point the salt master had hung, so we ended up with two salt-master processes. Our minion behavior was that everything worked perfectly when doing salt-call from the minion, but any time you tried to run a salt state from the master, it was a crap-shoot whether it would time out, fail with "authentication failed", or hit some other error. Once we killed the extra process and rebooted, we haven't had an issue since. The bizarre thing is that we actually performed an upgrade of the salt-master while this hung process was around, and even that didn't kill it; we had to kill it manually and then reboot for good measure.
@svetlyak40wt good idea..
Just upgraded from 0.17.2 to 2014.7.2 and everything went crazy. We have autoscaling in AWS and all our systems depend on Salt to work properly. We often find that we cannot deploy software on different machines due to this connection problem. The work-around is to restart the minions, but it becomes very frustrating.
Guys, as I said, just get this in your crontab on salt-master:
*/5 * * * * /usr/bin/salt '*' test.ping > /dev/null
That should do it.
Actually, downgrading to 0.17.5 solved most of the issues. It looks far more stable than 2014.x.y.
The issue may be related to the problem with ZMQ described here http://lucumr.pocoo.org/2012/6/26/disconnects-are-good-for-you/
And that article is discussed here https://news.ycombinator.com/item?id=4161073
This might be fixed by the ret port connection keepalives that we added recently. We added connection persistence for those in 2014.7, and they can die sometimes. That patch is in the latest 2014.7 branch (@cachedout, did that make it into 2014.7.4?) and in 2015.2.
Moving to RAET will probably fix the issue, right ?
Yes, it should fix it as well
@Jiaion - has this been fixed?
@ssgward
I have not tested the latest version
Just confirmed: use of multiple masters is causing this issue for me as well (Lithium).
Seeing the same problem on 2015.5.2
Not particularly helpful at this stage I guess, but I'm seeing the same issue here.
Will try to nail down more debugging details tomorrow.
Have a similar issue
# salt --versions-report
Salt: 2015.5.2
Python: 2.7.3 (default, Dec 18 2014, 19:10:20)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.1.10
msgpack-pure: Not Installed
pycrypto: 2.4.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 13.0.0
RAET: Not Installed
ZMQ: 3.2.2
Mako: Not Installed
When executing commands with cmd.run, minions sometimes return:
Minion did not return. [No response]
The commands still execute, but the output is empty.
I have had the same issue, only it was DNS related.
In my case the minion is called 'centos', but the name 'centos' could not always be resolved: sometimes it worked and sometimes it did not. Is it possible to look up the minion by IP? Right now the master looks up 'centos' in DNS. After adding the minion 'centos' to the master's /etc/hosts, I have had no problems any more.
Same issue here.
We detected the problem a while ago, but using a large timeout configuration on the master solved it.
Now it has occurred again during an initial highstate execution.
salt --versions
Salt: 2015.5.3
Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
Jinja2: 2.7.3
M2Crypto: 0.22
msgpack-python: 0.4.6
msgpack-pure: Not Installed
pycrypto: 2.6.1
libnacl: Not Installed
PyYAML: 3.11
ioflo: Not Installed
PyZMQ: 14.4.1
RAET: Not Installed
ZMQ: 4.0.5
Mako: 0.9.1
Tornado: Not Installed
Debian source package: 2015.5.3+ds-1trusty1
I have the same problem with the latest 2015.8.0
# salt --versions
Salt Version:
Salt: 2015.8.0
Dependency Versions:
Jinja2: 2.7.3
M2Crypto: Not Installed
Mako: Not Installed
PyYAML: 3.11
PyZMQ: 14.7.0
Python: 2.7.5 (default, Jun 24 2015, 00:41:19)
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: centos 7.1.1503 Core
machine: x86_64
release: 3.10.0-229.14.1.el7.x86_64
system: CentOS Linux 7.1.1503 Core
Same issue:
$ salt --versions-report
Salt: 2014.7.6-134-gd284eb1
Python: 2.7.3 (default, Jun 22 2015, 19:33:41)
Jinja2: 2.6
M2Crypto: 0.21.1
msgpack-python: 0.1.10
msgpack-pure: Not Installed
pycrypto: 2.6.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 14.5.0
RAET: Not Installed
ZMQ: 4.0.5
Mako: Not Installed
Same issue here. Usually restarting the salt-minion service fixes it temporarily, but it eventually comes back later.
Manual commands such as
salt '*' cmd.run 'apt-get install vim'
work fine, but running pkg.installed for vim in a state file gets no response.
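For what it's worth, the state involved is essentially nothing more than this (a minimal sketch; the state ID here is made up):
install_vim:
  pkg.installed:
    - name: vim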
Salt Version:
Salt: 2015.8.0
Dependency Versions:
Jinja2: 2.8
M2Crypto: Not Installed
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 14.7.0
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.2
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-58-generic
system: Ubuntu 14.04 trusty
Just confirming the same issue here.
Salt version: 2014.7.0.
I am frustrated with setting up multi-master for failover, because it only works for several minutes after a minion restart. After that, all minions go down with 'Minion did not return'.
I suspect it's related to the network delay between the master and the minions. I hope this problem gets solved.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Same issue.
Salt Version:
Salt: 2019.2.2
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.8.1
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.35.2
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: Not Installed
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.6.8 (default, Aug 7 2019, 17:28:10)
python-gnupg: Not Installed
PyYAML: 3.12
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.4.2
ZMQ: 4.1.4
System Versions:
dist: centos 7.6.1810 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-957.10.1.el7.x86_64
system: Linux
version: CentOS Linux 7.6.1810 Core