On specific VMs I am getting the below error message.
][CRITICAL][21632] Unexpected error while connecting to saltsyndic04.domain.com
Traceback (most recent call last):
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 875, in _connect_minion
yield minion.connect_master(failed=failed)
File "/opt/salt/lib64/python2.6/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/opt/salt/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt/lib64/python2.6/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1053, in connect_master
yield self._post_master_init(master)
File "/opt/salt/lib64/python2.6/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/opt/salt/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt/lib64/python2.6/site-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "/opt/salt/lib/python2.6/site-packages/salt/minion.py", line 1128, in _post_master_init
}, persist=True)
File "/opt/salt/lib/python2.6/site-packages/salt/utils/schedule.py", line 577, in add_job
tag='/salt/minion/minion_schedule_add_complete')
File "/opt/salt/lib/python2.6/site-packages/salt/utils/event.py", line 752, in fire_event
self.io_loop.run_sync(lambda: self.pusher.send(msg))
File "/opt/salt/lib64/python2.6/site-packages/tornado/ioloop.py", line 444, in run_sync
return future_cell[0].result()
File "/opt/salt/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt/lib64/python2.6/site-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "/opt/salt/lib/python2.6/site-packages/salt/transport/ipc.py", line 413, in send
yield self.stream.write(pack)
File "/opt/salt/lib64/python2.6/site-packages/tornado/iostream.py", line 367, in write
self._check_closed()
File "/opt/salt/lib64/python2.6/site-packages/tornado/iostream.py", line 877, in _check_closed
raise StreamClosedError("Stream is closed")
StreamClosedError: Stream is closed
Looking at the code it made sense to me to add retries to IPCMessageClient since they are not implemented.
https://github.com/saltstack/salt/blob/develop/salt/transport/ipc.py#L414
I would like to suggest the following
@tornado.gen.coroutine
def send(self, msg, timeout=None, tries=3):
'''
Send a message to an IPC socket
If the socket is not currently connected, a connection will be established.
:param dict msg: The message to be sent
:param int timeout: Timeout when sending message (Currently unimplemented)
'''
if not self.connected():
yield self.connect()
while tries > 0:
try:
pack = salt.transport.frame.frame_msg_ipc(msg, raw_body=True)
yield self.stream.write(pack)
except tornado.iostream.StreamClosedError as exc:
log.error('Could not connect to ips stream with error {0}.'.format(exc))
yield self.connect()
tries -= 1
@DmitryKuzmenko , @skizunov do you think this is the proper solution ?
Any suggestions ?
Cannot reproduce except for the existing vms.
Not sure why it only happens there. Any troubleshooting steps would be welcome.
2016.11.5
It is possible that the following scenario is happening:
Do you see this exception when something is shutting down? If so, perhaps a better solution is to absorb that particular exception and write a warning or debug message to the log telling you that the other side went down.
I failed to mention that when minion starts if goes to a loop and always shows this exception and never succeeds having a proper connection to the master but receives its public key.
In my setup I have multiple masters with failover + retries but even if I remove the failover I get the same exception.
From the exception but also from logging self.stream.write(pack) I can see that this is caused by the scheduled tasks trying to send an event to the socket.
After catching the exception and retrying two times the connection succeeds and the minions connects to the master.
Any idea why there is a write failure? The connection portion must succeed before it tries to write, so the other side must be available. It doesn't make sense that the connection would close before the write, unless the other side malfunctioned somehow. We probably should find out and address the underlying problem.
For all actions the minion seems to be connected until it reaches
https://github.com/saltstack/salt/blob/develop/salt/minion.py#L1155
I don't have master_failback enabled so it tries to delete it.
https://github.com/saltstack/salt/blob/develop/salt/minion.py#L1169
After this event the connections breaks and I get the exception.
@DmitryKuzmenko can you help please ?
Does this happen when running code in the develop branch, or are you running a released version but just referencing develop branch code?
Nevermind, I just noticed:
Versions Report: 2016.11.5
One thing that I noticed that is inconsistent with your call stack and your description of the problem is that your description references self.schedule.delete_job but the call stack instead has self.schedule.add_job. It looks like the add_job is from here:
https://github.com/saltstack/salt/blob/2016.11/salt/minion.py#L1128
Which version of tornado are you using? I know that 2016.11 wasn't tested well with tornado 4.5.x, and surprisingly, there are subtle behavior changes in the IPC/TCP transports between versions of tornado.
@tsaridas sorry I've been on vacation.
Something happens with the minion event bus. Because in your stack trace schedule tries to send an event to the bus by IPC socket.
Could you please reproduce the issue with debug log level set in the minion config and attach the log? Don't forget to check and cut any sensitive data from there.
Thank you.
hi @DmitryKuzmenko ,
I'll try to take care of the debug log next week since I'm myself on vacation too.
Thanks for your response.
2017-09-18 10:00:25,232 [salt.cli.daemons ][INFO ][14396] Setting up the Salt Minion "minionhostname.domain.net"
2017-09-18 10:00:25,300 [salt.log.setup ][DEBUG ][14396] Stopping the multiprocessing logging queue listener
2017-09-18 10:00:25,801 [salt.log.setup ][DEBUG ][14396] closing multiprocessing queue
2017-09-18 10:00:25,802 [salt.log.setup ][DEBUG ][14396] joining multiprocessing queue thread
2017-09-18 10:00:25,802 [salt.log.setup ][DEBUG ][14396] Stopped the multiprocessing logging queue listener
2017-09-18 10:00:25,809 [salt.utils.process ][DEBUG ][14403] Created pidfile: /var/run/salt-minion.pid
2017-09-18 10:00:25,810 [salt.cli.daemons ][INFO ][14403] Starting up the Salt Minion
2017-09-18 10:00:25,811 [salt.utils.event ][DEBUG ][14403] AsyncEventPublisher PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:25,811 [salt.utils.event ][DEBUG ][14403] AsyncEventPublisher PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:25,811 [salt.utils.event ][INFO ][14403] Starting pull socket on /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:25,811 [salt.transport.ipc ][TRACE ][14403] IPCMessagePublisher: binding to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:25,812 [salt.transport.ipc ][TRACE ][14403] IPCServer: binding to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:25,812 [salt.utils.event ][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:25,812 [salt.utils.event ][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:25,812 [salt.transport.ipc ][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:25,813 [salt.transport.ipc ][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:25,814 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion
2017-09-18 10:00:25,815 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/_schedule.conf'
2017-09-18 10:00:25,815 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/_schedule.conf
2017-09-18 10:00:25,820 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/common_settings.conf'
2017-09-18 10:00:25,820 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/common_settings.conf
2017-09-18 10:00:25,823 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/masters.conf'
2017-09-18 10:00:25,823 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/masters.conf
2017-09-18 10:00:25,824 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/system.conf'
2017-09-18 10:00:25,824 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/system.conf
2017-09-18 10:00:25,834 [salt.loader ][TRACE ][14403] Loading core.hwaddr_interfaces grain
...
2017-09-18 10:00:26,598 [salt.utils.process ][DEBUG ][14403] Process Manager starting!
2017-09-18 10:00:26,598 [salt.utils.process ][TRACE ][14403] Process manager iteration
2017-09-18 10:00:26,614 [salt.minion ][INFO ][14403] Got list of available master addresses: ['saltsyndic01.domain.net', 'saltsyndic02.domain.net',
'saltsyndic03.domain.net', 'saltsyndic04.domain.net', 'saltsyndic05.domain.net', 'saltsyndic06.domain.net']
2017-09-18 10:00:26,636 [salt.minion ][DEBUG ][14403] Connecting to master. Attempt 1 (infinite attempts)
2017-09-18 10:00:26,651 [salt.crypt ][DEBUG ][14403] Initializing new AsyncAuth for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://16
1.202.186.160:4506')
2017-09-18 10:00:26,654 [salt.transport.zeromq ][DEBUG ][14403] Generated random reconnect delay between '1000ms' and '11000ms' (6250)
2017-09-18 10:00:26,654 [salt.transport.zeromq ][DEBUG ][14403] Setting zmq_reconnect_ivl to '6250ms'
2017-09-18 10:00:26,654 [salt.transport.zeromq ][DEBUG ][14403] Setting zmq_reconnect_ivl_max to '11000ms'
2017-09-18 10:00:26,654 [salt.transport.ipc ][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:26,655 [salt.transport.zeromq ][DEBUG ][14403] Initializing new AsyncZeroMQReqChannel for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net
', 'tcp://192.168.0.160:4506', 'clear')
2017-09-18 10:00:26,655 [salt.transport.zeromq ][TRACE ][14403] Inserted key into loop_instance_map id 51859824 for key ('/etc/salt-minion/pki/minion', 'minionhostname.', 'tcp://192.168.0.160:4506', 'clear') and process 14403
2017-09-18 10:00:26,687 [salt.crypt ][DEBUG ][14403] salt.crypt.verify_signature: Loading public key
2017-09-18 10:00:26,687 [salt.crypt ][DEBUG ][14403] salt.crypt.verify_signature: Verifying signature
2017-09-18 10:00:26,689 [salt.crypt ][DEBUG ][14403] Successfully verified signature of master public key with verification public key master_sign.pub
2017-09-18 10:00:26,689 [salt.crypt ][INFO ][14403] Received signed and verified master pubkey from master saltsyndic06.domain.net
2017-09-18 10:00:26,689 [salt.crypt ][DEBUG ][14403] Decrypting the current master AES key
2017-09-18 10:00:26,691 [salt.crypt ][DEBUG ][14403] Loaded minion key: /etc/salt-minion/pki/minion/minion.pem
2017-09-18 10:00:26,749 [salt.utils.event ][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:26,749 [salt.utils.event ][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:26,749 [salt.transport.ipc ][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:26,749 [salt.transport.ipc ][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:26,750 [salt.utils.event ][DEBUG ][14403] Sending event: tag = salt/auth/creds; data = {'_stamp': '2017-09-18T08:00:26.750276', 'creds': {'publ
ish_port': 4505, 'aes': 'c3Al095W3gvJxaW36Ujow20jNyn8EPUlT6b3lNpIIGtpoZZXC2OtF9PgJnjcm1r9giT3axCjV/k=', 'master_uri': 'tcp://192.168.0.160:4506'}, 'key': ('/etc/salt-minion/pki/minion', 'minionhostname
.net', 'tcp://192.168.0.160:4506')}
2017-09-18 10:00:26,751 [salt.transport.ipc ][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:26,751 [salt.transport.ipc ][TRACE ][14403] Client disconnected from IPC /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:26,753 [salt.crypt ][DEBUG ][14403] Loaded minion key: /etc/salt-minion/pki/minion/minion.pem
2017-09-18 10:00:26,757 [salt.transport.zeromq ][DEBUG ][14403] Initializing new AsyncZeroMQReqChannel for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net
', 'tcp://192.168.0.160:4506', 'aes')
2017-09-18 10:00:26,757 [salt.crypt ][DEBUG ][14403] Re-using AsyncAuth for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://161.202.18
6.160:4506')
2017-09-18 10:00:26,757 [salt.transport.zeromq ][TRACE ][14403] Inserted key into loop_instance_map id 51859824 for key ('/etc/salt-minion/pki/minion', 'minionhostname', 'tcp://192.168.0.160:4506', 'aes') and process 14403
2017-09-18 10:00:26,821 [salt.crypt ][DEBUG ][14403] Loaded minion key: /etc/salt-minion/pki/minion/minion.pem
2017-09-18 10:00:26,876 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion
2017-09-18 10:00:26,877 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/_schedule.conf'
2017-09-18 10:00:26,877 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/_schedule.conf
2017-09-18 10:00:26,882 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/common_settings.conf'
2017-09-18 10:00:26,882 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/common_settings.conf
2017-09-18 10:00:26,885 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/masters.conf'
2017-09-18 10:00:26,885 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/masters.conf
2017-09-18 10:00:26,886 [salt.config ][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/system.conf'
2017-09-18 10:00:26,886 [salt.config ][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/system.conf
2017-09-18 10:00:26,891 [salt.loader ][TRACE ][14403] Loading core.hwaddr_interfaces grain
...
2017-09-18 10:00:27,024 [salt.loaded.int.grains.disks ][TRACE ][14403] Device dm-4 reports itself as an HDD
2017-09-18 10:00:27,024 [salt.loaded.int.grains.disks ][TRACE ][14403] Device dm-5 reports itself as an HDD
2017-09-18 10:00:27,662 [salt.utils.schedule ][DEBUG ][14403] Initializing new Schedule
2017-09-18 10:00:27,663 [salt.utils.lazy ][DEBUG ][14403] LazyLoaded timezone.get_offset
2017-09-18 10:00:27,664 [salt.loader
][TRACE ][14403] Loaded cmdmod as virtual cmd
2017-09-18 10:00:27,664 [salt.utils.lazy
][DEBUG ][14403] LazyLoaded cmd.run
2017-09-18 10:00:27,665 [salt.loader.['saltsyndic01.domain.net', 'saltsyndic02.domain.net', 'saltsyndic03.domain.net', 'saltsyndic04.domain.net', 'saltsyndic05.domain.net', 'saltsyndic06'].int.module.cmdmod ][INFO ][14403] Executing command ['date', '+%z'] in directory '/root'
2017-09-18 10:00:27,668 [salt.loader.['saltsyndic01.domain.net', 'saltsyndic02.domain.net', 'saltsyndic03.domain.net', 'saltsyndic04.domain.net', 'saltsyndic05.domain.net', 'saltsyndic06'].int.module.cmdmod ][DEBUG ][14403] output: +0200
2017-09-18 10:00:27,673 [salt.utils.lazy
][DEBUG ][14403] LazyLoaded config.merge
2017-09-18 10:00:27,674 [salt.utils.event
][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:27,674 [salt.utils.event
][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,674 [salt.transport.ipc
][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,675 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,675 [salt.utils.event
][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_delete_complete; data = {'_stamp': '2017-09-18T08:00:27.675705', 'complete': True, 'schedule': {'__mine_interval':
{'function': 'mine.update', 'return_job': False, 'enabled': True, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}}}
2017-09-18 10:00:27,676 [salt.utils.schedule
][DEBUG ][14403] Persisting schedule
2017-09-18 10:00:27,678 [salt.utils.lazy
][DEBUG ][14403] LazyLoaded mine.update
2017-09-18 10:00:27,678 [salt.utils.schedule
][INFO ][14403] Updating job settings for scheduled job: __mine_interval
2017-09-18 10:00:27,678 [salt.utils.event
][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:27,678 [salt.utils.event
][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,679 [salt.transport.ipc
][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,679 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,679 [salt.utils.event
][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_add_complete; data = {'_stamp': '2017-09-18T08:00:27.679491', 'complete': True, 'schedule': {'__mine_interval': {'
function': 'mine.update', 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}}}
2017-09-18 10:00:27,679 [salt.utils.schedule
][DEBUG ][14403] Persisting schedule
2017-09-18 10:00:27,680 [salt.minion][INFO ][14403] Added mine.update to scheduler
2017-09-18 10:00:27,680 [salt.utils.schedule][INFO ][14403] Added new job __master_alive_saltsyndic06.domain.net to scheduler
2017-09-18 10:00:27,681 [salt.utils.event][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:27,681 [salt.utils.event][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,681 [salt.transport.ipc][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,681 [salt.transport.ipc][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,681 [salt.utils.event][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_add_complete; data = {'_stamp': '2017-09-18T08:00:27.681789', 'complete': True, 'schedule': {'__mine_interval': {'
function': 'mine.update', 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}, '__master_alive_saltsyndic06.domain.net': {'function': 'status.master', 'seconds': 300, '
enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 1, 'kwargs': {'connected': True, 'master': 'saltsyndic06.domain.net'}}}}
2017-09-18 10:00:27,682 [salt.transport.ipc][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,684 [salt.utils.event
][DEBUG ][14403] Stream is closed
2017-09-18 10:00:27,685 [salt.transport.ipc
][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:27,686 [salt.transport.ipc
][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:27,686 [salt.transport.ipc
][TRACE ][14403] Client disconnected from IPC /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,686 [salt.transport.ipc
][TRACE ][14403] Client disconnected from IPC /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:27,687 [salt.minion
][CRITICAL][14403] Unexpected error while connecting to saltsyndic06.domain.net
Traceback (most recent call last):
File "/opt/salt-minion/lib/python2.6/site-packages/salt/minion.py", line 875, in _connect_minion
yield minion.connect_master(failed=failed)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/opt/salt-minion/lib/python2.6/site-packages/salt/minion.py", line 1053, in connect_master
yield self._post_master_init(master)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "/opt/salt-minion/lib/python2.6/site-packages/salt/minion.py", line 1129, in _post_master_init
}, persist=True)
File "/opt/salt-minion/lib/python2.6/site-packages/salt/utils/schedule.py", line 577, in add_job
tag='/salt/minion/minion_schedule_add_complete')
File "/opt/salt-minion/lib/python2.6/site-packages/salt/utils/event.py", line 752, in fire_event
self.io_loop.run_sync(lambda: self.pusher.send(msg))
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/ioloop.py", line 444, in run_sync
return future_cell[0].result()
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "/opt/salt-minion/lib/python2.6/site-packages/salt/transport/ipc.py", line 416, in send
yield self.stream.write(pack)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/iostream.py", line 367, in write
self._check_closed()
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/iostream.py", line 877, in _check_closed
raise StreamClosedError("Stream is closed")
StreamClosedError: Stream is closed
2017-09-18 10:00:27,687 [salt.minion][INFO ][14403] Moving possibly failed master saltsyndic06.domain.net to the end of the list of masters
2017-09-18 10:00:27,689 [salt.minion][WARNING ][14403] Master ip address changed from 192.168.0.160 to 192.168.0.159
2017-09-18 10:00:27,691 [salt.minion
][WARNING ][14403] Master ip address changed from 192.168.0.160 to 169.56.128.11
2017-09-18 10:00:27,692 [salt.minion
][WARNING ][14403] Master ip address changed from 192.168.0.160 to 169.56.128.13
2017-09-18 10:00:27,693 [salt.minion
][WARNING ][14403] Master ip address changed from 192.168.0.160 to 192.168.0.158
2017-09-18 10:00:27,695 [salt.minion
][WARNING ][14403] Master ip address changed from 192.168.0.160 to 169.56.128.12
2017-09-18 10:00:27,695 [salt.minion
][DEBUG ][14403] Connecting to master. Attempt 1 (infinite attempts)
2017-09-18 10:00:27,696 [salt.crypt
][DEBUG ][14403] Re-using AsyncAuth for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://192.168.0.160:4506')
2017-09-18 10:00:27,697 [salt.transport.zeromq
][DEBUG ][14403] Generated random reconnect delay between '1000ms' and '11000ms' (4446)
2017-09-18 10:00:27,697 [salt.transport.zeromq
][DEBUG ][14403] Setting zmq_reconnect_ivl to '4446ms'
2017-09-18 10:00:27,697 [salt.transport.zeromq
][DEBUG ][14403] Setting zmq_reconnect_ivl_max to '11000ms'
2017-09-18 10:00:27,699 [salt.crypt
][DEBUG ][14403] Loaded minion key: /etc/salt-minion/pki/minion/minion.pem
2017-09-18 10:00:27,703 [salt.transport.zeromq
][DEBUG ][14403] Initializing new AsyncZeroMQReqChannel for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://192.168.0.160:4506', 'aes')
2017-09-18 10:00:27,703 [salt.crypt
][DEBUG ][14403] Re-using AsyncAuth for ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://192.168.0.160:4506')
2017-09-18 10:00:27,703 [salt.transport.zeromq
][TRACE ][14403] Inserted key into loop_instance_map id 51859824 for key ('/etc/salt-minion/pki/minion', 'minionhostname.domain.net', 'tcp://192.168.0.160:4506', 'aes') and proces
s 14403
2017-09-18 10:00:27,751 [salt.crypt
][DEBUG ][14403] Loaded minion key: /etc/salt-minion/pki/minion/minion.pem
2017-09-18 10:00:27,807 [salt.config
][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion
2017-09-18 10:00:27,807 [salt.config
][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/_schedule.conf'
2017-09-18 10:00:27,807 [salt.config
][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/_schedule.conf
2017-09-18 10:00:27,809 [salt.config
][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/common_settings.conf'
2017-09-18 10:00:27,809 [salt.config
][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/common_settings.conf
2017-09-18 10:00:27,812 [salt.config
][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/masters.conf'
2017-09-18 10:00:27,812 [salt.config
][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/masters.conf
2017-09-18 10:00:27,813 [salt.config
][DEBUG ][14403] Including configuration from '/etc/salt-minion/minion.d/system.conf'
2017-09-18 10:00:27,813 [salt.config
][DEBUG ][14403] Reading configuration from /etc/salt-minion/minion.d/system.conf
2017-09-18 10:00:27,819 [salt.loader
][TRACE ][14403] Loading core.hwaddr_interfaces grain
2017-09-18 10:00:27,826 [salt.loader
][TRACE ][14403] Loading core.hostname grain
2017-09-18 10:00:27,826 [salt.loader
][TRACE ][14403] Loading core.get_master grain
...
][TRACE ][14403] Device dm-5 reports itself as an HDD
2017-09-18 10:00:28,619 [salt.utils.schedule
][DEBUG ][14403] Re-using Schedule
2017-09-18 10:00:28,621 [salt.utils.lazy
][DEBUG ][14403] LazyLoaded mine.update
2017-09-18 10:00:28,621 [salt.utils.schedule
][INFO ][14403] Updating job settings for scheduled job: __mine_interval
2017-09-18 10:00:28,621 [salt.utils.event
][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:28,621 [salt.utils.event
][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,622 [salt.transport.ipc
][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,622 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,623 [salt.utils.event
][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_add_complete; data = {'_stamp': '2017-09-18T08:00:28.623026', 'complete': True, 'schedule': {'__mine_interval': {'
function': 'mine.update', 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}, '__master_alive_saltsyndic06.domain.net': {'function': 'status.master', 'seconds': 300, '
enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 1, 'kwargs': {'connected': True, 'master': 'saltsyndic06.domain.net'}}}}
2017-09-18 10:00:28,623 [salt.utils.schedule
][DEBUG ][14403] Persisting schedule
2017-09-18 10:00:28,624 [salt.minion
][INFO ][14403] Added mine.update to scheduler
2017-09-18 10:00:28,624 [salt.utils.schedule
][INFO ][14403] Updating job settings for scheduled job: __master_alive_saltsyndic06.domain.net
2017-09-18 10:00:28,624 [salt.utils.event
][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:28,625 [salt.utils.event
][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,625 [salt.transport.ipc
][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,625 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,625 [salt.utils.event
][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_add_complete; data = {'_stamp': '2017-09-18T08:00:28.625665', 'complete': True, 'schedule': {'__mine_interval': {'
function': 'mine.update', 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}, '__master_alive_saltsyndic06.domain.net': {'function': 'status.master', 'seconds': 300, '
enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 1, 'kwargs': {'connected': True, 'master': 'saltsyndic06.domain.net'}}}}
2017-09-18 10:00:28,626 [salt.utils.schedule
][DEBUG ][14403] Persisting schedule
2017-09-18 10:00:28,627 [salt.utils.event
][DEBUG ][14403] SaltEvent PUB socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pub.ipc
2017-09-18 10:00:28,627 [salt.utils.event
][DEBUG ][14403] SaltEvent PULL socket URI: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,627 [salt.transport.ipc
][DEBUG ][14403] Initializing new IPCClient for path: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,627 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,628 [salt.utils.event
][DEBUG ][14403] Sending event: tag = /salt/minion/minion_schedule_delete_complete; data = {'_stamp': '2017-09-18T08:00:28.628174', 'complete': True, 'schedule': {'__mine_interval':
{'function': 'mine.update', 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 2, 'minutes': 60}, '__master_alive_saltsyndic06.domain.net': {'function': 'status.master', 'seconds': 300
, 'enabled': True, 'return_job': False, 'jid_include': True, 'maxrunning': 1, 'kwargs': {'connected': True, 'master': 'saltsyndic06.domain.net'}}}}
2017-09-18 10:00:28,628 [salt.transport.ipc
][TRACE ][14403] IPCClient: Connecting to socket: /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,628 [salt.utils.event
][DEBUG ][14403] Stream is closed
2017-09-18 10:00:28,629 [salt.transport.ipc
][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:28,629 [salt.transport.ipc
][TRACE ][14403] IPCServer: Handling connection to address:
2017-09-18 10:00:28,630 [salt.transport.ipc
][TRACE ][14403] Client disconnected from IPC /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,630 [salt.transport.ipc
][TRACE ][14403] Client disconnected from IPC /var/run/salt-minion/minion/minion_event_169df1011b_pull.ipc
2017-09-18 10:00:28,630 [salt.minion
][CRITICAL][14403] Unexpected error while connecting to saltsyndic06.domain.net
Traceback (most recent call last):
File "/opt/salt-minion/lib/python2.6/site-packages/salt/minion.py", line 875, in _connect_minion
yield minion.connect_master(failed=failed)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/concurrent.py", line 214, in result
raise_exc_info(self._exc_info)
File "/opt/salt-minion/lib64/python2.6/site-packages/tornado/gen.py", line 876, in run
@tsaridas I've figured out what is happening on your system and if I'm right this problem is actual in the current codebase.
The problem is that each SaltEvent instance manages IPCClient instance while the IPCClient instance is a singleton (multion actually, but it doesn't makes sense here) and the SaltEvent instance must just use it and forget and never explicitly close it.
What you see in simplified form:
I plan to write a POC test and if I'm not wrong as a fix I propose to just remove the close() call to the singleton instance. Let GC to do it's job by itself.
Also if it's correct I want to look for similar issues in our codebase for other singletons.
@thatch45 @saltstack/team-core @skizunov what do you think about it?
I've marked it as a high severity but it's rarely reproduced bug, so maybe it's severity may be changed to medium.
I've hit this issue 3 times in the last 2 weeks on 2017.7.2
value = future.result()
File "/data/python_venv/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/data/python_venv/lib/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(exc_info)
File "/data/main/.py", line 30, in post
if (yield self.sid_checker(req_info.cmd, self.protocal_info.sid, rsp_info)):
File "/data/python_venv/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/data/python_venv/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/data/python_venv/lib/python2.7/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(exc_info)
File "/data/main/APIMain.py", line 134, in sid_checker
with (yield pool.Connection()) as conn:
File "/data/python_venv/chanzai_venv/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/data/python_venv/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/data/python_venv/lib/python2.7/site-packages/tormysql/util.py", line 14, in finish
result = fun(args, *kwargs)
File "/data/python_venv/lib/python2.7/site-packages/tormysql/connections.py", line 268, in connect
raise exc
OperationalError: (2003, "Can't connect to MySQL server on 10.214.245.82:3306 (OperationalError(2006, 'MySQL server has gone away (StreamClosedError())'))")
我的这个问题和你的很类似,请问你的问题解决了吗?是怎么解决的?能给我个思路吗?
This issue should be fixed by #52445: this is caused by wrong shared use of singletons by our code. Now we've dropped singletons from the client and it operates like a simple object.
It will be released in the next releases 2018.3.5 and 2019.2.1.
@xudexuan I don't think your problem is the same. Look up where you can use the stream after it already closed.