/cc @isbm You wanted to look at this :smile_cat:
When doing `saltutil.sync_modules` over salt-ssh, it seems that custom modules (in the `_modules` directory) aren't included in the thin tarball. This leads to errors when a state on the target system relies on their presence.
Put a custom module in the `_modules` directory in some environment, then issue a salt-ssh call via the API (most likely it's reproducible with the salt-ssh CLI as well) that applies a state using the custom module. The call fails, and the thin directory on the target system doesn't include the custom modules.
I believe there was a similar problem in the past, solved by https://github.com/saltstack/salt/issues/9560. However, it seems that the code responsible for including the modules kicks in only when a state references something from the `salt://` file server (https://github.com/saltstack/salt/blob/develop/salt/client/ssh/wrapper/state.py#L81 produces an empty dictionary, which leads to the modules not being included in `prep_trans_tar` in the same file). Not sure if this is intended behavior! Manually hacking https://github.com/saltstack/salt/blob/develop/salt/client/ssh/state.py#L107 to return something non-empty "fixes the issue" :).
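To make the suspected gating concrete, here is a minimal sketch, assuming hypothetical names (this is not the actual Salt source): when no state references `salt://`, the collected file references stay empty, the packaging step is skipped, and `_modules` never reaches the thin tarball.

```python
# A minimal sketch of the suspected gating, with hypothetical names --
# not the actual salt/client/ssh code, just an illustration.
def collect_file_refs(lowstate):
    """Gather salt:// references from rendered lowstate chunks."""
    refs = {}
    for chunk in lowstate:
        env = chunk.get('__env__', 'base')
        for value in chunk.values():
            if isinstance(value, str) and value.startswith('salt://'):
                refs.setdefault(env, []).append(value)
    return refs

def dirs_for_thin(lowstate, extra_dirs=('_modules',)):
    refs = collect_file_refs(lowstate)
    if not refs:
        # Nothing references salt://, so packaging is skipped entirely --
        # and the custom modules in _modules are skipped along with it.
        return []
    return list(extra_dirs)

# A lowstate with no salt:// references yields no extra dirs:
print(dirs_for_thin([{'__env__': 'base', 'name': '/etc/motd', 'state': 'file'}]))  # []
```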
```
Salt Version:
Salt: 2015.8.7
Dependency Versions:
Jinja2: 2.7.3
M2Crypto: Not Installed
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.0
Python: 2.7.9 (default, Dec 21 2014, 11:02:59) [GCC]
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.4
cffi: 1.1.0
cherrypy: 3.6.0
dateutil: 2.4.2
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libgit2: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: 2.10
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: SuSE 12 x86_64
machine: x86_64
release: 3.12.62-60.64.8-default
system: SUSE Linux Enterprise Server 12 x86_64
```
@hustodemon the chances that we will look at this are 98% if there is an _actual_ reproducer, without us guessing what exactly we need to put into those APIs, so we can copy-paste it and focus on fixing the problem. The remaining 2% is the probability that somebody will look at this without a clear reproducer. :smile:
@cachedout We could not reliably reproduce this, and usually everything works fine. Admittedly, something fishy is happening with the permissions sometimes, although we did not investigate further because this only comes up in complex environments like ours. I would propose to call this a Heisenbug for now and close this issue until something solid appears, so we can reopen it.
Sounds good! Thanks
I am having the same issue in salt-ssh 2016.3.4, I'm afraid. I haven't looked into it as much as @hustodemon has. Steps to reproduce:
- `ssh_wipe: False` in `Saltfile`
- `thin_dir: /opt/salt-thin` in roster
- `states/_modules/application.py`:

```python
import salt.utils

__virtualname__ = 'application'


def __virtual__():
    return __virtualname__


def test():
    return True
```
- `salt-ssh '*' application.test` returns `True`, and we can find the module on the minion as well:

```
minion1:/opt/salt-thin # find . | grep application
./running_data/var/cache/salt/minion/extmods/modules/application.py
./running_data/var/cache/salt/minion/extmods/modules/application.pyc
```
- Edit `states/_modules/application.py`, add another method, and run `saltutil.sync_all`:

```python
def more_test():
    return True
```
- `salt-ssh '*' application.more_test`:

```
minion1:
    ----------
    retcode:
        0
    stderr:
        'application.new_test' is not available.
    stdout:
```

Comments:
- `application.test` returns "not available" as well
- thin dir on the target is `/opt/salt-thin`
- `sudo: True` in roster
- `salt-ssh` has been installed via pip
- With `ssh_wipe: True` in `Saltfile` the problem goes away, but that slows down salt-ssh a lot.

Debug output from the failing run:

```
2016-11-25 14:51:29,813 [salt.fileclient ][DEBUG ][30415] In saltenv 'base', looking at rel_path '_modules/application.py' to resolve 'salt://_modules/application.py'
2016-11-25 14:51:29,813 [salt.fileclient ][DEBUG ][30415] In saltenv 'base', ** considering ** path '/tmp/salt/files/base/_modules/application.py' to resolve 'salt://_modules/application.py'
2016-11-25 14:51:29,814 [salt.fileclient ][INFO ][30415] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://_modules/application.py'
2016-11-25 14:51:29,815 [salt.utils.lazy ][DEBUG ][30415] LazyLoaded local_cache.prep_jid
2016-11-25 14:51:29,817 [salt.loaded.int.returner.local_cache][DEBUG ][30415] Adding minions for job 20161125145129815869: ['minion1', 'minion2', 'minion3']
2016-11-25 14:51:29,848 [salt.utils.lazy ][DEBUG ][30425] Could not LazyLoad saltutil.sync_all
2016-11-25 14:51:29,849 [salt.client.ssh ][DEBUG ][30425] Performing shimmed, blocking command as follows:
saltutil.sync_all
```
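For completeness, the workaround mentioned above lives in the `Saltfile`; a minimal sketch (the values mirror this report, and the `salt-ssh` top-level section is the standard `Saltfile` layout):

```yaml
# Saltfile -- force a fresh thin deployment on every run.
# Works around the stale custom modules, at the cost of speed.
salt-ssh:
  ssh_wipe: True
```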
@gtmanfred @isbm Can we reopen this issue if it is indeed reproducible?
Excluding a host removes it from the targeted list, but the script then says "nodes were requested to be excluded but were not found in the node list":
```
[vagrant@es-dev-01 ~]$ sudo ./es_rolling_upgrade.py -e es-dev-01 -s -u elastic -d dev -a upgrade_es es-dev-06
Password:
2017-11-29 21:35:49,651 INFO Requesting node list from cluster
2017-11-29 21:35:49,719 INFO Obtained list of 7 nodes
2017-11-29 21:35:49,719 INFO === master ===
2017-11-29 21:35:49,719 INFO es-dev-01.example.local
2017-11-29 21:35:49,720 INFO es-dev-07.example.local
2017-11-29 21:35:49,720 INFO es-dev-03.example.local
2017-11-29 21:35:49,720 INFO es-dev-02.example.local
2017-11-29 21:35:49,720 INFO === client ===
2017-11-29 21:35:49,720 INFO === data ===
2017-11-29 21:35:49,720 INFO es-dev-04.example.local
2017-11-29 21:35:49,720 INFO es-dev-05.example.local
2017-11-29 21:35:49,720 INFO es-dev-06.example.local
2017-11-29 21:35:49,720 INFO === other ===
2017-11-29 21:35:49,720 ERROR The following nodes were requested to be excluded but were not found in the node list:
2017-11-29 21:35:49,720 ERROR - es-dev-01
2017-11-29 21:35:49,720 ERROR Halting execution. Please update the list of excluded hosts.
[vagrant@es-dev-01 ~]$ sudo ./es_rolling_upgrade.py -e es-dev-01.example.local -s -u elastic -d dev -a upgrade_es es-dev-06
Password:
2017-11-29 21:36:08,266 INFO Requesting node list from cluster
2017-11-29 21:36:08,310 INFO Obtained list of 6 nodes
2017-11-29 21:36:08,310 INFO === master ===
2017-11-29 21:36:08,310 INFO es-dev-07.example.local
2017-11-29 21:36:08,310 INFO es-dev-03.example.local
2017-11-29 21:36:08,310 INFO es-dev-02.example.local
2017-11-29 21:36:08,310 INFO === client ===
2017-11-29 21:36:08,310 INFO === data ===
2017-11-29 21:36:08,311 INFO es-dev-04.example.local
2017-11-29 21:36:08,311 INFO es-dev-05.example.local
2017-11-29 21:36:08,311 INFO es-dev-06.example.local
2017-11-29 21:36:08,311 INFO === other ===
2017-11-29 21:36:08,311 ERROR The following nodes were requested to be excluded but were not found in the node list:
2017-11-29 21:36:08,311 ERROR - es-dev-01.example.local
2017-11-29 21:36:08,311 ERROR Halting execution. Please update the list of excluded hosts.
```
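The log suggests two distinct failure modes: in the first run the short name `es-dev-01` never matches the FQDN `es-dev-01.example.local`, and in the second run the exclusion appears to be validated against the node list *after* the excluded host has already been filtered out, so a successful exclusion is reported as "not found". A minimal sketch of a fix under those assumptions (hypothetical names, not the actual `es_rolling_upgrade.py` code):

```python
def short(name):
    """Compare on the short hostname so 'es-dev-01' matches its FQDN."""
    return name.split(".")[0]

def filter_nodes(all_nodes, excluded):
    # Validate exclusions against the *unfiltered* node list, and match on
    # short names -- addressing both failure modes seen in the log above.
    missing = [e for e in excluded
               if not any(short(e) == short(n) for n in all_nodes)]
    if missing:
        raise SystemExit("Excluded but not found: %s" % ", ".join(missing))
    return [n for n in all_nodes
            if short(n) not in {short(e) for e in excluded}]

nodes = ["es-dev-01.example.local", "es-dev-06.example.local"]
print(filter_nodes(nodes, ["es-dev-01"]))  # ['es-dev-06.example.local']
```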