/cc @isbm You wanted to look at this :smile_cat:
When doing `saltutil.sync_modules` over salt-ssh, it seems that custom modules (in the `_modules` directory) aren't included in the thin tarball. This leads to errors when a state on the target system relies on their presence.
Put a custom module in the `_modules` directory in some environment, then issue a salt-ssh call via the API (most likely it's reproducible with the salt-ssh CLI as well) that applies a state using the custom module. The call fails, and the thin directory on the target system doesn't include the custom modules.
I believe there was a similar problem in the past, solved by https://github.com/saltstack/salt/issues/9560. However, it seems that the code responsible for including the modules kicks in only when a state references something from the `salt://` file server (https://github.com/saltstack/salt/blob/develop/salt/client/ssh/wrapper/state.py#L81 produces an empty dictionary, which leads to the modules not being included in `prep_trans_tar` in the same file). Not sure if this is intended behavior! Manually hacking https://github.com/saltstack/salt/blob/develop/salt/client/ssh/state.py#L107 to return something non-empty "fixes the issue" :).
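To make the suspected gating concrete, here is a minimal sketch, assuming hypothetical names (this is not the actual Salt source): when no state references `salt://`, the collected file references stay empty, the packaging step is skipped, and `_modules` never reaches the thin tarball.

```python
# A minimal sketch of the suspected gating, with hypothetical names --
# not the actual salt/client/ssh code, just an illustration.
def collect_file_refs(lowstate):
    """Gather salt:// references from rendered lowstate chunks."""
    refs = {}
    for chunk in lowstate:
        env = chunk.get('__env__', 'base')
        for value in chunk.values():
            if isinstance(value, str) and value.startswith('salt://'):
                refs.setdefault(env, []).append(value)
    return refs

def dirs_for_thin(lowstate, extra_dirs=('_modules',)):
    refs = collect_file_refs(lowstate)
    if not refs:
        # Nothing references salt://, so packaging is skipped entirely --
        # and the custom modules in _modules are skipped along with it.
        return []
    return list(extra_dirs)

# A lowstate with no salt:// references yields no extra dirs:
print(dirs_for_thin([{'__env__': 'base', 'name': '/etc/motd', 'state': 'file'}]))  # []
```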
```
Salt Version:
Salt: 2015.8.7
Dependency Versions:
Jinja2: 2.7.3
M2Crypto: Not Installed
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.0
Python: 2.7.9 (default, Dec 21 2014, 11:02:59) [GCC]
RAET: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.4
cffi: 1.1.0
cherrypy: 3.6.0
dateutil: 2.4.2
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libgit2: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: 2.10
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: SuSE 12 x86_64
machine: x86_64
release: 3.12.62-60.64.8-default
system: SUSE Linux Enterprise Server 12 x86_64
```
@hustodemon the chances that we will look at this are 98% if there is an _actual_ reproducer, without us guessing what exactly we need to put into those APIs, so we can copy-paste it and focus on fixing the problem. The remaining 2% is the probability that somebody will look at this without a clear reproducer. :smile:
@cachedout We could not reliably reproduce this, and usually everything works fine. Admittedly, something fishy is happening with the permissions sometimes, although we did not investigate further because this only comes up in complex environments like ours. I would propose to call this a Heisenbug for now and close this issue until something solid appears, so we can reopen it.
Sounds good! Thanks
I am having the same issue in salt-ssh 2016.3.4, I'm afraid. I haven't looked into it as much as @hustodemon has. Steps to reproduce:
- `ssh_wipe: False` in `Saltfile`
- `thin_dir: /opt/salt-thin` in roster
- `states/_modules/application.py`:

```python
import salt.utils

__virtualname__ = 'application'


def __virtual__():
    return __virtualname__


def test():
    return True
```
- `salt-ssh '*' application.test` returns `True`, and we can find the module on the minion as well:

```
minion1:/opt/salt-thin # find . | grep application
./running_data/var/cache/salt/minion/extmods/modules/application.py
./running_data/var/cache/salt/minion/extmods/modules/application.pyc
```
- Edit `states/_modules/application.py`, add another method, and run `saltutil.sync_all`:

```python
def more_test():
    return True
```
- `salt-ssh '*' application.more_test`:

```
minion1:
    ----------
    retcode:
        0
    stderr:
        'application.new_test' is not available.
    stdout:
```

Comments:
- `application.test` returns "not available" as well
- thin dir on the target is `/opt/salt-thin`
- `sudo: True` in roster
- `salt-ssh` has been installed via pip
- With `ssh_wipe: True` in `Saltfile` the problem goes away, but that slows down salt-ssh a lot.

Debug output from the failing run:

```
2016-11-25 14:51:29,813 [salt.fileclient ][DEBUG ][30415] In saltenv 'base', looking at rel_path '_modules/application.py' to resolve 'salt://_modules/application.py'
2016-11-25 14:51:29,813 [salt.fileclient ][DEBUG ][30415] In saltenv 'base', ** considering ** path '/tmp/salt/files/base/_modules/application.py' to resolve 'salt://_modules/application.py'
2016-11-25 14:51:29,814 [salt.fileclient ][INFO ][30415] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://_modules/application.py'
2016-11-25 14:51:29,815 [salt.utils.lazy ][DEBUG ][30415] LazyLoaded local_cache.prep_jid
2016-11-25 14:51:29,817 [salt.loaded.int.returner.local_cache][DEBUG ][30415] Adding minions for job 20161125145129815869: ['minion1', 'minion2', 'minion3']
2016-11-25 14:51:29,848 [salt.utils.lazy ][DEBUG ][30425] Could not LazyLoad saltutil.sync_all
2016-11-25 14:51:29,849 [salt.client.ssh ][DEBUG ][30425] Performing shimmed, blocking command as follows:
saltutil.sync_all
```
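For completeness, the workaround mentioned above lives in the `Saltfile`; a minimal sketch (the values mirror this report, and the `salt-ssh` top-level section is the standard `Saltfile` layout):

```yaml
# Saltfile -- force a fresh thin deployment on every run.
# Works around the stale custom modules, at the cost of speed.
salt-ssh:
  ssh_wipe: True
```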
@gtmanfred @isbm Can we reopen this issue if it is indeed reproducible?
Excluding a host removes it from the targeted list, but the script then says "nodes were requested to be excluded but were not found in the node list":
```
[vagrant@es-dev-01 ~]$ sudo ./es_rolling_upgrade.py -e es-dev-01 -s -u elastic -d dev -a upgrade_es es-dev-06
Password:
2017-11-29 21:35:49,651 INFO Requesting node list from cluster
2017-11-29 21:35:49,719 INFO Obtained list of 7 nodes
2017-11-29 21:35:49,719 INFO === master ===
2017-11-29 21:35:49,719 INFO es-dev-01.example.local
2017-11-29 21:35:49,720 INFO es-dev-07.example.local
2017-11-29 21:35:49,720 INFO es-dev-03.example.local
2017-11-29 21:35:49,720 INFO es-dev-02.example.local
2017-11-29 21:35:49,720 INFO === client ===
2017-11-29 21:35:49,720 INFO === data ===
2017-11-29 21:35:49,720 INFO es-dev-04.example.local
2017-11-29 21:35:49,720 INFO es-dev-05.example.local
2017-11-29 21:35:49,720 INFO es-dev-06.example.local
2017-11-29 21:35:49,720 INFO === other ===
2017-11-29 21:35:49,720 ERROR The following nodes were requested to be excluded but were not found in the node list:
2017-11-29 21:35:49,720 ERROR - es-dev-01
2017-11-29 21:35:49,720 ERROR Halting execution. Please update the list of excluded hosts.
[vagrant@es-dev-01 ~]$ sudo ./es_rolling_upgrade.py -e es-dev-01.example.local -s -u elastic -d dev -a upgrade_es es-dev-06
Password:
2017-11-29 21:36:08,266 INFO Requesting node list from cluster
2017-11-29 21:36:08,310 INFO Obtained list of 6 nodes
2017-11-29 21:36:08,310 INFO === master ===
2017-11-29 21:36:08,310 INFO es-dev-07.example.local
2017-11-29 21:36:08,310 INFO es-dev-03.example.local
2017-11-29 21:36:08,310 INFO es-dev-02.example.local
2017-11-29 21:36:08,310 INFO === client ===
2017-11-29 21:36:08,310 INFO === data ===
2017-11-29 21:36:08,311 INFO es-dev-04.example.local
2017-11-29 21:36:08,311 INFO es-dev-05.example.local
2017-11-29 21:36:08,311 INFO es-dev-06.example.local
2017-11-29 21:36:08,311 INFO === other ===
2017-11-29 21:36:08,311 ERROR The following nodes were requested to be excluded but were not found in the node list:
2017-11-29 21:36:08,311 ERROR - es-dev-01.example.local
2017-11-29 21:36:08,311 ERROR Halting execution. Please update the list of excluded hosts.
```
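The log suggests two distinct failure modes: in the first run the short name `es-dev-01` never matches the FQDN `es-dev-01.example.local`, and in the second run the exclusion appears to be validated against the node list *after* the excluded host has already been filtered out, so a successful exclusion is reported as "not found". A minimal sketch of a fix under those assumptions (hypothetical names, not the actual `es_rolling_upgrade.py` code):

```python
def short(name):
    """Compare on the short hostname so 'es-dev-01' matches its FQDN."""
    return name.split(".")[0]

def filter_nodes(all_nodes, excluded):
    # Validate exclusions against the *unfiltered* node list, and match on
    # short names -- addressing both failure modes seen in the log above.
    missing = [e for e in excluded
               if not any(short(e) == short(n) for n in all_nodes)]
    if missing:
        raise SystemExit("Excluded but not found: %s" % ", ".join(missing))
    return [n for n in all_nodes
            if short(n) not in {short(e) for e in excluded}]

nodes = ["es-dev-01.example.local", "es-dev-06.example.local"]
print(filter_nodes(nodes, ["es-dev-01"]))  # ['es-dev-06.example.local']
```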