Salt: Error when trying to run orch command if another orch is running by scheduler.

Created on 30 Nov 2016 · 11 Comments · Source: saltstack/salt

Description of Issue/Question

I have two orchestration state files. The first one is run every 15 minutes by the scheduler:

schedule:
  http_frontend_haproxy:
    function: state.orchestrate
    minutes: 15
    args:
      - orch.http_frontend.haproxy

/srv/salt/orch/http_frontend/haproxy.sls:

orch_haproxy_conf_generate:
  salt.state:
    - tgt: 'config_server'
    - sls: 
      - orch.http_frontend.haproxy_conf_generate

orch_haproxy_conf_update:
  salt.state:
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'
    - sls: 
      - orch.http_frontend.haproxy_conf_update

And the second one I run manually with the command:

salt-run state.orch orch.http_frontend.bird

/srv/salt/orch/http_frontend/bird.sls:

refresh_pillar:
  salt.function:
    - name: saltutil.refresh_pillar
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'

update_bird_config:
  salt.state:
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'
    - sls:
      - http_frontend.bird_config

If the first orchestration is already running from the scheduler and I try to run the second one with salt-run state.orch, I get this error:

master:
    Data failed to compile:
    The function "state.orchestrate" is running as PID 16907 and was started at 2016, Nov 30 18:18:56.071485 with jid 20161130181856071485

But everything works if I run both of them manually with these commands:

salt-run state.orch orch.http_frontend.haproxy
salt-run state.orch orch.http_frontend.bird

What am I doing wrong?

Versions Report

Salt Version:
Salt: 2016.11.0

Dependency Versions:
cffi: Not Installed
cherrypy: 3.2.2
dateutil: Not Installed
gitdb: 0.6.4
gitpython: 1.0.1
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: 1.4.3
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Nov 20 2015, 02:00:19)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: 0.9.0
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.2.1511 Core
machine: x86_64
release: 3.10.0-327.28.2.el7.x86_64
system: Linux
version: CentOS Linux 7.2.1511 Core

Bug Core severity-medium team-core

gummeah

All 11 comments

I believe this is the expected behaviour.

The master can only run one set of orchestration states at the same time.

When you are running them one command after another, the first one finishes running before the second one starts.

If you try to run it with --async, you should get the same error.

salt-run --async state.orch orch.http_frontend.haproxy
salt-run state.orch orch.http_frontend.bird

Thanks,
Daniel

gtmanfred on 30 Nov 2016
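
To picture the behaviour described in the comment above, here is a minimal Python sketch of a "one orchestration at a time" check. It is a toy model only, not Salt's actual implementation: `ToyMaster`, `OrchestrationConflict`, `start`, and `finish` are hypothetical names, and the error text merely imitates the message quoted in the issue.

```python
class OrchestrationConflict(Exception):
    """Raised when a run is requested while another one is still active."""


class ToyMaster:
    """Toy model (not Salt code): allows only one active orchestration."""

    def __init__(self):
        # Maps jid -> (function name, pid) for every run still in progress.
        self.active = {}

    def start(self, fun, jid, pid):
        # Refuse the new run if any earlier run has not finished yet.
        if self.active:
            running_jid, (running_fun, running_pid) = next(iter(self.active.items()))
            raise OrchestrationConflict(
                'The function "{0}" is running as PID {1} with jid {2}'.format(
                    running_fun, running_pid, running_jid
                )
            )
        self.active[jid] = (fun, pid)

    def finish(self, jid):
        # Mark the run as done so that the next one may start.
        self.active.pop(jid, None)
```

In this model, two runs started one after the other succeed because the first calls `finish` before the second calls `start`, while a second `start` issued during the first run raises `OrchestrationConflict`.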

No, the first one runs for a few minutes because there is a lot of work to do, so I am able to run the second one in another terminal window at the same time as the first.

gummeah on 30 Nov 2016

And why does the reactor documentation suggest using orchestration with the reactor system, then? To run only one orchestration at a time? And how do you handle the case where multiple reactors trigger orchestration actions and several events arrive in a short period of time?

gummeah on 30 Nov 2016

OK, so the behaviour should definitely be the same in both cases.

I think the goal is to make this configurable from the scheduler through the concurrent option: the default would keep the current behaviour, but you could configure it to run however you want. Then we will update the behaviour in Nitrogen.

gtmanfred on 30 Nov 2016

I was able to replicate this behavior as described with the following schedule.

[root@65a0c37fd433 /]# tail -c +0 /etc/salt/master.d/sched.conf /srv/salt/test.sls
==> /etc/salt/master.d/sched.conf <==
schedule:
  sleep:
    function: state.orch
    minutes: 20
    concurrent: True
    args:
      - test

==> /srv/salt/test.sls <==
deploy:
  salt.function:
    - tgt: '*'
    - name: test.sleep
    - arg:
      - 1000

Here is the output while the scheduled job is running:

[root@65a0c37fd433 /]# salt-run jobs.active
20161130193228085768:
    ----------
    Arguments:
        - 1000
    Function:
        test.sleep
    Returned:
    Running:
        |_
          ----------
          65a0c37fd433:
              2212
    StartTime:
        2016, Nov 30 19:32:28.085768
    Target:
        *
    Target-type:
        glob
    User:
        root
[root@65a0c37fd433 /]# salt-run state.orch test
65a0c37fd433_master:
    Data failed to compile:
----------
    The function "state.orch" is running as PID 2003 and was started at 2016, Nov 30 19:32:06.606691 with jid 20161130193206606691
retcode:
    1

This should behave the same way as it does from the command line, but for some reason it appears that concurrent is set to True somewhere.

gtmanfred on 30 Nov 2016
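
A wrapper script that wants to react to this failure (for example, to retry later) could pull the PID and jid out of the message. The sketch below assumes the message always has the shape quoted in this thread; `CONFLICT_RE` and `parse_conflict` are hypothetical names, not part of Salt.

```python
import re
from typing import Dict, Optional

# Pattern modelled on the error text quoted in this thread (an assumption:
# other Salt versions may word the message differently).
CONFLICT_RE = re.compile(
    r'The function "(?P<fun>[^"]+)" is running as PID (?P<pid>\d+) '
    r'and was started at (?P<started>.+?) with jid (?P<jid>\d+)'
)


def parse_conflict(output: str) -> Optional[Dict[str, str]]:
    """Return fun, pid, started and jid if the output reports a conflict."""
    match = CONFLICT_RE.search(output)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        'The function "state.orch" is running as PID 2003 and was started '
        'at 2016, Nov 30 19:32:06.606691 with jid 20161130193206606691'
    )
    info = parse_conflict(sample)
    print(info["pid"], info["jid"])  # 2003 20161130193206606691
```

The function returns `None` for any output that does not contain the conflict message, so a caller can tell a concurrency failure apart from other compile errors.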

It appears that you can pass the queue=True option on the command line, in the scheduler, or in the reactor. If a state run is already in progress, the orchestration then waits for it to finish instead of failing outright.

gtmanfred on 6 Dec 2016
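
The difference between failing immediately and queueing can be sketched in a few lines of Python. This only illustrates the "wait for the running job, then start" idea described above and is not how Salt implements queue=True; `run_queued`, `is_busy`, and `run` are hypothetical names, and the polling interval and timeout are arbitrary assumptions.

```python
import time
from typing import Callable


def run_queued(
    is_busy: Callable[[], bool],
    run: Callable[[], object],
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Wait until no other run is active, then execute run()."""
    deadline = clock() + timeout
    # Poll instead of failing: this is the "queue" behaviour in miniature.
    while is_busy():
        if clock() >= deadline:
            raise TimeoutError("another state run is still active")
        sleep(poll_interval)
    return run()
```

Here `is_busy` stands in for whatever check reports an active run, and `run` for the orchestration call itself; the `sleep` and `clock` parameters exist only so the waiting logic can be exercised without real delays.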

Disregard, error on my end

clallen on 7 Sep 2017

Is there any progress on the ability to run multiple orchestration tasks at once? Orchestration is not very useful if it has to be single-threaded.

twellspring on 2 Mar 2018

@thatch45 Can we have a bit of an escalation for this issue, which has been open for a year? This undermines one of the selling points of Salt.

damon-atkins on 3 Mar 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.