Awx: Feature Request: Add a pause with approval to continue between playbooks in workflow job

Created on 12 Feb 2018  ·  49 Comments  ·  Source: ansible/awx

ISSUE TYPE
  • Feature Idea
COMPONENT NAME
  • UI
SUMMARY

I am requesting a pause feature be included as one of the steps in a job workflow in-between playbooks so that a user can give the +1 to continue on to the next playbook in the job workflow.

This gives users as much time as they need to complete their manual steps, while letting them continue as soon as they are ready, without having to wait on some other trigger.

api ui high enhancement

Most helpful comment

This feature has been merged. PR: https://github.com/ansible/awx/pull/4264

All 49 comments

This is not something we would likely get to in the near future, but it has been requested a few times, including:

  • setting a specific approver != the job submitter
  • setting multiple approvers

Note that the best workaround for this case is using wait_for or get_url tasks in Ansible waiting for a flag/port to be available that is set by your approval process.

This is much needed feature.

The workaround which you are suggesting wait_for, cannot be used in our case. Operations users in our company only have UI access and they are not allowed to login into CLI and set some flag in any way.

wait_for and get_url don't require local CLI access; they can check ports on remote machines (wait_for) or pretty much any URL state such as a ticket state (get_url).
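To make the workaround concrete, here is a minimal Python sketch of the polling loop that a wait_for or get_url task effectively performs. It is not an Ansible module: `wait_for_flag`, `check`, and the injectable `sleep` and `clock` parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for_flag(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or `timeout` seconds elapse.

    Returns True if the flag was seen and False on timeout. `check` is a
    hypothetical callable, e.g. "is the port open?" or "is the ticket approved?".
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # The caller is blocked here for the whole waiting period.
        sleep(interval)
```

Note that the loop occupies its caller for the entire wait, which is the scaling concern raised about running this inside a job template.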

I second @ghost, this feature would be really useful.
I do believe that wait_for would be imperfect: it ties up AWX resources (a running job template) for polling, so it would not scale well if every workflow used it at the same time.

I was thinking that maybe, rather than having this "Approval feature" as a Workflow feature only, having it placed at the Job Template level would be a better idea because it could be used for each node of a Workflow and for single Job Template too.
Until one of the approvers could actually approve the run of the job template, it would then stay in a "Pending Approval" state.

Added to the already available job scheduling features, it would really increase the usefulness of the project, in my opinion.

This is most likely going to involve:

  • Adding a job state to indicate to the task manager to not process the job
  • Adding a job state to indicate approved/rejected
  • Updating the task manager to handle not processing the job until it's been approved
  • RBAC for delegating approval
  • Notification support for alerting approvers
  • UI/UX work for approval/rejecting

    • Workflow node to hook approval/rejecting

----- Future ----- ?

  • External approval/rejecting

    • workflow role execution that passes/fails

    • http poll?

@matburt unless the approval step is an actual task with certain parameters

Thoughts on UI impact:

Workflow Editor
We'll need some way of defining or denoting a wait node. I think I would prefer this be its own node as opposed to being a flag attached to an existing node (with a JT, etc, etc) but am open to discussion there. I think it would make the graph easier to read/understand at a glance. If we do go down the path of making wait its own node then we'll need to update the add/edit node form to give the user the ability to select a template/project/inv _or_ define the node as wait. Selecting wait may expose some more fields in that form:

1) Who the approver(s) is/are. Should this be optional? No approvers means anyone with execute access on the wfjt can resume execution?
2) Notifications - This is a bit unclear to me. Do we want to give users the ability to select a notification to send _inside the workflow editor_ or are we envisioning them interacting with the Notifications tab at the wfjt level to do this?

Question: Can you have more than one wait node in a workflow? If not then we'll need to handle that restriction in the UI.

Jobs List

  1. Show new paused state on workflow jobs where necessary
  2. Add play/resume button to paused workflow jobs when user has the ability to continue the job run.

Question: Should the act of resuming/approving a job take the user to the workflow results page or leave them on the current page?

My View Jobs List

  1. Show new paused state on workflow jobs where necessary
  2. Add play/resume button to paused workflow jobs when user has the ability to continue the job run.

Workflow Results

  1. Show paused state
  2. We probably want some visual indication on the graph itself that the workflow is paused. Maybe we draw attention to the wait node somehow?
  3. If the user viewing the results page can resume/continue the run then we should probably expose a button to do that (maybe up by where the relaunch button is shown). Same button that we would use in the list view.

Question: Do we ever see a situation/need arising whereby a user could establish a wait node but rather than requiring approval it requires additional user input (like prompt on launch)? If so, I'm not sure that it would impact our architecture decisions but it may be something to think about.

I could see an argument being made for providing our users with a more streamlined view of all jobs awaiting your approval rather than making them go digging through the jobs list to find the job. Maybe we could show a toast or banner across the top of the dashboard letting them know that a job is awaiting their approval(?)

This is tricky, because in order for this to work on a standard job template the approval/disapproval would have to be associated with the job itself. If we carry that forward into Workflows it would cause the user to have to create a stub job to act as the approval step, but for notifications it would give us something to hang a notification onto.

We could go a different way with this entirely, which I think is what the original proposal is asking for anyway, and require that approvals use a workflow even if they are just looking to run a single job. This is what I'm leaning towards now. The approval node then becomes another node type of WorkflowJobTemplateNode that links to something like a WorkflowLogicNode. This node, at least at first, could encapsulate just this Approval feature (also a Pause-for-time feature?) but in the future could be used to introduce more complex logic to a workflow that doesn't strictly adhere to a traditional UnifiedJobTemplate.

Given this, you could have as many of these as you want, anywhere in your Workflow. A Workflow overall could also include a timeout value so that workflows don't hang out forever: they eventually expire if no one comes along to approve them.

Given that we'd do this entirely from workflows, it could appear as something distinctive within the Workflow detail view indicating which node needs approval. My intuition is that this will also need some sort of indicator on the Job list view (where we indicate which Workflows are waiting on approval). Probably also from the Dashboard?

Question: Do we ever see a situation/need arising whereby a user could establish a wait node but rather than requiring approval it requires additional user input (like prompt on launch)? If so, I'm not sure that it would impact our architecture decisions but it may be something to think about.

I alluded to this up above, but I could see doing something like this. I think my proposed architecture here would support this in the future without us having to implement anything specific right now.

RBAC
For RBAC we could add another Organization Role for Org-level approvals but also add a role to Workflows to delegate approvals on individual workflows themselves.

Yeah, I'm fine with this being just for a workflow node, and if you want it for a single JT, create a workflow with it.

For RBAC, etc., an MVP would be two choices: "approve by anyone who can execute" and "approve by org admin or above". We could consider other choices, but I don't think we need them initially.

Yep. I think Org Admin or higher would work here.

NotificationTemplates can/should be bound on the WorkflowJobTemplate abstraction.

  • Time outs on these elements as failures

From an API perspective there will be some way to get a list of nodes that are awaiting approval
The approval will be sent to the node in question
The approval needs to create an activity stream entry including the user who approved + node
The workflow is in “running” even if all it has is a node awaiting approval

Proposed UI changes (this likely isn't the final iteration):

Creating a pause node in a workflow job template

  • When creating a new node in a workflow a new option Pause will be added. We currently expose 3 different lists in the node form and we use tabs to allow the user to toggle between them. We'll be changing those tabs to a dropdown (default is to show the Templates list) to allow for more options.
  • Selecting the Pause node type will present the user with a few additional options:
  1. A timeout input which will let the user override the default timeout for the pause. This is the amount of time that an approver will have to respond before the workflow continues as if "deny" was selected by the approver.
  2. A series of rbac related options will be presented to the user. I'm envisioning this being a series of checkboxes or radio buttons that let the creator of the workflow define _who_ can approve/pause the continuation of the workflow. The exact roles are still tbd at this point but some ideas might be: org admin or above or anyone who can run - something like that
  3. (Potential) We may expose a multi-select lookup field for specifying notification templates that should be executed on the pause node event (think getting a Slack notification when a workflow has been launched and has hit the point where you need to approve). The UX here would mirror something like the Instance Groups lookup that is already exposed in many forms.
  • There's no prompting associated with this node type
  • You can have more than one pause node in a WFJT. They can even be back to back - no restrictions expected there.
  • The display of the pause node in the workflow should be self-obvious. We'll display some text like: Pause - Await Approval or similar. This applies to both the workflow editor and the results.
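The timeout rule described in this list (no response before the timeout is treated as if "deny" was selected) can be sketched as follows. This is only an illustration under assumptions: `PauseNode`, its field names, and the one-hour `DEFAULT_TIMEOUT_SECONDS` are hypothetical and not taken from the AWX code base.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 3600  # hypothetical default, chosen for the example


@dataclass
class PauseNode:
    paused_at: datetime
    timeout_seconds: Optional[int] = None  # None means "use the default"
    decision: Optional[str] = None  # "approve", "deny", or None while waiting

    def expires_at(self) -> datetime:
        """Timestamp after which the pause is treated as denied."""
        seconds = (
            self.timeout_seconds
            if self.timeout_seconds is not None
            else DEFAULT_TIMEOUT_SECONDS
        )
        return self.paused_at + timedelta(seconds=seconds)

    def outcome(self, now: datetime) -> str:
        """Return the explicit decision, "deny" after expiry, else "waiting"."""
        if self.decision is not None:
            return self.decision
        if now >= self.expires_at():
            return "deny"
        return "waiting"
```

The `paused_at` and `expires_at()` values correspond to the two timestamps a pending-approvals list would need to show for each row.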

Approving/denying continued execution

  • A new status/icon will be added to the application header which will indicate to the user that they have pending jobs awaiting approval. We'll need to fetch this data from the api on certain actions. For sure we'll fetch it after login and when the user clicks on the icon itself. We probably want to fetch it on any manual refresh of the page as well. We do the same thing with the config endpoint. There are no plans for websocket support here so there's no expectation that this status/icon will update in real time.
  • When the user clicks on this icon, a drawer will slide out from the right-hand side of the page exposing the list of jobs pending approval. In the list, the workflow job template name and job id will be displayed (same as jobs list) as well as a timestamp for when the job was paused and a timestamp for when the pause expires. Finally, two options will be presented to the user (Approve/Deny). Clicking one of these options will make the corresponding request to the api. After an option is chosen, the row will temporarily remain in the list (displaying which option was chosen). The row will no longer appear in the list if the page is refreshed or if the list of jobs awaiting approval is hidden. Clicking on the job name in the drawer should navigate the user to the job in question.

Jobs List

  • If the API creates a new "state" for a WFJT (paused) then the UI will need to expose that anywhere that we display jobs
  • It's possible we may want to expose approve/deny at the job list level. This is TBD.

Workflow Results

  • If provided by the API, we may want to update the workflow results to show if a pause node was approved or denied and who did the approving/denying(?). This is TBD.
  • It's possible we may want to expose approve/deny at the workflow results level. This is TBD.

Activity Stream

  • (Probably) Events related to approving/denying further execution of a workflow should be logged in the activity stream. If this is the case, we should expose those events in the UI.

Discussed some API implementation with @chrismeyersfsu and @beeankha.

Here are the high level notes, @mabashian @kdelee this may interest you.

  1. Add two new models, a WorkflowApprovalTemplate and a WorkflowApproval. These are two new UnifiedJobTemplate and UnifiedJob base classes, respectively.

  2. A WorkflowJobTemplateNode today has a link to a UJT. Under this new feature, these can now be WorkflowApprovalTemplate objects (instead of just a JobTemplate, ProjectUpdate, etc...).

  3. This new model shouldn't require any actual changes to the task manager. Instead, when the Workflow DAG code encounters a node that represents a WorkflowApproval, that job will just hang out in "pending" forever (or at least, until somebody approves or rejects it). In the eye of the task manager, this is no different than any other job (it's just that under the hood, we're not actually running any Python code that will transition into a successful or failure status).
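As a rough illustration of point 3, the approval job can be thought of as a tiny state machine that sits in "pending" until someone resolves it. The class below is a plain-Python sketch, not the Django model: `WorkflowApproval` here, its `audit_log` list, and the method names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WorkflowApproval:
    name: str
    status: str = "pending"  # stays "pending" until approved or rejected
    # (user, new_status) pairs; a stand-in for activity stream entries
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def _transition(self, new_status: str, user: str) -> None:
        # Only a pending approval may be resolved, and only once.
        if self.status != "pending":
            raise ValueError(f"approval already resolved as {self.status}")
        self.status = new_status
        self.audit_log.append((user, new_status))

    def approve(self, user: str) -> None:
        """Mark the approval successful so the workflow can continue."""
        self._transition("successful", user)

    def reject(self, user: str) -> None:
        """Mark the approval failed so the failure path is taken."""
        self._transition("failed", user)
```

Because no code runs while the object is pending, nothing has to change in whatever schedules ordinary jobs; only the two transitions need handling.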

This means we'll be adding some new APIs. Specifically:

/api/v2/workflow_approval_templates/
/api/v2/workflow_approval_templates/N

These endpoints will allow you to create and edit new approval templates. As we progress in feature development, we'll hang specific features/attributes off of these endpoints, such as "the notification templates for this approval template" or "_who_ can approve".

/api/v2/workflow_approvals/
/api/v2/workflow_approvals/N/
POST /api/v2/workflow_approvals/N/approve/
POST /api/v2/workflow_approvals/N/reject/

@mabashian I expect these endpoints will be useful for "fetching a list of pending approvals" i.e.,

/api/v2/workflow_approvals/?status=pending

Additionally, the UI can use special approve and reject endpoints to actually transition these jobs into the proper state so that workflow execution can continue.
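For anyone scripting against these endpoints, a client-side sketch might look like the following. It only builds `urllib.request.Request` objects for the paths proposed above and never sends them. The host `awx.example.com` and the helper names are hypothetical, authentication is omitted, and the paths reflect the proposal in this comment rather than a confirmed final API.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "https://awx.example.com"  # hypothetical host


def pending_approvals_request(base_url: str = BASE_URL) -> Request:
    """Build a GET request listing approvals that are still pending."""
    query = urlencode({"status": "pending"})
    url = urljoin(base_url, f"/api/v2/workflow_approvals/?{query}")
    return Request(url, method="GET")


def decision_request(approval_id: int, action: str, base_url: str = BASE_URL) -> Request:
    """Build a POST request to the proposed approve/ or reject/ endpoint."""
    if action not in ("approve", "reject"):
        raise ValueError(f"unknown action: {action}")
    url = urljoin(base_url, f"/api/v2/workflow_approvals/{approval_id}/{action}/")
    return Request(
        url,
        data=b"{}",  # empty JSON body; no fields are specified in the proposal
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the resulting objects to `urllib.request.urlopen` after adding whatever credentials the deployment requires.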

We'll need to work out and answer some questions related to RBAC. Specifically:

  1. Who can create these new approval nodes? Do they have some sort of organizational membership, or are they just sort of created ad-hoc?
  2. When you delete/remove one of these nodes from a workflow, should we go ahead and clean up the underlying WorkflowApprovalTemplate object? It probably doesn't really make sense for them to exist outside of the context of their relationship to an actual node in a workflow. @bianca @chrismeyersfsu how do you feel about making it so that you must specify the node they're attached to at creation time?
  3. How do we make sure we restrict the endpoint functionality? For example, even though this is a common pattern for other UJTs, something like this doesn't really make sense:

/api/v2/workflow_approval_templates/N/launch/

A few things we'll want to consider functionality and verification-wise:

  1. We need to make sure that system restarts don't cause these approval nodes to be "reaped". Today, the dispatcher has a process where it discovers jobs that were "running" prior to restart, and reaps them. WorkflowApproval should _not_ follow this pattern. Given that these jobs never really enter "running" (they stay in "pending"), I don't think this should be an issue.

  2. We need to make sure we implement activity stream records for _at least_ modifying these WorkflowApproval objects so we have an audit trail when people transition their state from pending to successful or failed via the approve/reject endpoints.

  3. On the "workflow list view", how do we (should we?) call out which workflows are "paused" waiting for approval? @mabashian would /api/v2/workflow_approvals/?status=pending with a summary field that included the WFJT ID be good enough to correlate the data and draw a pending state?
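To illustrate the correlation asked about in point 3, here is a small sketch of how a client could group pending approvals by workflow. The response shape is an assumption: the `summary_fields["workflow_job_template"]["id"]` key is hypothetical and stands in for whatever summary field the API ends up exposing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def pending_by_workflow(approvals: Iterable[dict]) -> Dict[int, List[int]]:
    """Map each WFJT id to the ids of its approvals that are still pending.

    Each item is assumed to look like:
    {"id": 1, "status": "pending",
     "summary_fields": {"workflow_job_template": {"id": 42}}}
    """
    grouped: Dict[int, List[int]] = defaultdict(list)
    for approval in approvals:
        if approval.get("status") != "pending":
            continue  # already approved, rejected, or timed out
        wfjt_id = approval["summary_fields"]["workflow_job_template"]["id"]
        grouped[wfjt_id].append(approval["id"])
    return dict(grouped)
```

A list view could then draw a "waiting for approval" marker on every workflow whose id appears as a key in the result.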

Who can create these new approval nodes? Do they have some sort of organizational membership, or are they just sort of created ad-hoc?

This will definitely be relevant to the UI/UX of this feature. I was under the impression that if a user was able to create/edit a particular workflow they'd be able to create a pause node. If we decide to break from that then we'll need to make sure we handle that in the UI. Would that mean a new key in user_capabilities?

On the "workflow list view", how do we (should we?) call out which workflows are "paused" waiting for approval?

@ryanpetrello Are you talking about this list?

@mabashian

Who can create these new approval nodes? Do they have some sort of organizational membership, or are they just sort of created ad-hoc?

(cc @beeankha @chrismeyersfsu)

I wonder if we should wire up /api/v2/workflow_approval_templates/ such that you can only create them if you also specify a WorkflowJobTemplateNode reference. In other words, when you create it, you're _also_ associating it with a WFJT node at the same time. It doesn't really _make sense_ to create these templates outside of this context, because you're not going to actually reuse them (i.e., you wouldn't make a bunch of preconfigured ones ahead of time and share their usage across different workflows the way you do with JTs).

yes @ryanpetrello exactly that. Workflow Job Template Node(s) share this same behavior.

When a WFJT is deleted, we could also catch that deletion signal and automatically delete the associated WorkflowApprovalTemplate (if there is one).

So @mabashian the idea would be that to create/modify one of these nodes, the UI would:

  1. Create the WorkflowJobTemplateNode.
  2. POST /api/v2/workflow_approval_templates/ {"workflow_job_template_node": N}

...and if you DELETE the node, we'll clean up the underlying UJT on the backend if it's an approval config JT.

And the UI wouldn't really _list_ all of these anywhere in the traditional sense of what we do with e.g., Job Templates, Projects.

In terms of RBAC, I suppose we could restrict creating and listing these only to people who have access to edit/create workflows? It doesn't seem like there would be any reason to know about them or list/view detail unless you were actually editing or constructing a workflow.

Seems to me like the RBAC requirements for these are very similar to WorkflowJob today.

POST /api/v2/workflow_approval_templates/ {"workflow_job_template_node": N}

RBAC around this is going to be tricky, assuming any version of this list view is introduced. Since permissions are based on the WFJT, you would have to look through the node, then the WFJT, then to the WFJT role.

For instance, this is the kind of filtering it takes to show the job templates which are part of WFJT id=4:

/api/v2/job_templates/?workflowjobtemplatenodes__workflow_job_template=4

To list approval templates, you'd need to filter based on the requesting user having read role for any workflow job template associated in this way.
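The lookup chain described here (approval template, then node, then WFJT, then WFJT role) can be sketched with plain data structures. This is not the Django queryset AWX would use; the function name, the dict shapes, and the `wfjt_read_role` mapping are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def visible_approval_templates(
    username: str,
    nodes: Iterable[dict],
    wfjt_read_role: Dict[int, Set[str]],
    approval_template_ids: Set[int],
) -> List[int]:
    """Return approval template ids the user may list.

    Each node is assumed to be a dict with "workflow_job_template" and
    "unified_job_template" ids; wfjt_read_role maps a WFJT id to the set of
    usernames holding its read role.
    """
    visible: Set[int] = set()
    for node in nodes:
        template_id = node["unified_job_template"]
        if template_id not in approval_template_ids:
            continue  # node points at an ordinary template, not an approval
        wfjt_id = node["workflow_job_template"]
        if username in wfjt_read_role.get(wfjt_id, set()):
            visible.add(template_id)
    return sorted(visible)
```

Every listing has to walk through the node to reach the role, which is the extra hop that makes a global list view awkward.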

There's tension in terms of how tightly coupled the approval JT is to the WFJT node. Having a global list view follows a more loosely-coupled model, but requiring the node for creation, and auto-deleting with the node follows a very tightly-coupled model.

Create the WorkflowJobTemplateNode.

Don't we currently require a non-null UJT in order to create a node? By auto-deleting the template with the node, you save yourself from one broken state, but you would still expose yourself to a broken state of being half-way through the UI form with a WFJT node that has no associated template, which must be allowed by this API contract.

When a WFJT is deleted, we could also catch that deletion signal and automatically delete the associated WorkflowApprovalTemplate (if there is one).

A solution is also needed for the edit scenario. Ordinarily, someone can change a WFJT node's unified_job_template. If it points to an approval template, then the easiest solution is to return a 400 error if someone attempts to change that to a different UJT.

Otherwise the signal processing is going to get real complicated. You could wind up with multiple nodes pointing to the same approval template, and the complexity of corner cases would explode (i.e. delete, but only if completely de-referenced).
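The "return 400 on edit" rule suggested above could be expressed as a small validation step. This is a sketch under assumptions: `BadRequest`, `validate_ujt_change`, and the `"workflow_approval_template"` type string are hypothetical names, not the actual serializer code.

```python
from typing import Optional


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response."""

    status_code = 400


def validate_ujt_change(current_ujt: Optional[dict], new_ujt_id: Optional[int]) -> None:
    """Reject re-pointing a node away from its approval template.

    current_ujt is assumed to be None or a dict such as
    {"id": 100, "type": "workflow_approval_template"}.
    """
    if current_ujt is None:
        return  # nothing attached yet, any template may be set
    is_approval = current_ujt["type"] == "workflow_approval_template"
    if is_approval and new_ujt_id != current_ujt["id"]:
        raise BadRequest(
            "A node that points to an approval template cannot be changed "
            "to a different unified job template."
        )
```

Refusing the change keeps each approval template referenced by exactly one node, so deleting the node can always delete the template without reference counting.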

What is the ETA for this feature getting developed and delivered?

Hi @spsingh1982 , we don’t have an exact ETA for when this will be released as part of AWX, but it is something we're currently working on. You can take a look at the work-in-progress PR here: https://github.com/ansible/awx/pull/4264

@beeankha @mabashian @chrismeyersfsu @AlanCoding,

After much deliberation, I think this is what we're currently settled on (if I can summarize).

There will not be a global /api/v2/workflow_approval_templates/ endpoint.

To create a _new_ pause node, you do:

POST /api/v2/workflow_job_template_nodes/
{"workflow_job_template": 42, "unified_job_template": null}
POST /api/v2/workflow_job_template_nodes/X/approval_job_template/
{"timeout": 3600, "name": "Get Manager Approval"}

This means that the only way you can _create_ an approval template is via this endpoint, where it's assigned to a node.
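The two-step flow can be sketched as a pair of request builders. The paths and field names follow the two calls quoted above, with the node collection spelled `workflow_job_template_nodes` in both. The helper names are hypothetical, the node id for the second call would come from the first call's response, and the sub-endpoint name reflects this proposal rather than a confirmed final API.

```python
import json
from typing import Tuple

ApiCall = Tuple[str, str, str]  # (method, path, JSON body)


def create_node_call(workflow_job_template_id: int) -> ApiCall:
    """Step 1: create a node with no unified job template attached."""
    body = {
        "workflow_job_template": workflow_job_template_id,
        "unified_job_template": None,  # serialized as JSON null
    }
    return ("POST", "/api/v2/workflow_job_template_nodes/", json.dumps(body))


def create_approval_call(node_id: int, name: str, timeout: int) -> ApiCall:
    """Step 2: attach an approval template to the node created in step 1."""
    path = f"/api/v2/workflow_job_template_nodes/{node_id}/approval_job_template/"
    return ("POST", path, json.dumps({"timeout": timeout, "name": name}))
```

Splitting the flow this way means a node briefly exists with a null template between the two calls, which is the half-finished state raised earlier in the thread.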

We will continue to have /api/v2/workflow_approvals/ and the approve and deny endpoints we discussed (which @beeankha, you currently have in your PR).

@dsesami and I brought up that it looks like we need an explanation field that can be POSTed with the approval/denial.

Additionally, this would be used when an approval node fails due to timeout, as the text field where Tower puts in "Failed due to timeout after {num} seconds".

I also had the thought that there should be another way to know that a workflow is in a pending state; in the running jobs list, it would be nice to highlight the pending workflow in some manner to signal that action needs to be taken. cc @mabashian

@kdelee and I are also wondering how users will review previously approved/denied jobs from the past + view the explanation for why approved/denied.

Seems like clicking on the approval node from the workflow should just take us to a job detail view, just no stdout. Basically it'd be this page, but no stdout block (if hiding the element is difficult for some reason, then just leave it blank maybe?) and then just dropping all the extra vars and just having a few key ones here.

The fields might be something like

Status: Approved
Approver/Denier/Whatever noun: Mike Abashian
Explanation: lorem ipsum...
Decision made at: 2:00:39
Workflow Job: [link]

Screenshot from 2019-07-18 10-38-48

A pile of questions:

  • [ ] What would this look like for those who don’t have the power to approve?
  • [x] Is this role assigned per-workflow or is there also an overall approver for all workflows? I think it's a bit of both?

    • [x] What kind of privileges does a workflow admin have to approve/deny?

    • [x] Who can grant approval permissions?

    • [ ] What happens with a nested workflow? Where the approval node is one or more levels down?

  • [ ] What happens to the parent workflow, will information bubble up that the nested workflow is pending approval? What if nested workflow fails because of timeout or denial?

@kdelee and I are also wondering how users will review previously approved/denied jobs from the past + view the explanation for why approved/denied.

I think our intention is to use the activity stream to record approve/deny, since it already has a lot of the attributes we care about.

I also had the thought that there should be another way to know that a workflow is in a pending state; in the running jobs list, it would be nice to highlight the pending workflow in some manner to signal that action needs to be taken. cc @mabashian

In a recent chat about this, I think we decided to not show approval jobs in the global job list, because the implications of calculating their RBAC (_can_ this be approved by the current user) in the context of the global unified job list worried us from a query performance perspective cc @AlanCoding @matburt.

@beeankha or @AlanCoding,

https://github.com/ansible/awx/issues/1206#issuecomment-512854848

Could one of you summarize some answers here for @kdelee re: the new approval_role implicit fields we're currently working on?

Is this role assigned per-workflow or is there also an overall approver for all workflows? I think it's a bit of both?

I think we had a very clear answer to this, and the implementation has already begun.

Organizations get a new approval_role role, and Workflow JTs get a new approval_role. Those with either of these roles can approve any pending approval nodes in the workflow.

(this assumes the existing parentage structure is obvious to you, workflows live inside an organization, and the nodes live inside a workflow)
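In other words, the check is "approval_role on the workflow, or approval_role on its organization". A minimal sketch, assuming roles are modeled as plain sets of usernames; the dataclasses and `can_approve` are hypothetical, and superusers and other implicit role parentage are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Organization:
    name: str
    approval_role: Set[str] = field(default_factory=set)


@dataclass
class WorkflowJobTemplate:
    name: str
    organization: Organization
    approval_role: Set[str] = field(default_factory=set)


def can_approve(username: str, wfjt: WorkflowJobTemplate) -> bool:
    """True if the user holds approval_role on the WFJT or on its organization."""
    return (
        username in wfjt.approval_role
        or username in wfjt.organization.approval_role
    )
```

The same answer applies to every pending approval node inside the workflow, since the nodes carry no role of their own in this model.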

I think we decided to not show approval jobs in the global job list, because the implications of calculating their RBAC (can this be approved by the current user) in the context of the global unified job list worried us from a query performance perspective

@ryanpetrello I would argue that this scenario is the exact thing that it's needed for. It should say something like (assuming they can see the WF in the first place) "This workflow is in a pending state until someone with permissions approves it". Basically, IT or whoever can then ask the approver to push the job forward. At least, that's what I'm imagining; feel free to correct me if I'm missing something, though.

There are two proposals I worry are getting confused here:

  • Showing "Approval Jobs" in the UI JOBS list
  • Showing a badge / counter / infographic in the workflow job's entry about pending approvals

We decided against showing approval jobs in the list. They have a confusing role (duplicated with the workflow job), and have their own separate place in the UI.

This workflow is in a pending state until someone with permissions approves it

It's more nuanced than that. The workflow job is in the "running" state if an approval job inside of it is in the "pending" state. Also, the blocking by an approval job is specific to 1 branch, so other branches may have running jobs simultaneously.
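A toy model may help show why a pending approval blocks only its own branch. This sketch is heavily simplified: it assumes only "on success" edges, uses "new" for nodes that have not started, and the function names are hypothetical rather than taken from the workflow DAG code.

```python
from typing import Dict, List

ACTIVE = {"pending", "running"}


def runnable_nodes(parents: Dict[str, List[str]], status: Dict[str, str]) -> List[str]:
    """Nodes that have not started and whose parents all succeeded."""
    return sorted(
        node
        for node, state in status.items()
        if state == "new"
        and all(status[p] == "successful" for p in parents.get(node, []))
    )


def workflow_status(parents: Dict[str, List[str]], status: Dict[str, str]) -> str:
    """ "running" while anything is active or can still start, else the result."""
    if any(s in ACTIVE for s in status.values()) or runnable_nodes(parents, status):
        return "running"
    return "failed" if "failed" in status.values() else "successful"


# start -> approval -> deploy, and start -> lint on a separate branch
parents = {"approval": ["start"], "deploy": ["approval"], "lint": ["start"]}
status = {"start": "successful", "approval": "pending", "deploy": "new", "lint": "new"}
print(runnable_nodes(parents, status))   # lint can start; deploy is blocked
print(workflow_status(parents, status))  # the workflow as a whole is running
```

Here `deploy` waits on the approval while `lint` is free to run, and the workflow reports "running" throughout.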

It _is_ valuable information to the user to see that a workflow job is waiting on 1 or more approvals. I just don't know if showing this in the UI JOBS list is in scope for the initial version of the feature.

My main concern is that a user might just see the "running" state on the workflow itself, not see a notification that an approval is pending, and just leave it without knowing something is needed. As long as the pending approval list is marked clearly enough, I'm OK with that.

@dsesami one of the things we're adding with this feature is a new item in the top nav that displays pending approvals:

image
image

In the first pass of this feature, we're not planning on showing these approvals as distinct jobs in the global jobs list.

That works for me.

Sorry, I've briefly checked current development progress.

Will there be a descriptive message/or jinja template based message that will show current state of workflow, so that reviewer will approve specific actions that will be performed? I.e. configuration diff, or dynamic values calculated to that stage.

@kdelee and I are also wondering how users will review previously approved/denied jobs from the past + view the explanation for why approved/denied.

Seems like clicking on the approval node from the workflow should just take us to a job detail view, just no stdout. Basically it'd be this page, but no stdout block (if hiding the element is difficult for some reason, then just leave it blank maybe?) and then just dropping all the extra vars and just having a few key ones here.

The fields might be something like

Status: Approved
Approver/Denier/Whatever noun: Mike Abashian
Explanation: lorem ipsum...
Decision made at: 2:00:39
Workflow Job: [link]

@ryanpetrello so the answer to "Will the approval node in the workflow job view link to anything with a details link like the other nodes?" is "NO"?

@kdelee we haven't planned any UX interaction where you go view an approval job in the UI on a separate page the way you do job results. Our plan is to have a global notification list where you can view the list of pending approvals and (if you have permissions) approve or deny them.

We'll likely make it so that when you're viewing a workflow that has a pause node, you can also approve/deny it there (if you have permission):

image

Also, the name field already comes from user entry at the time of creating the node / approval JT. I don't see any need for an explanation field beyond what has already been entered into that field. Related job names are _already_ shown in the UI on all nodes in a workflow job.

If the node times out, the job_explanation field is planned to be populated with a message saying that it timed out. I don't know how this particular field will be exposed in the UI as of yet. That's a pertinent question.

Will there be a descriptive message/or jinja template based message that will show current state of workflow, so that reviewer will approve specific actions that will be performed? I.e. configuration diff, or dynamic values calculated to that stage.

Not in this version. There's no real mechanism to pass arbitrary data in this way to show the user other than maybe showing set_stats content, and I could see that running afoul of other uses of set_stats.

This feature has been merged. PR: https://github.com/ansible/awx/pull/4264
