We are triggering many concurrent jobs via a job template in AWX, using the API. The job template includes a webhook notification to post the results back to the application that triggers the jobs.
However, some jobs' webhook notifications don't fire, despite the job completing (successfully or otherwise). There is nothing obvious in the awx-task logs indicating a failure to send notifications for the jobs that do not send.
Expected results: all jobs send notifications of their result, whether success or failure.
Actual results: not all jobs send notifications of their result.
In the cases where no notifications are sent, the /jobs/
I was able to do a little more investigation here. I added some logging in awx/main/dispatch/worker/callback.py at a couple of points where it looked like notifications could silently fail to trigger, and found that this is indeed the case in these scenarios:
In addition to this, I found that the remaining jobs unaccounted for had failed due to:
We are most likely going to opt for polling AWX for the status of any job for which we don't receive a webhook notification within some period of time.
Still, it would be nice if AWX could send notifications just the same for these kinds of failures, rather than have the jobs fail "silently".
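For anyone wanting a similar fallback, the polling approach described above could be sketched roughly as follows. This is only a sketch under assumptions, not what we actually run: the `poll_job_status` and `make_http_fetcher` names, the timeout and interval values, and the use of an OAuth2 bearer token are all illustrative. The only AWX-specific parts are the `/api/v2/jobs/<id>/` endpoint and its `status` field.

```python
import json
import time
import urllib.request

# Statuses after which a job is no longer expected to change state.
TERMINAL_STATUSES = {"successful", "failed", "error", "canceled"}


def make_http_fetcher(base_url, token):
    """Return a function that fetches a job's status over the REST API.

    Assumes token authentication via an 'Authorization: Bearer' header;
    adjust to whatever authentication your installation uses.
    """
    def fetch(job_id):
        request = urllib.request.Request(
            f"{base_url.rstrip('/')}/api/v2/jobs/{job_id}/",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["status"]
    return fetch


def poll_job_status(fetch_status, job_id, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the timeout expires.

    Returns the terminal status string, or None if the job was still
    unfinished when the timeout elapsed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            return None
        sleep(interval)
```

In practice this would only be called for jobs whose webhook has not arrived after a grace period, e.g. `poll_job_status(make_http_fetcher("https://awx.example.com", token), job_id)`, where the host name is a placeholder.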
- The unified job object does not have the 'send_notification_templates' attribute (I'm not sure why this would sometimes happen)
Are you editing the awx source code? This shouldn't _ever_ be the case:
In [2]: from awx.main.models.notifications import JobNotificationMixin
In [4]: for cls in JobNotificationMixin.__subclasses__():
   ...:     print(cls)
   ...:     print(hasattr(cls, 'send_notification_templates'))
   ...:
<class 'awx.main.models.jobs.Job'>
True
<class 'awx.main.models.jobs.SystemJob'>
True
<class 'awx.main.models.projects.ProjectUpdate'>
True
<class 'awx.main.models.inventory.InventoryUpdate'>
True
<class 'awx.main.models.ad_hoc_commands.AdHocCommand'>
True
<class 'awx.main.models.workflow.WorkflowJob'>
True
@ijmason,
Are you encountering this on the latest version of awx?
- The job is not marked finished within the 5 times AWX checks:
If _this_ is what you're encountering, 5 seconds seems like a _really_ reasonable delay for a few final events to be inserted into the database. Are you seeing general slowness/performance issues with the speed at which stdout shows up?
Can you share the scale of your job events for a job run where you're encountering this? For example, if you visit /api/v2/jobs/N/job_events/ where N is some finished job that encountered this issue, what does count say?
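If it's easier than clicking through the browsable API, that number can also be read with a few lines of Python. This is a rough sketch only: the `job_event_count` and `job_events_url` helper names, the bearer-token authentication, and the injectable `opener` parameter are assumptions made for illustration; the endpoint path and the `count` field are the ones mentioned above.

```python
import json
import urllib.request


def job_events_url(base_url, job_id):
    """Build the job_events list URL for a given job id."""
    return f"{base_url.rstrip('/')}/api/v2/jobs/{job_id}/job_events/"


def job_event_count(base_url, job_id, token, opener=urllib.request.urlopen):
    """Return the 'count' field of the job_events list response.

    Assumes an OAuth2 token sent as 'Authorization: Bearer'; the opener
    argument exists only so the function can be exercised without a
    live server.
    """
    request = urllib.request.Request(
        job_events_url(base_url, job_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with opener(request) as response:
        return json.load(response)["count"]
```

A count of zero for a job that has already finished would suggest that its events never made it into the database.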
@ryanpetrello:
The only edits made to the source were to add some additional logging / debug output.
We are still running 2.1.1 and have been busy with other work, so I can't confirm yet whether the latest version still exhibits the problem. We've also implemented polling of the job status in our in-house software for jobs that aren't marked done after a certain amount of time, which has helped to mitigate the issue.
As for slowness and performance issues, at times this has been the case. It's a bit hard to gauge, since we queue up jobs via the API and harvest the results the same way, so generally no one uses the UI directly to monitor the jobs until after they have run.
I will see if I can find a recent example to provide the job_events. However, I seem to remember looking at that in the past and noting that for affected jobs, the count was zero.
@ijmason so if you're seeing sustained counts of _zero_ events, that suggests to me that you've got a _notable_ delay in job event processing (if you've got zero events, it sounds like it's not working _at all_ in some cases), which is probably your _actual_ underlying issue.
@ijmason are you still encountering this issue in newer versions of AWX?
@Ryan, unfortunately I haven't had any time to even upgrade at all, let
alone see whether newer versions fix the problem. I am focused on another
project for the near future, but if time permits I would like to try
upgrading and see if the issue persists. I'll try to update the thread if
I get around to this.
We are using 9.2.0 and experiencing something like this. We have it pinned down to Workflow Jobs and Scheduled Jobs that are not sending notifications when using a Webhook.
I am having a similar issue to @chey, where a manual job run properly generates a webhook notification, but a scheduled job does not. I noticed the issue in 9.3.0, and am still seeing it in 11.2.0. The output below is from the 11.2.0 release.
Each job run should generate two notifications - a "start of job" notification and a "job successful" or "job failed" notification.
I've included the API output of two jobs. The first two entries are for a scheduled job that syncs my VMware inventory with Ansible. Note the curiously blank "body" fields.
The second two entries are for the same inventory sync, just executed manually.
{
"id": 3,
"type": "notification",
"url": "/api/v2/notifications/3/",
"related": {
"notification_template": "/api/v2/notification_templates/1/"
},
"summary_fields": {
"notification_template": {
"id": 1,
"name": "MS Teams webhook",
"description": ""
}
},
"created": "2020-05-29T10:55:08.181225Z",
"modified": "2020-05-29T10:55:08.181240Z",
"notification_template": 1,
"error": "",
"status": "successful",
"notifications_sent": 1,
"notification_type": "webhook",
"recipients": "https://outlook.office.com/webhook/[URL]",
"body": ""
},
{
"id": 4,
"type": "notification",
"url": "/api/v2/notifications/4/",
"related": {
"notification_template": "/api/v2/notification_templates/1/"
},
"summary_fields": {
"notification_template": {
"id": 1,
"name": "MS Teams webhook",
"description": ""
}
},
"created": "2020-05-29T10:56:31.948079Z",
"modified": "2020-05-29T10:56:31.948094Z",
"notification_template": 1,
"error": "",
"status": "successful",
"notifications_sent": 1,
"notification_type": "webhook",
"recipients": "https://outlook.office.com/webhook/[URL]",
"body": ""
},
{
"id": 5,
"type": "notification",
"url": "/api/v2/notifications/5/",
"related": {
"notification_template": "/api/v2/notification_templates/1/"
},
"summary_fields": {
"notification_template": {
"id": 1,
"name": "MS Teams webhook",
"description": ""
}
},
"created": "2020-05-29T12:25:32.183674Z",
"modified": "2020-05-29T12:25:32.183689Z",
"notification_template": 1,
"error": "",
"status": "successful",
"notifications_sent": 1,
"notification_type": "webhook",
"recipients": "https://outlook.office.com/webhook/[URL]",
"body": {
"@type": "MessageCard",
"@context": "http://schema.org/extensions",
"summary": "Job started: [JOB NAME]",
"title": "Job started: [JOB NAME]",
"themeColor": "FFFFFF",
"sections": [
{
"title": "Details",
"facts": [
{
"name": "Start date",
"value": "2020-05-29T12:25:31.801334Z"
},
{
"name": "Inventory",
"value": "[INVENTORY NAME]"
},
{
"name": "Created by",
"value": "[USER]"
}
]
}
],
"potentialAction": [
{
"@context": "http://schema.org",
"@type": "ViewAction",
"name": "View on Ansible",
"target": [
"[JOB URL]"
]
}
]
}
},
{
"id": 6,
"type": "notification",
"url": "/api/v2/notifications/6/",
"related": {
"notification_template": "/api/v2/notification_templates/1/"
},
"summary_fields": {
"notification_template": {
"id": 1,
"name": "MS Teams webhook",
"description": ""
}
},
"created": "2020-05-29T12:26:55.670728Z",
"modified": "2020-05-29T12:26:55.670742Z",
"notification_template": 1,
"error": "",
"status": "successful",
"notifications_sent": 1,
"notification_type": "webhook",
"recipients": "https://outlook.office.com/webhook/[URL]",
"body": {
"@type": "MessageCard",
"@context": "http://schema.org/extensions",
"summary": "Job success: [JOB NAME]",
"title": "Job success: [JOB NAME]",
"themeColor": "008000",
"sections": [
{
"title": "Details",
"facts": [
{
"name": "Start date",
"value": "2020-05-29T12:25:31.801334Z"
},
{
"name": "End date",
"value": "2020-05-29T12:26:54.305861Z"
},
{
"name": "Inventory",
"value": "[INVENTORY NAME]"
},
{
"name": "Created by",
"value": "[USER]"
}
]
}
],
"potentialAction": [
{
"@context": "http://schema.org",
"@type": "ViewAction",
"name": "View on Ansible",
"target": [
"[JOB URL]"
]
}
]
}
}
@bendwyer We are on AWX 10.0 now. A colleague of mine managed to get this working using Jinja templating in the different message body fields in the Notification settings. Since some of the variables don't exist in workflow/scheduled jobs, you have to use if ... else ... statements to check for their existence before using them. Hope that helps.