AWX: Job details and Job view not working

Created on 9 May 2018 · 82 comments · Source: ansible/awx

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • UI
SUMMARY

Job details and Job view not working properly

ENVIRONMENT
  • AWX version: 1.0.6.5
  • AWX install method: docker on linux
  • Ansible version: 2.5.2
  • Operating System: RedHat 7.4
  • Web Browser: Firefox/Chrome
STEPS TO REPRODUCE

Run any playbook; failed and succeeded jobs are present but do not show any details.

EXPECTED RESULTS

Job details are displayed.

ACTUAL RESULTS

Nothing is shown: no errors, no timeouts, just nothing.

ADDITIONAL INFORMATION

For example, I have a failed job. When clicking on details, I can see the URL changing to:
https://awx-url/#/jobz/project/
However, nothing happens. When right-clicking and opening it in a new tab/page, I only get the navigation pane and a blank page.
The same happens when I click on the job itself.

Additionally, adding inventory sources works fine; however, when navigating to 'Schedule inventory sync' I can see the gear wheel spinning, but again nothing happens.
I did a fresh installation today (9 May).

Labels: api, needs_info, bug

All 82 comments

I am experiencing the same issue.

What are you using for a proxy in front of AWX? Do you have your awx_web container bound to 0.0.0.0:port or 127.0.0.1:port? I was experiencing the same issue while accessing AWX behind an nginx proxy running on the Linux host and noticed that when the proxy was disabled, the Job detail pages would display properly. After I set the awx_web container to listen on 127.0.0.1, I was no longer experiencing the issue. To set the awx_web container to 127.0.0.1, you can specify host_port=127.0.0.1:port (instead of host_port=port) in the installer inventory file, as shown below.
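For reference, a minimal sketch of the relevant installer inventory line, assuming port 80 is the exposed port (substitute your own; a later comment in this thread shows the same pattern with port 9999):

    # awx/installer/inventory
    host_port=127.0.0.1:80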

I'm having the same issue where the job details will not display (also running with a proxy in front of awx). Adjusting the awx_web container to listen on 127.0.0.1 did not resolve the issue. Prior to upgrading to 1.0.6.5 this was working properly.

ENVIRONMENT
AWX version: 1.0.6.5
AWX install method: docker on linux
Ansible version: 2.5.2
Operating System: Ubuntu 16.04
Web Browser: Firefox/Chrome

In developer tools I'm seeing this error:
WebSocket connection to 'wss://<>/websocket/' failed: WebSocket is closed before the connection is established.

where the <> is the correct uri to my instance.

"/#/jobs?job_search=page_size:20;order_by:-finished;not__launch_type:sync:1 /#/jobz/inventory/33:1". I am also usning Nginx as a front end proxy (port 443).

Thanks for the tip @anasypany and for trying this solution @cstuart1. I indeed also use nginx as a front-end proxy, as I need SSL and port 443. What I haven't tried yet is connecting directly to the awx_web container via an SSH tunnel. If the issue still persists then, it is in the application itself. I will not be able to test this today, but it will be the first thing I do tomorrow morning.

@cstuart1 Can you paste your nginx proxy config? (with censored environment details, of course)

@Borrelworst
The solution here is to add a block for the websocket in your nginx config:

location /websocket {
    proxy_pass http://x.x.x.x:80;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}

@anasypany this is probably what you were going to suggest/inquire about?

@cstuart1 I was able to get the job details pages working again with this simple nginx proxy config once awx_web was bound to 127.0.0.1:

location / {
    proxy_pass http://127.0.0.1:xxxx;  # xxxx = 80 in your case
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

If you try this config, make sure to add HTTP_X_FORWARDED_FOR to your Remote Host Headers on AWX as well. Let me know if you have any luck!
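If you manage settings in files rather than the UI, a rough sketch of the equivalent "Remote Host Headers" entry is shown below. This assumes your deployment keeps overrides in /etc/tower/settings.py; adjust the list to match what you already have configured.

    # /etc/tower/settings.py -- sketch of the "Remote Host Headers" list
    REMOTE_HOST_HEADERS = ['HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR', 'REMOTE_HOST']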

Yes, that resolved the issue for me.
I had already added HTTP_X_FORWARDED_FOR to AWX as I'm using SAML for auth.

For anyone else reading this thread and trying to set up SAML:
I also had to alter /etc/tower/settings.py (task and web) to have the following:
USE_X_FORWARDED_PORT = True
USE_X_FORWARDED_HOST = True

and restart tower after making the setting change.
This is mentioned in the Tower documentation, but I thought I would post it here in case someone else reads this thread.

@cstuart1: That indeed solved the issue. I have not set awx_web to bind explicitly to 127.0.0.1, and apparently that is not needed. The only issue I still see is that when I go to my custom inventory scripts and click on schedule inventory syncs, I just see the cog wheel, but nothing happens. This is also described in #1850.

I am also experiencing problems with job details. I deployed a stack with postgres, rabbitmq, memcache, awx_web and awx_task in a swarm (ansible role to check variables, create dirs, instantiating a docker-compose template, deploy and so on). I am using vfarcic docker-flow to provide access to all the services in the swarm and to automatically detect changes in the configuration and reflect those changes in the proxy configuration. Within this stack, only awx_web is provided access outside the swarm with the docker-flow stack.
All works well except that the websocket for the job listing and details works only during rare intervals, usually after repeatedly killing daphne and nginx inside the awx_web container.
Debugging in the browser, I can see a bunch of websocket upgrades being tried, all of them failing with "502 Bad Gateway" after 5-6 seconds. At the same time, for each of the failing websocket attempts, a message like the one below appears in the awx_web log:

2018/05/16 23:36:18 [error] 31#0: *543 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: <internal proxy ip>, server: _, request: "GET /websocket/ HTTP/1.1", upstream: "http://127.0.0.1:8051/websocket/", host: "<my specific virtual host>"

Occasionally, the following messages are also printed in the same log:

127.0.0.1:59526 - - [16/May/2018:19:22:54] "WSCONNECTING /websocket/" - -
127.0.0.1:59526 - - [16/May/2018:19:22:54] "WSCONNECT /websocket/" - -
127.0.0.1:59526 - - [16/May/2018:19:22:55] "WSDISCONNECT /websocket/" - -
127.0.0.1:59536 - - [16/May/2018:19:22:55] "WSCONNECTING /websocket/" - -
127.0.0.1:59536 - - [16/May/2018:19:22:55] "WSCONNECT /websocket/" - -
127.0.0.1:59536 - - [16/May/2018:19:22:56] "WSDISCONNECT /websocket/" - -
127.0.0.1:59976 - - [16/May/2018:19:23:06] "WSCONNECTING /websocket/" - -
127.0.0.1:59976 - - [16/May/2018:19:23:06] "WSCONNECT /websocket/" - -
127.0.0.1:59976 - - [16/May/2018:19:23:21] "WSDISCONNECT /websocket/" - -
127.0.0.1:60994 - - [16/May/2018:19:23:27] "WSCONNECTING /websocket/" - -
127.0.0.1:60994 - - [16/May/2018:19:23:27] "WSCONNECT /websocket/" - -
127.0.0.1:60994 - - [16/May/2018:19:25:05] "WSDISCONNECT /websocket/" - -
127.0.0.1:34510 - - [16/May/2018:22:42:34] "WSDISCONNECT /websocket/" - -
127.0.0.1:34710 - - [16/May/2018:22:42:43] "WSCONNECTING /websocket/" - -
127.0.0.1:34710 - - [16/May/2018:22:42:48] "WSDISCONNECT /websocket/" - -
127.0.0.1:34794 - - [16/May/2018:22:42:57] "WSCONNECTING /websocket/" - -
127.0.0.1:34794 - - [16/May/2018:22:43:02] "WSDISCONNECT /websocket/" - -
(...)
127.0.0.1:35964 - - [16/May/2018:23:35:48] "WSDISCONNECT /websocket/" - -
127.0.0.1:37394 - - [16/May/2018:23:35:52] "WSCONNECTING /websocket/" - -
127.0.0.1:37312 - - [16/May/2018:23:35:52] "WSDISCONNECT /websocket/" - -
127.0.0.1:37412 - - [16/May/2018:23:35:57] "WSCONNECTING /websocket/" - -
127.0.0.1:37394 - - [16/May/2018:23:35:57] "WSDISCONNECT /websocket/" - -

The haproxy config generated by docker-flow for this service (awx_web) is:

frontend services
(...)
    acl url_awx-stack_awxweb8052_0 path_beg /
    acl domain_awx-stack_awxweb8052_0 hdr_beg(host) -i <my specific virtual host>
    use_backend awx-stack_awxweb-be8052_0 if url_awx-stack_awxweb8052_0 domain_awx-stack_awxweb8052_0
(...)
backend awx-stack_awxweb-be8052_0
    mode http
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    http-request add-header X-Forwarded-For %[src]
    http-request add-header X-Client-IP %[src]
    http-request add-header Upgrade "websocket"
    http-request add-header Connection "upgrade"
    server awx-stack_awxweb awx-stack_awxweb:8052

It is very similar to a bunch of other services in the swarm.
As far as I can understand, the upstream referenced in the message above refers to daphne inside the awx_web container; that daphne instance listens on http://127.0.0.1:8051 and is proxied by the nginx configuration also running inside the same container. I am currently investigating how one can troubleshoot daphne (one probe approach is sketched below).
I would appreciate it if anyone could share ideas or guidelines for proceeding with the investigation.
Thanks!
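One way to narrow this down (a sketch, not something from this thread or the AWX docs): from inside the awx_web container, probe daphne directly with curl and a hand-rolled websocket handshake. A healthy endpoint should answer with 101 Switching Protocols, while the failure mode described above would hang for a few seconds and then error out. The Sec-WebSocket-Key value is an arbitrary example.

    # run inside the awx_web container
    curl -i -N \
      -H "Connection: Upgrade" \
      -H "Upgrade: websocket" \
      -H "Sec-WebSocket-Version: 13" \
      -H "Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==" \
      http://127.0.0.1:8051/websocket/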

I'm experiencing the same issue

ENVIRONMENT
AWX version: 1.0.6.8
AWX install method: docker on linux
Ansible version: 2.5.2
Operating System: Debian 9
Web Browser: Firefox/Chrome

I have the same issue as well.

Hi, I had the same issue and I was able to get the job output by running this command to fix the permissions:

  • chmod 744 -R /opt/awx/embedded

Since most of these comments are related to proxy configurations, I should probably mention that I have the same issue but I do not have a proxy in front of mine.

I'm experiencing the same issue as well. Initially it works fine. I noticed that restarting the containers/docker resolves the issue. I will monitor it to determine if the issue occurs again, which I assume it will.

Same error.
I use nginx with a configuration similar to @anasypany's:

location / {
    proxy_pass http://127.0.0.1:8052;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

but I'm unable to see the job.

@cavamagie

ENVIRONMENT

  • AWX version: 1.0.6.15
  • AWX install method: docker on linux
  • Ansible version: 2.5.4
  • Operating System: Debian 9
  • Web Browser: Firefox/Chrome

cat awx/installer/inventory

host_port=127.0.0.1:9999

location / {
    proxy_pass http://127.0.0.1:9999/;
    proxy_http_version 1.1;
    proxy_set_header Host               $host;
    proxy_set_header X-Real-IP          $remote_addr;
    proxy_set_header X-Forwarded-For    $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto  $scheme;
    proxy_set_header Upgrade            $http_upgrade;
    proxy_set_header Connection         "upgrade";
}

It works for me

@cstuart1 Do you think we can chat out of band regarding SAML setup with AWX? I've been at this for hours with no success.

Edit: I commented on #1016 with details on how to configure AWX for use with SAML auth.

Same issues.
@SatiricFX I have noticed the same thing: restarting the docker containers usually helps.
Moreover, I am not using any proxy or HTTPS access.

@piroux That does temporarily resolve it for us as well. We haven't found a permanent fix for it. Maybe a bug.

It appears you can swap in a modified supervisor.conf to add verbose output to daphne:

[program:daphne]
command = /var/lib/awx/venv/awx/bin/daphne -b 127.0.0.1 -p 8051 awx.asgi:channel_layer -v 2

With this I am seeing the following behavior related to websockets from Daphne/nginx:

2018-06-27 03:18:59,295 DEBUG    Upgraded connection daphne.response.XbupPxYRcS!BfsxXxiUPF to WebSocket daphne.response.XbupPxYRcS!ReBXomhGtg
RESULT 2
OKREADY
10.255.0.2 - - [27/Jun/2018:03:19:02 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"
2018-06-27 03:19:03,491 DEBUG    WebSocket closed for daphne.response.XbupPxYRcS!ReBXomhGtg
2018-06-27 03:19:21,372 DEBUG    Upgraded connection daphne.response.XbupPxYRcS!aPmLgJGDZd to WebSocket daphne.response.XbupPxYRcS!hTzJudfDoM
10.255.0.2 - - [27/Jun/2018:03:19:24 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"
2018-06-27 03:19:25,571 DEBUG    WebSocket closed for daphne.response.XbupPxYRcS!hTzJudfDoM
2018-06-27 03:19:50,862 DEBUG    Upgraded connection daphne.response.XbupPxYRcS!lnvEJzPynj to WebSocket daphne.response.XbupPxYRcS!XCyaFNijYM
10.255.0.2 - - [27/Jun/2018:03:19:53 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"
2018-06-27 03:19:53,999 DEBUG    WebSocket closed for daphne.response.XbupPxYRcS!XCyaFNijYM
RESULT 2
OKREADY

This eventually logs:

2018-06-27 03:34:03,939 WARNING  dropping connection to peer tcp4:127.0.0.1:34576 with abort=True: WebSocket opening handshake timeout (peer did not finish the opening handshake in time)
10.255.0.2 - - [27/Jun/2018:03:34:03 +0000] "GET /websocket/ HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"
2018/06/27 03:34:03 [error] 32#0: *147 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.255.0.2, server: _, request: "GET /websocket/ HTTP/1.1", upstream: "http://127.0.0.1:8051/websocket/", host: "localhost:8080"
2018-06-27 03:34:03,941 DEBUG    WebSocket closed for daphne.response.XbupPxYRcS!gbrIRtuqeq

awx_web:1.0.6.23 here:

10.255.0.2 - - [28/Jun/2018:13:31:14 +0000] "GET /websocket/ HTTP/1.1" 502 575 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36"
2018/06/28 13:31:14 [error] 25#0: *440 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.255.0.2, server: _, request: "GET /websocket/ HTTP/1.1", upstream: "http://127.0.0.1:8051/websocket/", host: "awx.prmrgt.com:80"
10.255.0.2 - - [28/Jun/2018:13:31:19 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36"

etc. The websocket is simply not working. The same reverse-proxy configuration was working before (with 1.0.3.29, for example). The nginx config is fine:

      location / {
        proxy_pass http://10.20.1.100:8053/;
        proxy_http_version 1.1;
        proxy_set_header   Host               $host:$server_port;
        proxy_set_header   X-Real-IP          $remote_addr;
        proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto  $scheme;
        proxy_set_header   Upgrade            $http_upgrade;
        proxy_set_header   Connection         "upgrade";
      }

I appended these lines to /etc/tower/settings.py:

USE_X_FORWARDED_PORT = True
USE_X_FORWARDED_HOST = True

I found ansible/awx_web:1.0.6.11 is the latest image that works fine for me (which means the websocket reverse-proxy settings outside awx_web are fine!). I hope this helps.

Please note the settings.py changes are not needed for 1.0.6.11 to work. I don't see any impact whether I set them or not.

I am also facing the same issue.

ENVIRONMENT

  • AWX version: 1.0.6.11
  • AWX install method: docker on linux
  • Ansible version: 2.5.7
  • Operating System: CentOS 7
  • Web Browser: Firefox/Chrome

The only workaround currently working for me is stopping everything and starting the containers again.

This issue does not appear to occur for a little while after redeploying AWX.

I did, however, notice that none of the job details from the period while this issue is occurring are available even after you restart. It appears as though the "stdout" response on the API is populated via the task container posting data to a websocket for that job.

I also noticed that when the issue is occurring, the task container fails with the following errors:

[2018-07-02 19:03:47,717: DEBUG/Worker-4] using channel_id: 2
2018-07-02 19:03:47,718 ERROR    awx.main.models.unified_jobs job 15 (running) failed to emit channel msg about status change
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/awx/main/models/unified_jobs.py", line 1169, in _websocket_emit_status
    emit_channel_notification('jobs-status_changed', status_data)
  File "/usr/lib/python2.7/site-packages/awx/main/consumers.py", line 70, in emit_channel_notification
    Group(group).send({"text": json.dumps(payload, cls=DjangoJSONEncoder)})
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/channels/channel.py", line 88, in send
    self.channel_layer.send_group(self.name, content)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 190, in send_group
    self.send(channel, message)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 95, in send
    self.recover()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 77, in recover
    self.tdata.consumer.revive(self.tdata.connection.channel())
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/connection.py", line 255, in channel
    chan = self.transport.create_channel(self.connection)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 92, in create_channel
    return connection.channel()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/connection.py", line 282, in channel
    return self.Channel(self, channel_id)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 101, in __init__
    self._x_open()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 427, in _x_open
    self._send_method((20, 10), args)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer

This would explain why the job details from jobs that ran while the websockets were not working aren't visible even after restarting the web/task container, and why they aren't available when hitting the stdout resource on the job endpoint.

I ran into this issue as well and resolved it by stopping both the web and task containers and rerunning the installer playbook to start them again.

We have the issue with 1.0.6.0, and it does not recover after deleting/recreating the pods for awx and etcd.

Restarting web/task on one dev host, where I was testing directly, fixed it.

In production I'm facing websocket errors behind custom reverse proxies. Is it possible to disable websockets completely via some header hack, or are they a hard requirement for AWX? Some libraries have fallback options.

Decided to take a look at the rabbitmq logs, and when websockets stop working I start seeing the following:

2018-07-07 00:56:02.000 [warning] <0.5148.0> closing AMQP connection <0.5148.0> (10.0.0.6:54140 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.001 [warning] <0.5138.0> closing AMQP connection <0.5138.0> (10.0.0.6:54138 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.001 [warning] <0.4690.0> closing AMQP connection <0.4690.0> (10.0.0.6:53950 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.055 [warning] <0.5182.0> closing AMQP connection <0.5182.0> (10.0.0.6:54150 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.056 [warning] <0.5172.0> closing AMQP connection <0.5172.0> (10.0.0.6:54148 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.057 [warning] <0.4731.0> closing AMQP connection <0.4731.0> (10.0.0.6:53974 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-07 00:56:02.058 [warning] <0.5192.0> closing AMQP connection <0.5192.0> (10.0.0.6:54198 -> 10.0.0.12:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection

We're getting the following error every time we click on a job, both running jobs and ones that have already completed.

WebSocket connection to 'wss://{redacted}/websocket/' failed: WebSocket is closed before the connection is established.

We experienced this both on the latest AWX Web version and on several older revisions. ansible/awx_web:1.0.6.11 in particular was what we tried.

It's worth noting this container sits behind a reverse nginx proxy, but we've tried narrowing this down by removing the proxy altogether and are still getting the same errors/issue. We use this very heavily in production; are there any short-term fixes? Container reboots sometimes work for a few minutes, but typically fall back to the same errors.

Logs on AWX Web don't show anything overly useful, and likewise with postgres and task containers. RabbitMQ does show similar results as stated above.

2018-07-09 12:29:48.398 [warning] <0.11522.5> closing AMQP connection <0.11522.5> (10.0.5.240:40382 -> 10.0.5.234:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-09 12:29:48.398 [warning] <0.17632.5> closing AMQP connection <0.17632.5> (10.0.5.240:46896 -> 10.0.5.234:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection
2018-07-09 12:29:48.399 [warning] <0.23641.5> closing AMQP connection <0.23641.5> (10.0.5.240:53386 -> 10.0.5.234:5672, vhost: 'awx', user: 'guest'):
client unexpectedly closed TCP connection

Seeing this as well with AWX 1.0.6.25 and Ansible 2.6.1.

EDIT: 1.0.6.1 also seems to not work.

Any page requested like this never completely loads and is blank: https://awx/jobs/playbook/8

Playbooks do actually run (and sometimes fail), and notifications work fine.

Same behavior, but not seeing any of the errors others are reporting. Also, restarting the pod doesn't fix the issue for any amount of time. It looks like I'm just being sent back to the jobs list page.


10.32.5.17 - - [12/Jul/2018:15:50:50 +0000] "PROXY TCP4 10.32.44.94 10.32.44.94 41275 32132" 400 173 "-" "-"
[pid: 37|app: 0|req: 77/525] 10.244.8.0 () {48 vars in 3205 bytes} [Thu Jul 12 15:50:51 2018] GET /api/v2/inventory_updates/9/ => generated 4586 bytes in 104 msecs (HTTP/1.1 200) 8 headers in 248 bytes (1 switches on core 0)
10.244.8.0 - - [12/Jul/2018:15:50:51 +0000] "GET /api/v2/inventory_updates/9/ HTTP/1.1" 200 4586 "https://awx/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
10.244.6.0 - - [12/Jul/2018:15:50:51 +0000] "OPTIONS /api/v2/inventory_updates/9/ HTTP/1.1" 200 11892 "https://awx/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
[pid: 33|app: 0|req: 238/526] 10.244.6.0 () {50 vars in 3249 bytes} [Thu Jul 12 15:50:51 2018] OPTIONS /api/v2/inventory_updates/9/ => generated 11892 bytes in 149 msecs (HTTP/1.1 200) 8 headers in 249 bytes (1 switches on core 0)
10.244.10.0 - - [12/Jul/2018:15:50:51 +0000] "GET /api/v2/inventory_updates/9/events/?order_by=start_line&page=1&page_size=50 HTTP/1.1" 200 17126 "https://awx/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
[pid: 36|app: 0|req: 123/527] 10.244.10.0 () {48 vars in 3299 bytes} [Thu Jul 12 15:50:51 2018] GET /api/v2/inventory_updates/9/events/?order_by=start_line&page=1&page_size=50 => generated 17126 bytes in 90 msecs (HTTP/1.1 200) 9 headers in 264 bytes (1 switches on core 0)

AWX 1.0.6.17 Ansible 2.5.5 running on Kubernetes

@Borrelworst
Hey friend, would you be able to paste your entire nginx.conf file? I am having the exact same issue but adding the stanza above did not fix my issue.

This is mine, FWIW:

    #user awx;

    worker_processes  1;

    pid        /tmp/nginx.pid;

    events {
        worker_connections  1024;
    }

    http {
        include       /etc/nginx/mime.types;
        default_type  application/octet-stream;

        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';

        map $http_upgrade $connection_upgrade {
            default upgrade;
            ''      close;
        }

        sendfile        on;
        #tcp_nopush     on;
        #gzip  on;

        upstream uwsgi {
            server 127.0.0.1:8050;
            }

        upstream daphne {
            server 127.0.0.1:8051;
        }

        server {
            listen 8052 default_server;

            # If you have a domain name, this is where to add it
            server_name _;
            keepalive_timeout 65;

            # HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
            add_header Strict-Transport-Security max-age=15768000;

            location /nginx_status {
              stub_status on;
              access_log off;
              allow 127.0.0.1;
              deny all;
            }

            location /static/ {
                alias /var/lib/awx/public/static/;
            }

            location /favicon.ico { alias /var/lib/awx/public/static/favicon.ico; }

            location ~ ^/(websocket|network_ui/topology/) {
                # Pass request to the upstream alias
                proxy_pass http://daphne;
                # Require http version 1.1 to allow for upgrade requests
                proxy_http_version 1.1;
                # We want proxy_buffering off for proxying to websockets.
                proxy_buffering off;
                # http://en.wikipedia.org/wiki/X-Forwarded-For
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                # enable this if you use HTTPS:
                proxy_set_header X-Forwarded-Proto https;
                # pass the Host: header from the client for the sake of redirects
                proxy_set_header Host $http_host;
                # We've set the Host header, so we don't need Nginx to muddle
                # about with redirects
                proxy_redirect off;
                # Depending on the request value, set the Upgrade and
                # connection headers
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection $connection_upgrade;
            }

            location / {
                # Add trailing / if missing
                rewrite ^(.*)$http_host(.*[^/])$ $1$http_host$2/ permanent;
                uwsgi_read_timeout 120s;
                uwsgi_pass uwsgi;
                include /etc/nginx/uwsgi_params;
            }
        }
    }

PSA: If anyone here is using Docker Swarm and having these issues, try running the same stack with plain docker-compose (non-swarm v2) and see if you have the same problems.

The issues in this thread were all symptoms we were seeing while running in Swarm mode. Once we switched to local instances (docker-compose), we haven't had any issues running AWX behind an nginx proxy (specifically jwilder's, with custom SSL certificates).

Just wanted to toss this tidbit out there. The Red Hat/AWX team has specifically stated AWX is NOT supported on Swarm, but I know it makes sense for a lot of people to use Swarm.

@anthonyloukinas, I'm not in Swarm; I'm using docker-compose and it doesn't display job status properly at all.

@hitmenow Below is my server block. I left the original configuration intact and just created a conf file in conf.d:

   server {
       ssl on;

       listen       443 ssl default_server;
       server_name <servername>;
       ssl_certificate <certfile>;
       ssl_certificate_key <keyfile>;
       proxy_set_header X-Forwarded-For $remote_addr;
       include /etc/nginx/default.d/*.conf;

       location / {
           proxy_pass http://localhost:80/;
           proxy_http_version 1.1;
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header X-Forwarded-Proto $scheme;
           proxy_set_header Upgrade $http_upgrade;
           proxy_set_header Connection "upgrade";
       }

       error_page 404 /404.html;
       location = /40x.html {
       }

       error_page 500 502 503 504 /50x.html;
       location = /50x.html {
       }
   }

It does work for me most of the time, but occasionally I have to restart docker to fix the issue again. The fact that so many people have the same issue tells me that either the documentation is not sufficient or there really is a bug in the software causing it.

@anthonyloukinas I'm not sure Red Hat provides any support for AWX, so it not being supported by Red Hat isn't a huge deal. We are just hoping for some help from the team to figure out what is causing this in the scenarios where it occurs (with and without Swarm) so we can contribute an open-source fix. Nobody seems to be providing any guidance or insight, which is understandable, but in my opinion we should keep collecting more information here.

What I've noticed is that once websockets stop working, subsequent attempts at the websocket opening handshake never complete. Running tcpdump on the web container on port 8051 shows web never sends out the accept-upgrade response.

I've traced the websocket connect request path and it's kind of messy. A websocket request gets handled by web but web defers responding to the handshake. Instead what happens is web creates a message on rabbitmq that a websocket connect was received. Task then picks up this message, puts a message back on rabbitmq with the contents {"accept": True}, and once web receives this message it sends out the handshake response to the client, successfully establishing a websocket connection.

What seems to be happening is that, at some point, there is a mismatch between the channels where web and task look for and place their messages (i.e. web listens for accept messages on channel A but task is sending those messages on channel B). Restarting the supervisor daemons on web and task at the same time (and other workarounds) seems to fix the issue, but only temporarily. I'm also not sure why web isn't handling the websocket handshake response itself.

Full disclosure: I've only been running into these problems when deploying AWX in a swarm environment where each container has no replicas. It looks like something about swarm is causing the channels used for communication between web and task to desynchronize.
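To make the relay described above concrete, here is a toy Python illustration (plain in-process queues standing in for the rabbitmq channels; this is not AWX code). Web defers the handshake until task echoes back an accept message, so if the two sides ever disagree about which channel to use, the get() simply times out and the handshake never completes, which matches the tcpdump observation.

    import queue
    import threading

    connect_events = queue.Queue()   # channel web publishes connect events to
    accept_replies = queue.Queue()   # channel web expects {"accept": True} replies on

    def task_worker():
        event = connect_events.get()                     # task picks up the connect event
        accept_replies.put({"accept": True, "client": event["client"]})

    def web_handle_handshake(client_id):
        connect_events.put({"type": "websocket.connect", "client": client_id})
        try:
            reply = accept_replies.get(timeout=5)        # wait for task's accept message
        except queue.Empty:
            return "handshake never completes (no reply / channel mismatch)"
        return "101 Switching Protocols" if reply.get("accept") else "403 Forbidden"

    threading.Thread(target=task_worker, daemon=True).start()
    print(web_handle_handshake("client-1"))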

Thank you @Borrelworst! I have a different scenario than you, I think: I have a load balancer with SSL termination in front of my containers, and my nginx server is listening on 8052. Will do some more troubleshooting. Thanks again.

I resolved it by setting the endpoint_mode of the RabbitMQ service to dnsrr in Docker Swarm mode.
The rabbitmq service in the compose file is:

  rabbitmq:
    image: rabbitmq:3
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      endpoint_mode: dnsrr
    environment:
      RABBITMQ_DEFAULT_VHOST: "awx"
    networks:
      - webnet

Switching to dnsrr instead of the VIP kind of implies that it's an issue with the VIP timing out the idle connection:

https://github.com/moby/moby/issues/37466#issuecomment-405307656
https://success.docker.com/article/ipvs-connection-timeout-issue

This would match the described behavior, where it works initially and then, at some undefined later point (relatively quickly), stops working.

@sightseeker Is there an equivalent that you know of for Kubernetes deployments?

Thank you @strawgate!
When I set the TCP keepalive time to less than 900 seconds while using VIP mode, the problem no longer occurs.

@hitmenow I haven't tried yet with K8s.

It would also imply that switching the containers to using tasks.rabbitmq to hit rabbitmq would fix the issue as that bypasses the VIP too. Will test and report back

@hitmenow Kubernetes doesn't use VIPs or Swarm networking, so dnsrr is probably not related to your issue.

I'm running AWX in pure docker containers on the same machine (no swarm or k8s) and I was hitting this issue too.

Setting net.ipv4.tcp_keepalive_time=600 helped me as well, but it needs to be set before daphne runs, so it should be put into /etc/sysctl.conf on the host system or similar.
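For anyone who wants to apply this, a minimal sketch of the host-level change (values taken from this thread; run on the Docker host, not inside the containers):

    # on the Docker host
    echo 'net.ipv4.tcp_keepalive_time = 600' >> /etc/sysctl.conf
    sysctl -p                      # reload sysctl.conf without a reboot
    # then restart the AWX containers so daphne starts with the new value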

I just updated the TCP keepalive setting in my staging and production environments. I will check whether this solution helps with the issue.

I have the same issue as well.

ENVIRONMENT

AWX version: 1.0.7
AWX install method: docker on linux
Ansible version: 2.5.4
Operating System: CentOS 7
Web Browser: Firefox/Chrome

I have this issue as well. I was on 1.0.4.50 and that was working fine. I've moved up to 1.0.7.0 and now I just see a spinning 'working' wheel when I try to see job history. I've tried different browsers and incognito windows, but no change.

I'm running AWX just on normal docker. Not on k8s or openshift.

I was using haproxy in front for SSL offload but I still see the same if I browse to the awx_web container on its exposed web port (8052)

grahamneville - do you have any container logs we can take a look at?

@jakemcdermott

I've tried a few things, listed below, that people have suggested fixed the issue and some more but I've had no luck.

  • Hitting AWX_WEB directly and not using any proxy in front
  • Multiple Browsers, clearing cache and incognito windows
  • Deleting all containers and removing the postgres database storage and doing a fresh install
  • Setting host_port=127.0.0.1:port in the inventory file for exposing the port in awx_web
  • Changed /etc/tower/settings.py to have USE_X_FORWARDED_PORT = True and USE_X_FORWARDED_HOST = True, which I baked into a new build
  • Set net.ipv4.tcp_keepalive_time=600, restarted the docker service on the host, and restarted all containers
  • chmod 744 -R /opt/awx/embedded - /opt/awx/embedded doesn't exist on the containers
  • Reverted commit 2d4fbffb919884a8f9fb6ba690756cefd61929c7

These are the logs I see from the awx_web container; I'm not seeing anything coming through at the same time on any of the other containers.

[pid: 138|app: 0|req: 29/440] 1.1.1.1 () {50 vars in 2485 bytes} [Fri Aug 17 08:16:22 2018] OPTIONS /api/v2/jobs/744/ => generated 12949 bytes in 216 msecs (HTTP/1.1 200) 10 headers in 387 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:22 +0000] "OPTIONS /api/v2/jobs/744/ HTTP/1.1" 200 12949 "https://ourawxhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" "2.2.2.2"
[pid: 136|app: 0|req: 258/441] 1.1.1.1 () {48 vars in 2447 bytes} [Fri Aug 17 08:16:22 2018] GET /api/v2/jobs/744/ => generated 9971 bytes in 237 msecs (HTTP/1.1 200) 10 headers in 386 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:22 +0000] "GET /api/v2/jobs/744/ HTTP/1.1" 200 9971 "https://ourawxhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" "2.2.2.2"
1.1.1.1 - - [17/Aug/2018:08:16:22 +0000] "GET /api/v2/jobs/744/job_events/?order_by=-counter&page=1&page_size=50 HTTP/1.1" 200 62930 "https://ourawxhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" "2.2.2.2"
[pid: 135|app: 0|req: 29/442] 1.1.1.1 () {48 vars in 2544 bytes} [Fri Aug 17 08:16:22 2018] GET /api/v2/jobs/744/job_events/?order_by=-counter&page=1&page_size=50 => generated 62930 bytes in 415 msecs (HTTP/1.1 200) 11 headers in 402 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:22 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 259/443] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:22 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:24 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 260/444] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:24 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:26 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 261/445] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:26 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:28 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 262/446] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:28 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:30 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 137|app: 0|req: 84/447] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:30 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:32 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 263/448] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:32 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)
1.1.1.1 - - [17/Aug/2018:08:16:34 +0000] "HEAD / HTTP/1.1" 200 0 "-" "-" "-"
[pid: 136|app: 0|req: 264/449] 1.1.1.1 () {28 vars in 291 bytes} [Fri Aug 17 08:16:34 2018] HEAD / => generated 11339 bytes in 24 msecs (HTTP/1.1 200) 5 headers in 161 bytes (1 switches on core 0)

It's just the job details/history view that's a problem, plus the fact that you don't get to see the job running in real time when you launch a new job; every other page loads fine.
This is one of the URLs that I'm trying to get to, as seen when clicking on the job in the jobs view:
https://ourawxhost/#/jobs/playbook/750?job_search=page_size%3A20%3Border_by%3A-finished%3Bnot__launch_type%3Async

Any suggestions on what can be done to troubleshoot this further please?

Also having this problem in k8s. Tried a few things listed here, but still will randomly get closed sockets even when directly connected to the web container. If there are any debugging things to run, I can do so if needed.

I'm unclear on what might be causing the closed sockets SamKirsch mentioned, but that sounds like a deeper, different issue and one not entirely constrained to the job details page?

There are some race conditions involving setting up the initial connection to the job details page that have been resolved downstream and will be landing in AWX shortly.

These changes _might_ resolve some of the issues mentioned by others above - one way to know if they will help is if you're currently still able to see dynamic updates to socket-driven content other than the incoming output lines (status icons, elapsed times, project updates, etc.).

If _nothing_ is updating dynamically anywhere on the app during job runs then this points to a potentially deeper configuration issue. If this is the case for you it might be worth opening a separate github issue (or visiting our IRC channel) to help in tracking your specific problem down, as there are many different potential underlying causes for socket connectivity issues.

The closed sockets I am talking about are all in this thread: closed websockets. I notice closed websockets after an unspecified time (it's not always the same) when I try to view job details, and also jobs that are running / have run. This does not mean it never shows; sometimes a full container restart lets everything show again. I hope the upcoming upstream changes will help :)

So I've found the reason for my issues and why I couldn't see the job details. It was down to the Chrome version I had installed.

61.0.3163.79 caused issues where the 'working' wheel was just spinning.
Upgrading to 67.0.3396.99 fixed these issues and I can now see the job details.

@grahamn-gr
Thanks for your answer; I updated my Chrome to the newest version and the problem is solved!

It sounds like a number of people are having better luck with a newer version of Chrome, though from the variety of comments, it feels like this ticket has become a catch-all for any sort of odd bug related to the job details page.

I'm going to go ahead and close this; if anybody continues to encounter issues in 1.0.7, please let us know by filing a new issue with details.

@ryanpetrello JFYI, still facing this issue on version 1.0.7.2.

@boris-42 can you provide the environment details from https://github.com/ansible/awx/issues/new?template=bug_report.md, including web browser version?

@ryanpetrello

  • We are using the official image 1.0.7.2
  • The web browser is not the problem (we tried different browsers on different OSes)

Some observations:

  • If we curl "api/v2/jobs//stdout/", it's empty
  • After a restart of awx web and awx task, it gets populated
  • In the logs of awx task we see " File "/usr/lib/python2.7/site-packages/awx/main/models/unified_jobs.py", line 1169, in _websocket_emit_status", the same as in one of the comments above
  • After restarting, it works for ~15 minutes
  • Seems like a problem between awx-task and rabbitmq...

It sounds to me like job events aren't being saved into the database. This can be caused by a number of things. Do you see anything when you visit /api/v2/jobs/N/event/?

@ryanpetrello I suspect you meant jobs_events.

It returns:

{
  "count": 0, 
  "next": null, 
  "previous": null, 
  "results": []
}

If I restart awx-task and awx-web, this information gets populated, and it continues working until we see that rabbitmq-related log message in awx-task.

Yep, that's exactly what I meant, thanks :)

In your awx task container, can you run:

supervisorctl -c /supervisor_task.conf status

@ryanpetrello

bash-4.2$ supervisorctl -c /supervisor_task.conf status
awx-config-watcher                  RUNNING   pid 195, uptime 12:38:18
tower-processes:callback-receiver   RUNNING   pid 199, uptime 12:38:18
tower-processes:celery              RUNNING   pid 196, uptime 12:38:18
tower-processes:celery-watcher      RUNNING   pid 198, uptime 12:38:18
tower-processes:channels-worker     RUNNING   pid 197, uptime 12:38:18

@ryanpetrello

Some more information:

  • If I create a schedule that runs jobs every 3-5 minutes, it works perfectly
  • If I create a schedule that runs jobs with a gap of 20 minutes, it stops working

@ryanpetrello Some more details. The bug is reproduced on many versions of AWX.

If I run /usr/bin/awx-manage run_callback_receiver in the task container,

all results get sent to the database...

The more interesting thing is this piece of code:
https://github.com/ansible/awx/blob/devel/awx/main/management/commands/run_callback_receiver.py#L233-L238

If something happens to rabbitmq and we get a broken connection, it's not recreated; on the other hand, we have a large try/except around the code that uses the connection, which doesn't let run_callback_receiver crash so that supervisor would bring it back...

@boris-42 the example you linked is catching KeyboardInterrupt - I'd expect the callback receiver to gracefully handle and recover from AMQP unavailability in the way you described (testing this a bit myself).

I'm having a hard time reproducing this by stopping RabbitMQ - the callback receiver recovers for me after stopping and starting the message broker:

(screenshot omitted)

It also seems resilient to me screwing with TCP via tcpkill:

(screenshot omitted)

@boris-42 do you see any logs in the task container for the callback receiver that might provide some hints?

IMHO, I don't know why this issue is closed when it is still happening, even with recent versions.

@josemgom the reason it's closed is that the original reporter described their issue and found a solution to it here: https://github.com/ansible/awx/issues/1861#issuecomment-388286258

(also, see: https://github.com/ansible/awx/issues/1861#issuecomment-415033350)

The number of people chiming in on this one has generated a lot of noise; it's likely people are encountering a _number_ of issues across a variety of configurations that are being conflated:

  • some people are using older awx versions with resolved bugs
  • some are deploying behind a proxy and needed additional X-Forwarded-For configuration
  • some have reported that things work better with a newer version of Chrome

If you're still encountering an issue with the job details page, and you're using the most recent version of awx, _and_ none of the suggestions in this comment thread have addressed it for you, then please open a new issue with as much detail as possible about the problem you're encountering: https://github.com/ansible/awx/issues/new?template=bug_report.md

In the meantime, I and other awx maintainers are happy to help as much as possible here (see my and others' various interactions with people above) and in our IRC room on freenode (#awx-devel).

@ryanpetrello you are back ! =)

Steps to reproduce:

  • My production deployment is running on top of k8s and looks like this:
    -- awx-rabbitmq is a statefulset with 3 replicas
    -- memcached and postgres are 2 deployments
    -- awx-web is coupled with awx-task in the same pod as part of one deployment (there is some bug that we are still debugging that is blocking us from decoupling them)
  • After deploying everything, don't touch anything for 15+ minutes
  • Run any job template (the demo one, for example)
  • You won't see the logs in the output
  • If you restart the callback receiver, the logs are populated
  • (If you don't run anything for the next 15 minutes, the issue is reproduced again)

Hey @boris-42,

Do you see any logs in the task container for the callback receiver that might provide some hints? Errors/exceptions/tracebacks?

@boris-42 @strawgate @DBLaci @nmpacheco and others who have encountered the Connection reset by peer errors: we _think_ we might have an idea of what's causing this issue. If any of you are feeling like experimenting, could you give this PR a try in your environments to see if it improves things?

https://github.com/ansible/awx/pull/2391

Alternatively, you could try running something like this (in all of your containers) and _then_ restarting awx services to get the latest version:

~ /var/lib/awx/venv/awx/bin/pip uninstall asgi-amqp
~ /var/lib/awx/venv/awx/bin/pip install "asgi-amqp==1.1.2"

@ryanpetrello Thanks, I'll try to patch the container this weekend!

Thanks @ryanpetrello

I just upgraded the package in my development and production environments. I'll let you know if the users are still facing this issue.

Running:
/var/lib/awx/venv/awx/bin/pip install -U asgi-amqp==1.1.2
brought in a newer version of kombu (4.2.1), which starts breaking daphne/celery badly.

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/bin/daphne", line 11, in <module>
    sys.exit(CommandLineInterface.entrypoint())
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/daphne/cli.py", line 144, in entrypoint
    cls().run(sys.argv[1:])
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/daphne/cli.py", line 174, in run
    channel_layer = importlib.import_module(module_path)
  File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/usr/lib/python2.7/site-packages/awx/asgi.py", line 9, in <module>
    prepare_env() # NOQA
  File "/usr/lib/python2.7/site-packages/awx/__init__.py", line 55, in prepare_env
    if not settings.DEBUG: # pragma: no cover
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/django/conf/__init__.py", line 56, in __getattr__
    self._setup(name)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/django/conf/__init__.py", line 41, in _setup
    self._wrapped = Settings(settings_module)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/django/conf/__init__.py", line 110, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/usr/lib/python2.7/site-packages/awx/settings/production.py", line 17, in <module>
    from defaults import *  # NOQA
  File "/usr/lib/python2.7/site-packages/awx/settings/defaults.py", line 7, in <module>
    import djcelery
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/djcelery/__init__.py", line 34, in <module>
    from celery import current_app as celery  # noqa
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/five.py", line 312, in __getattr__
    module = __import__(self._object_origins[name], None, None, [name])
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/_state.py", line 20, in <module>
    from celery.utils.threads import LocalStack
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/utils/__init__.py", line 405, in <module>
    from .functional import chunks, noop                    # noqa
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/celery/utils/functional.py", line 19, in <module>
    from kombu.utils.compat import OrderedDict
ImportError: cannot import name OrderedDict

Running:
/var/lib/awx/venv/awx/bin/pip install -U asgi-amqp==1.1.2 kombu==3.0.37
and holding back kombu appears to have worked. No more Connection reset by peer errors, and the job details load!

ENVIRONMENT

AWX version: 2.0.0
AWX install method: docker on linux
Ansible version: 2.6.5
Operating System: Ubuntu 18.04
Web Browser: Firefox/Chrome

@taspotts thanks for the feedback. We've merged the asgi_amqp update and are planning to release it in a new version of awx in the near future.

@boris-42 @strawgate @DBLaci @nmpacheco and others who have encountered the Connection reset by peer errors: we've released a new version of awx, 2.0.1, which we believe should resolve this issue. Please give it a shot and let us know if you continue to encounter issues!

I also had this error and verified that it was fixed in the latest released Docker image.
Thanks for addressing this issue!

Closing this, please reopen if it persists.

@ryanpetrello thanks for fixing this, I checked it finally yesterday, everything works.
