AWX: Containers Not Listening

Created on 24 Aug 2020 · 6 Comments · Source: ansible/awx

ISSUE TYPE
  • Bug Report
SUMMARY

I tried to upgrade to 10.x, but the postgres 9 -> 10 upgrade failed, so I had to do some troubleshooting. Eventually I blew away ~/.awx and the pgdata & awxcompose directories because nothing worked. It didn't like my SECRET_KEY even though it was the same.

The next step was to just blow it all away and build a fresh 14.0.0 instance. Needless to say, I ran into more issues.

The containers are up and running, and awx_web shows that it's binding to all available IPs (0.0.0.0) on the correct ports. Telnet from the host machine connects, but the connection is dropped instantly. Telnet from a remote machine gets no connection at all.

ENVIRONMENT
  • AWX version: 14.0.0
  • AWX install method: docker on linux
  • Ansible version: 2.9.12
  • Operating System: Ubuntu 18.04.5 LTS
  • Web Browser: Chrome
STEPS TO REPRODUCE

Not sure, upgrade to 14.0.0 and you might reproduce

EXPECTED RESULTS

I expect at least an insecure connection if not just a normal connection

ACTUAL RESULTS

Containers are up and running, logs show everything is READY, but you can't browse or connect.

ADDITIONAL INFORMATION
Note: I understand that I shouldn't have messed with ~/.awx but there's really no other option besides blowing it away at that point.

Netstat doesn't show IPv4, but I've heard that on Ubuntu binding to IPv6 should cover both IPv4 and IPv6 (not that I believe it, though).

netstat -l ...

tcp6 0 0 :::http
tcp6 0 0 :::https

Logs for awx_web

docker logs --tail 30 awx_web

RESULT 2
OKREADY

Logs for awx_task
The following logs tend to repeat (assuming it's running background tasks)

docker logs --tail 30 awx_task

awx.main.dispatch task ...
awx.main.scheduler Running Tower task manager.
awx.main.scheduler Starting Scheduler
...
awx.main.tasks Starting periodic scheduler
awx.main.tasks Last scheduler run was : ...
RESULT 2
OKREADY

All 6 comments

Hey @bandwiches,

This sounds like some sort of Docker networking issue to me. docker ps -a might provide some clues?

This issue tracker is for tracking feature enhancements and bugs to AWX itself.

If you need help troubleshooting an AWX install, try our mailing list or IRC channel:

http://webchat.freenode.net/?channels=ansible-awx
https://groups.google.com/forum/#!forum/awx-project

This could also be docker networking gone sideways. What happens if you restart the docker daemon?

I'm leaving this here because I hate unresolved issues

Edit - more context, the only services running on this host are docker, ansible, and python. The only containers running are ansible/awx in docker.

@shanemcd restarting the service - no dice. I also rebooted the server twice - no dice.

hey @ryanpetrello appreciate the response and links but freenode is basically a ghost town for the most part. I did get help from one helpful person but they left before we could finish troubleshooting. Would be nice for some due diligence instead of closing this issue without questioning it first.

Normally I would agree with you except for a few things first.

Context
  1. This all came about when upgrading awx to 14.0
  2. Still can't connect on a fresh install
    a. Blew away ~/.awx (includes awxcompose & postgres data directories)
    b. Recreated self-signed certs
    c. Blew away the containers & images
    d. Blew away awxcompose_default network
  3. I can successfully run nginxdemos/hello and connect via the ephemeral port given without issues.
    Note item 3 above
New Log

I've also discovered this one-liner inside the awx_web container:

2020/08/25 19:08:23 [emerg] 314#0: SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)

I missed this error previously because it only happens while awx_task is provisioning. Once the RESULT 2/OKREADY status hits, the error no longer repeats, which led me to think everything was working.

Certificate Details

I can guarantee a few things about the certificate

  1. Private Key is in PEM format
    a. file xx.pem -> xx.pem: PEM RSA private key
  2. Certificate is in PEM format
    a. file yy.pem -> yy.pem: PEM certificate
  3. Certificate inside awx_web /etc/nginx/awxweb.pem is the same as yy.pem
  4. Private key xx.pem is within the container volume /etc/pki/ca-trust/source/anchors
    a. the original certificate is also in this location
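
The nginx error quoted above ("no start line: Expecting: ANY PRIVATE KEY") means OpenSSL found no private key block in /etc/nginx/awxweb.pem. A minimal sketch of how one might list the PEM block types inside a file's text; the function names are hypothetical, and the check only reads the BEGIN lines rather than validating the key material:

```python
import re

# Matches the header line of any PEM block, e.g. "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(text):
    """Return the labels of all PEM blocks found in text, in file order."""
    return PEM_BEGIN.findall(text)


def has_private_key(text):
    """True if any block label ends in PRIVATE KEY (RSA, EC, PKCS#8, ...)."""
    return any(label.endswith("PRIVATE KEY") for label in pem_block_types(text))
```

Applied to the contents of awxweb.pem, a result that lists only CERTIFICATE would be consistent with the error above, since nginx is being told to read its private key from that same file.
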
NGINX Logs
  • There are no access logs or error logs written to /var/log/nginx/access.log or /var/log/nginx/error.log

    • This tells me that I never make a connection to the host on port 80 or 443 which would be step 1

Testing
  • Telnet gets a connection failed

    • If this were connection refused I would think it's a host problem

    • Because it's failing and not refused, I have to assume at this point that the app itself is what's causing the failure

    • Supplements:



      • docker-proxy is bound to tcp6 :::443 & tcp6 :::80


      • I'm able to run nginxdemos/hello containers and connect just fine



    • Inside awx_web I installed telnet & net-tools

    • netstat -lnp



      • Nothing listening on port 8052 or 8053


      • There are various ESTABLISHED connections to awx_postgres and some other public hosts
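
The distinction drawn above between a refused connection and one that fails some other way can be checked with a short probe instead of telnet. A minimal sketch using Python's standard socket module; the function name probe, the result labels, and the default timeout are illustrative assumptions, not part of AWX or Docker:

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify how a TCP endpoint responds to a plain connect."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"      # nothing is listening on that port
    except socket.timeout:
        return "timeout"      # no answer at all (e.g. packets dropped)
    except OSError as exc:
        return f"error: {exc}"
    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "open"     # connected and the peer is waiting for a request
        except ConnectionResetError:
            return "closed"
        # An empty read means the peer closed the connection right away.
        return "closed" if data == b"" else "open"
```

A "closed" result (the connect succeeds, then the peer hangs up at once) would match the telnet behavior seen from the host machine, where something accepts the connection but nothing behind it is serving.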



Conclusion

I'm pretty convinced this error is caused by the certificate issue; however, I'm not sure why this is an issue this time around when the inventory file, key, and certificate never changed.

Fixed.

@ryanpetrello it was the certificate, not sure when it changed or what changed it but it was related to nginx & the cert.

Notably, the last two lines when logging awx_web:

2020-08-27 22:11:30,394 INFO spawned: 'nginx' with pid 116
2020/08/27 22:11:30 [emerg] 116#0: SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
2020-08-27 22:11:30,494 INFO exited: nginx (exit status 1; not expected)
2020-08-27 22:11:30,494 INFO exited: nginx (exit status 1; not expected)
2020-08-27 22:11:31,495 INFO gave up: nginx entered FATAL state, too many start retries too quickly
2020-08-27 22:11:31,495 INFO gave up: nginx entered FATAL state, too many start retries too quickly

I actually found this when I decided to try a fresh install of 9.3.0. The difference being that in 9.3.0 the logs are much cleaner and in 14.0.0/14.1.0 it looks like a monstrous combination of all the logs from every container.
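
Because the combined 14.x output makes these lines easy to miss, a small filter can pull them out. A minimal sketch, assuming the log text has already been captured (for example from docker logs awx_web); the marker strings are taken from the log lines quoted above, and the function name is hypothetical:

```python
# Substrings that flag a failed nginx start in the awx_web output quoted above.
MARKERS = ("[emerg]", "entered FATAL state", "not expected")


def startup_failures(log_text):
    """Return the distinct log lines that contain any failure marker, in order."""
    seen = set()
    hits = []
    for line in log_text.splitlines():
        if any(marker in line for marker in MARKERS) and line not in seen:
            seen.add(line)
            hits.append(line)
    return hits
```

Repeated lines are collapsed because the 14.x output prints some entries twice.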

@bandwiches,

Thanks for the follow-up - glad you got it squared away.

Thanks!

Part of this issue was actually related to openssl not adding the key to the .pem container. It was only adding the certificate. Without the key, NGINX fails to start, thus, kaboom. I ended up manually combining the key and cert into a single pem and, like magic, it worked!
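
The manual combination described here amounts to concatenating the two PEM files. A minimal sketch, assuming the key and certificate are separate PEM text files like the xx.pem and yy.pem mentioned earlier; the function name and output path are hypothetical, and the key-first ordering is a choice made for this sketch rather than a confirmed AWX requirement:

```python
from pathlib import Path


def combine_pem(key_path, cert_path, out_path):
    """Write the private key followed by the certificate into one PEM file."""
    key = Path(key_path).read_text().strip()
    cert = Path(cert_path).read_text().strip()
    # Refuse to write a file that would reproduce the "no start line" error.
    if "PRIVATE KEY-----" not in key:
        raise ValueError(f"{key_path} does not contain a PEM private key")
    if "-----BEGIN CERTIFICATE-----" not in cert:
        raise ValueError(f"{cert_path} does not contain a PEM certificate")
    Path(out_path).write_text(key + "\n" + cert + "\n")
```

For example, combine_pem("xx.pem", "yy.pem", "combined.pem") would produce a single file holding both blocks, which could then be supplied as the certificate file for the install.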
