Tried to upgrade to 10.x and postgres 9 -> 10 failed so I had to do some troubleshooting. Eventually I blew away ~/.awx and the pgdata & awxcompose directories because nothing worked. It didn't like my SECRET_KEY even though it was the same.
Next step was to just blow it all away and build a fresh 14.0.0 instance. Needless to say, I ran into more issues.
The containers are up and running, and awx_web shows that it's binding to all available IPs (0.0.0.0) on the correct ports. Telnet from the host machine lets you connect, but then you're instantly disconnected. Telnet from a remote machine gets no connection at all.
Not sure, upgrade to 14.0.0 and you might reproduce
I expect at least an insecure connection if not just a normal connection
Containers are up and running, logs show everything is READY, but you can't browse or connect.
Netstat doesn't show IPv4, but I've heard that on Ubuntu binding to IPv6 should umbrella both IPv4 and IPv6 (not that I believe it though).
netstat -l ...
tcp6 0 0 :::http
tcp6 0 0 :::https
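On the tcp6-only listing: on Linux, a listening IPv6 socket normally accepts IPv4 connections as well unless the program sets IPV6_V6ONLY, and netstat shows such a socket only as tcp6. The system-wide default comes from the net.ipv6.bindv6only sysctl. A minimal sketch for reading it, assuming a Linux host with /proc mounted; describe_bindv6only is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain what a net.ipv6.bindv6only value means.
describe_bindv6only() {
  case "$1" in
    0) echo "dual-stack default: tcp6 listeners also accept IPv4" ;;
    1) echo "v6-only default: tcp6 listeners accept IPv6 only" ;;
    *) echo "unexpected value: $1" ;;
  esac
}

# Only read the sysctl file when it exists (Linux with IPv6 enabled).
if [ -r /proc/sys/net/ipv6/bindv6only ]; then
  describe_bindv6only "$(cat /proc/sys/net/ipv6/bindv6only)"
fi
```

If this reports the dual-stack default, the tcp6-only netstat output is probably not the cause of the failed connections.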
Logs for awx_web
docker logs --tail 30 awx_web
RESULT 2
OKREADY
Logs for awx_task
The following logs tend to repeat (assuming it's running background tasks)
docker logs --tail 30 awx_task
awx.main.dispatch task ...
awx.main.scheduler Running Tower task manager.
awx.main.scheduler Starting Scheduler
...
awx.main.tasks Starting periodic scheduler
awx.main.tasks Last scheduler run was : ...
RESULT 2
OKREADY
Hey @bandwiches,
This sounds like some sort of Docker networking issue to me. docker ps -a might provide some clues?
This issue tracker is for tracking feature enhancements and bugs to AWX itself.
If you need help troubleshooting an AWX install, try our mailing list or IRC channel:
http://webchat.freenode.net/?channels=ansible-awx
https://groups.google.com/forum/#!forum/awx-project
This could also be docker networking gone sideways. What happens if you restart the docker daemon?
I'm leaving this here because I hate unresolved issues
Edit - more context, the only services running on this host are docker, ansible, and python. The only containers running are ansible/awx in docker.
@shanemcd restarting the service - no dice. I also rebooted the server twice - no dice.
hey @ryanpetrello appreciate the response and links but freenode is basically a ghost town for the most part. I did get help from one helpful person but they left before we could finish troubleshooting. Would be nice for some due diligence instead of closing this issue without questioning it first.
Normally I would agree with you except for a few things first.
- ~/.awx (includes awxcompose & postgres data directories)
- awxcompose_default network
- nginxdemos/hello and connect via the ephemeral port given without issues.

I've also discovered this one-liner inside the awx_web container:
2020/08/25 19:08:23 [emerg] 314#0: SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
I missed this error previously because it only happens while awx_task is provisioning, once the RESULT 2/OKREADY status hits, this error no longer repeats, leaving me to think it's working now.
I can guarantee a few things about the certificate
- file xx.pem -> xx.pem: PEM RSA private key
- file yy.pem -> yy.pem: PEM certificate
- /etc/nginx/awxweb.pem is the same as yy.pem
- xx.pem is within the container volume /etc/pki/ca-trust/source/anchors
- /var/log/nginx/access.log or /var/log/nginx/error.log
- connection refused I would think it's a host problem
- tcp6 :::443 & tcp6 :::80
- nginxdemos/hello containers and connect just fine
- awx_web I installed telnet & net-tools
- netstat -lnp
- awx_postgres and some other public hosts

I'm pretty convinced this error is caused by the certificate issue; however, I'm not sure why this is an issue this time around when the inventory file, key, and certificate never changed.
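The nginx message above (no start line, Expecting: ANY PRIVATE KEY) is what OpenSSL reports when it cannot find a -----BEGIN ... PRIVATE KEY----- line in the file it was told to load the key from. A minimal sketch for listing which PEM blocks a file actually contains, without needing openssl itself; both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the label of every "-----BEGIN <label>-----"
# line in the given file, one label per line.
pem_block_types() {
  sed -n 's/^-----BEGIN \(.*\)-----$/\1/p' "$1"
}

# Hypothetical helper: succeed if any block label ends in "PRIVATE KEY"
# (covers RSA, EC and PKCS#8 style headers).
pem_has_private_key() {
  pem_block_types "$1" | grep -q 'PRIVATE KEY$'
}
```

Run inside the awx_web container, `pem_block_types /etc/nginx/awxweb.pem` printing only CERTIFICATE would be consistent with the error, since nginx is being pointed at that file for the private key.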
Fixed.
@ryanpetrello it was the certificate, not sure when it changed or what changed it but it was related to nginx & the cert.
Notably, the last two lines when logging awx_web:
2020-08-27 22:11:30,394 INFO spawned: 'nginx' with pid 116
2020/08/27 22:11:30 [emerg] 116#0: SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/nginx/awxweb.pem") failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib)
2020-08-27 22:11:30,494 INFO exited: nginx (exit status 1; not expected)
2020-08-27 22:11:30,494 INFO exited: nginx (exit status 1; not expected)
2020-08-27 22:11:31,495 INFO gave up: nginx entered FATAL state, too many start retries too quickly
2020-08-27 22:11:31,495 INFO gave up: nginx entered FATAL state, too many start retries too quickly
I actually found this when I decided to try a fresh install of 9.3.0. The difference being that in 9.3.0 the logs are much cleaner and in 14.0.0/14.1.0 it looks like a monstrous combination of all the logs from every container.
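Since the 14.x output mixes everything together, a small filter helps pull out the lines that matter here. A rough sketch, assuming the failure looks like the log excerpt above; nginx_startup_failures is a hypothetical helper name, and the patterns are taken from those lines only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and keep only nginx [emerg]
# lines plus the supervisor lines that report nginx exiting or giving up.
nginx_startup_failures() {
  grep -E '\[emerg\]|exited: nginx|entered FATAL state'
}
```

Usage would be something like `docker logs awx_web 2>&1 | nginx_startup_failures`. No output means none of these patterns appeared, which is not proof that nginx is healthy.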
@bandwiches,
Thanks for the follow-up - glad you got it squared away.
Thanks!
Part of this issue was actually related to openssl not adding the key to the .pem container. It was only adding the certificate. Without the key, NGINX can't load a private key from the file and exits, thus, kaboom. I ended up manually combining the key and cert into a single pem and magic!
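The manual fix described above can be sketched as follows. This is an illustration under assumptions, not the installer's own procedure: combine_pem and key_matches_cert are hypothetical helper names, and the second one assumes the openssl CLI is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write CERT followed by KEY into OUT.
combine_pem() {
  local cert="$1" key="$2" out="$3"
  # The bare echo guarantees a line break between the two PEM blocks
  # even when the certificate file lacks a trailing newline.
  { cat "$cert"; echo; cat "$key"; } > "$out"
}

# Hypothetical helper: succeed when the public key derived from KEY equals
# the public key embedded in CERT. Requires the openssl CLI.
key_matches_cert() {
  local cert="$1" key="$2"
  [ "$(openssl x509 -in "$cert" -noout -pubkey)" = "$(openssl pkey -in "$key" -pubout)" ]
}
```

With the file names from this thread, that would be `key_matches_cert yy.pem xx.pem && combine_pem yy.pem xx.pem awxweb.pem`. The combined file contains a private key, so it should be readable only by the account that needs it.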