Awx: On Docker for Mac, playbook runs fail with 'uid not found: 1000'

Created on 9 Sep 2017  ·  16 Comments  ·  Source: ansible/awx

Summary

I'm still trying to see exactly what's going wrong. On my Mac, when I run the awx_web and awx_task containers as built by the installer playbook, something about the USER Docker uses seems to confuse AWX: all playbook runs fail immediately (on their first task) with KeyError: 'getpwuid(): uid not found: 1000'.

Environment

  • AWX version: 1.0.0-[something]
  • Ansible version: 2.3.1
  • Operating System: macOS 10.12
  • Web Browser: Safari

Steps To Reproduce:

curl -O https://raw.githubusercontent.com/geerlingguy/awx-container/master/docker-compose.yml
docker-compose up -d

(The two containers referenced by that docker-compose.yml file are built by this awx-container project, which reuses the Dockerfiles and configuration from the AWX installer playbook.)

Expected Results:

Playbooks should run (including the playbook that pulls the latest version of a project repository).

Actual Results:

Every playbook run (e.g. the demo project git pull playbook) fails on the first task with the error as displayed below:

PLAY [all] *********************************************************************

TASK [delete project directory before update] **********************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'getpwuid(): uid not found: 1000'
fatal: [localhost]: FAILED! => {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}

Additional Information:

If I run whoami, I get cannot find name for user ID 1000:

$ docker exec e8662 whoami
whoami: cannot find name for user ID 1000
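
The same condition can be checked without whoami by looking for the uid in the passwd file directly. This is a rough sketch only: uid_in_passwd is a hypothetical helper name, and it reads a passwd-format file rather than going through NSS, so it would miss users that come from LDAP or similar sources.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AWX): succeeds if uid ($1) appears in the
# third field of a passwd-format file ($2, defaults to /etc/passwd).
uid_in_passwd() {
    local uid="$1" file="${2:-/etc/passwd}"
    awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$file"
}

# Report whether the uid this shell is running as has a passwd entry.
if uid_in_passwd "$(id -u)"; then
    echo "uid $(id -u) has a passwd entry"
else
    echo "uid $(id -u) has no passwd entry; getpwuid() lookups will fail"
fi
```

Run inside the affected container (for example through docker exec), the second message would be expected whenever whoami fails as shown above.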

Labels: installer, high, bug

All 16 comments

I'm going to run the same Docker Compose file inside a CentOS 7 VM to see if I get the same result; I wonder if it might work there, since uid 1000 would be in use on Ubuntu at least (and I think on other standard Linux OSes as well...).

I'm uid 501 on my Macs, and there is no uid 1000, so it seems like that could be the issue.

I get this same error! I think we were both reporting this issue at the same time... #90

Anyhow, I am getting this same error when running in OpenShift (from the Red Hat DevSuite / CDK) on both a Mac and Windows system (using VMs).

If I explicitly set user: root in my docker-compose.yml, then whoami doesn't error out at least:

$ docker exec 3697b whoami
root

But other things seem to break: none of the frontend resources (css, js, etc.) load if the container is run with user: root.

@mrjoshuap - I would imagine it could be the same problem... but I wouldn't rule out the problem being slightly different between the two use cases either.

@geerlingguy I noticed that installer/image_build/files/launch_awx.sh and installer/image_build/files/launch_awx_task.sh only update /etc/passwd if the uid is 10000 or higher:

#!/usr/bin/env bash
if [ `id -u` -ge 10000 ]; then
    echo "awx:x:`id -u`:`id -g`:,,,:/var/lib/awx:/bin/bash" >> /tmp/passwd
    cat /tmp/passwd > /etc/passwd
    rm /tmp/passwd
fi

I'm going to try setting the 10000 to 500, and see how it goes.
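
To make the effect of that threshold concrete, here is a small sketch that evaluates the same -ge comparison for a few values. would_rewrite_passwd is a hypothetical name used only for illustration; it does not touch /etc/passwd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the launch script's condition for a given
# uid ($1) and threshold ($2) and reports whether the rewrite would run.
would_rewrite_passwd() {
    local uid="$1" threshold="$2"
    if [ "$uid" -ge "$threshold" ]; then
        echo "rewrite"
    else
        echo "skip"
    fi
}

would_rewrite_passwd 1000 10000    # prints "skip": uid 1000 never gets an entry
would_rewrite_passwd 1000 500      # prints "rewrite"
would_rewrite_passwd 10000 10000   # prints "rewrite": -ge includes the boundary
```

With the original threshold, a container running as uid 1000 falls into the "skip" branch, which matches the getpwuid() failure reported above.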

Ah yes, now I remember, @matburt had mentioned that script earlier today, and I even had it open in a window behind this one. I'm guessing we need to adapt that script to work with other systems. I think it is set like that to work with OpenShift?

Well, I suspect the awx user needs to exist (or at least resolve with getpwuid()) on all systems. I think the spirit of that check was "if not running as root", but we might be better served by ensuring that an awx entry exists in /etc/passwd.

I confirmed that, at least for my case, changing the 10000 to 500 solved my problem in OpenShift, and I suspect it will solve yours as well. I didn't fork or set up a branch, and there might be a better way of doing it.

I'm effectively just checking that the awx passwd entry exists, instead of checking for a high uid number:

grep -e '^awx' /etc/passwd || {
    echo "awx:x:`id -u`:`id -g`:,,,:/var/lib/awx:/bin/bash" >> /tmp/passwd
    cat /tmp/passwd > /etc/passwd
    rm /tmp/passwd
}
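
One caveat with matching on the name: if an awx line already exists with a different uid, the grep succeeds and no entry is added for the uid the container is actually running as. A variation that keys on the uid instead might look like the sketch below. ensure_passwd_entry is a hypothetical helper, not the change that was eventually merged, and it takes the file path as an argument so it can be tried against a scratch copy rather than the real /etc/passwd.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not the AWX launch script): append an
# awx entry for uid/gid ($1/$2) to a passwd-format file ($3, defaults to
# /etc/passwd) unless some entry already uses that uid.
ensure_passwd_entry() {
    local uid="$1" gid="$2" file="${3:-/etc/passwd}"
    if ! awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$file"; then
        echo "awx:x:${uid}:${gid}:,,,:/var/lib/awx:/bin/bash" >> "$file"
    fi
}

# Intended use at container start (requires a writable passwd file):
#   ensure_passwd_entry "$(id -u)" "$(id -g)"
```

Calling it twice with the same uid leaves the file unchanged the second time. It does not handle the case where the name awx is already taken by another uid, which would leave two awx lines in the file.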

It's interesting that your OpenShift behaves a little differently from mine, @mrjoshuap. Doesn't OpenShift/minishift always select an id over 10k?

I suspect this will be the solution, basically always forcing us to create the awx user, but I'm also curious why the playbook's docker run doesn't force the container to start up as root in the standalone docker install case.

@matburt - It seems that if I set user: root in the docker-compose.yml file, it _does_, in fact, fix this issue at least when running either on Docker for Mac or Docker for CentOS7/Linux...

I'm still working on one other weird issue: it seems the ui static assets folder gets screwed up somewhere in the container build, so I had to add some custom tasks via an ansible-container role (see: https://github.com/geerlingguy/awx-container/commit/74a74b95f58745d8e773f02a4a86f069576b777c#diff-177371f85f042aac5032ac7e488fe281 and issue https://github.com/geerlingguy/awx-container/issues/6)

@matburt, I'm using the CDK and I suspect that the default SCC value there is Run As User Strategy: RunAsAny, where the default for a normal installation is probably NOT Run As User Strategy: RunAsAny.

I don't have access to a full OpenShift environment to compare and contrast at the moment, so I'll need to verify when I get back in the office next week.

@geerlingguy that's interesting... that symlink shouldn't be required. I would think the collectstatic that we run during the awx_web container startup would take care of that? At least it does locally for me.

@mrjoshuap keep me posted!

I'm totally okay with dropping the uid conditional down... it might(?) allow us to drop the always-use-root under standalone docker. I think I did that for a reason and took the shortcut because it was easier than fixing the original problem.

@matburt - yeah, the symlink was a red herring... it seems that it's not necessary after some more testing. Using user: root was enough to fix the main issue (see https://github.com/geerlingguy/awx-container/issues/7). The other issue I'm working on has to do with the container link / webserver hostname (awxweb vs. awx_web), and I hope it will be a simple fix to the docker compose file again: https://github.com/geerlingguy/awx-container/issues/8

I can validate that running this under OpenShift works fine when NOT using scc anyuid. It does fail with scc anyuid because of the script described above.

So... should we close this (since there are known workarounds and it seems like it's not as high priority/necessary as it was when first found) or leave it open for some change to make the uid scripting a little more robust?

I think a really simple PR that lowers the uid by which we perform the passwd file rewrite, and some light testing on the docker containers would totally solve this issue.

I can do it in a few days if no one gets around to it.

Implemented in #431.
