oc v1.2.0-1-g7386b49
kubernetes v1.2.0-36-g4a3f9c5
tail -n 100 /var/log/messages
Jun 17 08:25:45 os_master_01 journal: time="2016-06-17T08:25:45.11x737815+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:46 os_master_01 journal: time="2016-06-17T08:25:46.115x25135+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:46 os_master_01 journal: time="2016-06-17T08:25:46.582x95897+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:47 os_master_01 journal: time="2016-06-17T08:25:47.115x30728+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:48 os_master_01 journal: time="2016-06-17T08:25:48.116x68137+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:48 os_master_01 journal: time="2016-06-17T08:25:48.582x07583+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
Jun 17 08:25:48 os_master_01 journal: time="2016-06-17T08:25:48.58257x671+02:00" level=error msg="Error reading loginuid: open /proc/32654/loginuid: no such file or directory"
The logs are shown when executing sudo service docker status.
Can someone explain this behaviour? The logs disappear when I perform sudo service origin-node stop.
@runcom does this ring a bell?
@lvthillo what OS? What docker version? If you have docker running and origin-node is not running, do you see additional error messages like this when you run commands like docker ps?
@ncdc is that with Docker 1.10.3?
@runcom does this ring a bell?
Btw, yes, that's coming from a patch we're carrying (the audit one) - @nalind wrote it, not sure what's going on honestly (except the pid isn't there anymore and we can't read loginuid)
@ncdc @runcom
The logs only appear on the servers that are running my nodes
Centos 7.2
Client:
Version: 1.9.1
API version: 1.21
Package version: docker-common-1.9.1-40.el7.centos.x86_64
Go version: go1.4.2
Git commit: ab77bde/1.9.1
Built:
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Package version: docker-common-1.9.1-40.el7.centos.x86_64
Go version: go1.4.2
Git commit: ab77bde/1.9.1
Built:
OS/Arch: linux/amd64
sudo service origin-node status gave me:
state running:
9739 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid o...it status 1
sudo service docker status:
time="2016-06-21T08:48:24.971651035+02:00" level=error msg="Error reading loginuid: open /proc/xxxx/loginuid: no such fil... directory"
Now I stop my node (docker keeps running):
docker status: level=error msg="Error reading loginuid: open /proc/9739/loginuid: no such fil... directory"
restart docker:
Jun 21 08:49:32 forward-journal[13338]: time="2016-06-21T08:49:32.791468938+02:00" level=info msg="Docker daemon" commit="ab77bde/1.9.1" execdriver=native-0.2 g...rsion=1.9.1
Jun 21 08:49:32 forward-journal[13338]: time="2016-06-21T08:49:32.801115851+02:00" level=info msg="API listen on /var/run/docker.sock"
Jun 21 08:49:32 systemd[1]: Started Docker Application Container Engine.
When I start my openshift-node again, docker shows me again:
time="2016-06-21T08:51:21.581770368+02:00" level=error msg="Error reading loginuid: open /proc/13504/loginuid: no such f... directory"
the node is showing me:
I0621 08:50:10.993842 13504 manager.go:261] Recovery completed
Jun 21 08:50:11 origin-node[13504]: I0621 08:50:11.031174 13504 kubelet.go:2770] Recording NodeHasSufficientDisk event message for node
Jun 21 08:50:11 [13504]: I0621 08:50:11.031203 13504 kubelet.go:2770] Recording NodeReady event message for node
Jun 21 08:50:15 origin-node[13504]: I0621 08:50:15.654958 13504 kubelet.go:2404] SyncLoop (ADD, "api"): ""
Hello guys, the same is happening here.
A lot of spam messages from origin-node.service are filling up /var/log/messages.
GCE
OS: CentOS Linux release 7.2.1511 (Core)
OpenShift Origin
oc v1.2.0
kubernetes v1.2.0-36-g4a3f9c5
Same client & server version of the docker daemon as described in this thread.
If you need any more info or help on this matter, I will be glad to provide it.
Best regards
Gabriel Scheffer
@lvthillo @gabrielscheffer please stop origin-node, then run some docker commands (docker images, docker ps, etc) - do you continue to see the loginuid error messages showing up for the docker daemon?
@ncdc (in my case) the logs aren't shown anymore. They only show up when the origin node is also running. After restarting the node, the logs are back
Same as here, the logs only show up when origin-node is running.
And I noticed the following logs on the service status:
origin-node[49497]: E0621 13:40:46.056433 49497 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1
origin-node[49497]: E0621 13:45:46.063560 49497 container_manager_linux.go:267] failed to detect process id for "docker" - failed to find pid of "docker": exit status 1
I don't know if it is in some way related to this issue.
@gabrielscheffer did you maybe try it on CentOS 7.1 or 7.0? I don't know if 7.2 is recommended (we are also using it).
@lvthillo Hmm, I didn't know that. I just got the default CentOS 7 image from my cloud provider. I will give an older version a try.
So, I created a small cluster from scratch.
But I have some bad news: unfortunately, even on CentOS Linux release 7.1.1503 (Core) I got those spam logs.
I reinstalled it multiple times, but every time the same issue appears on each node.
@gabrielscheffer which cloud provider are you using? We want to rule out that it is our provider or something.
@lvthillo Google Compute Engine.
But a friend of mine has the same problem on AWS.
@gabrielscheffer Okay, good to know. We are using OVH and have the same problem too, so it's an OpenShift issue.
Are you all running OpenShift containerized?
@ncdc No, using the ansible installation. I have reinstalled it every day since Friday and always get the same issue.
@ncdc Nope mate, running the default install from Ansible master branch.
I'm spinning up a Centos VM to check things out.
@ncdc Were you able to reproduce the issue?
Not yet, at least not with the latest origin master code. I didn't try an ansible install - I was just building and running from source. I'm checking out v1.2.0 and will try that tag to see if the behavior is any different.
@ncdc I've done multiple installs and my oc version sometimes seems a bit different, but the issue always comes up.
Are you running pods on the node, or is the node empty of containers?
@ncdc I'm able to run pods (router & registry) on my node, but the errors are there after the playbook is finished, so even when the node is empty of pods/containers
Trying an ansible install now. Will report back.
I get this, which I assume has a similar root cause: level=error msg="Failed to get pwuid struct: user: unknown userid 4294967295"
Actually, that's https://bugzilla.redhat.com/show_bug.cgi?id=1335635
@lvthillo @gabrielscheffer are you able to look for /proc/<pid>/loginuid - does it exist?
@ncdc for me it does not exist (/proc/<pid> exists, but not the loginuid file). What steps are you following to install openshift origin without ansible?
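For anyone following along, the check can be scripted like this (a sketch; pid 1 is only a placeholder — substitute the pid from the error message):

```shell
#!/bin/sh
# Check whether the audit loginuid file exists for a given pid.
# pid=1 is only a placeholder; use the pid from the error message instead.
pid=1
if [ -e "/proc/$pid/loginuid" ]; then
    echo "loginuid for pid $pid: $(cat "/proc/$pid/loginuid")"
else
    echo "no loginuid file for pid $pid (kernel likely lacks CONFIG_AUDIT)"
fi
```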
@lvthillo I'm using an ansible-based install now. And I'm hitting the bz I linked above. But I can't reproduce your error locally.
@ncdc I want to install openshift v1.2.0 without using ansible so that the issue doesn't come up (you didn't see the issue when you weren't using the ansible install)
I generally do something like this (assuming you have a working go environment):
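(a sketch of the likely steps, since the exact commands weren't captured in this thread — the repo layout, tag, and make target are assumptions; check the origin repository's README for the authoritative version)

```shell
# Sketch: build and run OpenShift Origin from source (assumed layout and
# targets; consult the origin repository's README for the exact steps).
mkdir -p "$GOPATH/src/github.com/openshift"
cd "$GOPATH/src/github.com/openshift"
git clone https://github.com/openshift/origin.git
cd origin
git checkout v1.2.0          # or master
make build                   # builds binaries under _output/local/bin
sudo ./_output/local/bin/linux/amd64/openshift start
```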
@ncdc so, /proc/
My ansible installation uses deployment_type=origin from the master branch of openshift-ansible on GitHub.
Steps to reproduce:
Maybe that is the problem? Should I open an issue with the ansible team?
Can you paste the current pid with the loginuid error along with the output of ps auxwf?
Sure, but after the last reinstall I'm just getting these errors:
------/var/log/messages------
Jun 22 14:39:45 master-node-1 journal: time="2016-06-22T14:39:45.829214266-03:00" level=error msg="Failed to get pwuid struct: user: unknown userid 4294967295"
Jun 22 14:39:46 master-node-1 journal: time="2016-06-22T14:39:46.167511935-03:00" level=error msg="Failed to get pwuid struct: user: unknown userid 4294967295"
-------------ps auxwf------------------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 125964 6680 ? Ss 12:08 0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root 234 0.1 0.3 53628 14492 ? Ss 12:08 0:10 /usr/lib/systemd/systemd-journald
root 245 0.0 0.1 202876 4080 ? Ssl 12:08 0:00 /usr/sbin/lvmetad -f
root 259 0.0 0.1 45884 4636 ? Ss 12:08 0:00 /usr/lib/systemd/systemd-udevd
root 334 0.0 0.0 51188 1612 ? S<sl 12:08 0:00 /sbin/auditd -n
dbus 374 0.0 0.0 35160 2132 ? Ssl 12:08 0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 376 0.0 0.0 203368 1240 ? Ssl 12:08 0:00 /usr/sbin/gssproxy -D
avahi 388 0.0 0.0 28120 1788 ? Ss 12:08 0:00 avahi-daemon: running [master-node-1.local]
avahi 400 0.0 0.0 27988 236 ? S 12:08 0:00 \_ avahi-daemon: chroot helper
root 393 0.0 0.3 251760 11788 ? Ssl 12:08 0:03 /usr/sbin/rsyslogd -n
root 395 0.0 0.2 534036 9676 ? Ssl 12:08 0:00 /usr/sbin/NetworkManager --no-daemon
root 824 0.0 0.4 110524 15864 ? S 12:08 0:00 \_ /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkMan
root 397 0.0 0.0 26400 1724 ? Ss 12:08 0:00 /usr/lib/systemd/systemd-logind
root 405 0.0 0.0 4340 548 ? Ss 12:08 0:00 /usr/sbin/acpid
root 568 0.0 0.0 45488 716 ? S<s 12:08 0:00 ovsdb-server: monitoring pid 569 (healthy)
root 569 0.0 0.0 45888 2384 ? S< 12:08 0:01 \_ ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openv
root 579 0.0 0.0 49152 804 ? S<s 12:08 0:00 ovs-vswitchd: monitoring pid 580 (healthy)
root 580 0.0 0.9 270996 35384 ? S<Ll 12:08 0:09 \_ ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdi
root 614 0.0 0.0 53064 2652 ? Ss 12:08 0:00 /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplicant.log -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /va
polkitd 615 0.0 0.3 527460 13044 ? Ssl 12:08 0:00 /usr/lib/polkit-1/polkitd --no-debug
root 617 0.0 0.5 133564 20948 ? S<Lsl 12:08 0:01 /usr/sbin/dmeventd -f
root 885 0.0 0.2 208400 10640 ? Ss 12:08 0:00 /usr/bin/python /usr/share/google/google_daemon/manage_clock_sync.py
root 888 0.0 0.2 213384 11272 ? Ss 12:08 0:00 /usr/bin/python /usr/share/google/google_daemon/manage_accounts.py
root 13817 0.0 0.2 215452 8920 ? S 14:43 0:00 \_ /usr/bin/python /usr/share/google/google_daemon/manage_accounts.py
root 895 2.6 8.0 885352 303944 ? Ssl 12:08 4:05 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2
root 899 0.0 0.4 553164 16508 ? Ssl 12:08 0:01 /usr/bin/python -Es /usr/sbin/tuned -l -P
root 900 0.0 0.2 206876 11100 ? Ss 12:08 0:00 /usr/bin/python /usr/share/google/google_daemon/manage_addresses.py
root 969 0.0 0.0 126332 1700 ? Ss 12:08 0:00 /usr/sbin/crond -n
ntp 975 0.0 0.0 29408 2064 ? Ss 12:08 0:00 /usr/sbin/ntpd -u ntp:ntp -g
root 977 0.0 0.0 25964 964 ? Ss 12:08 0:00 /usr/sbin/atd -f
root 1089 0.0 0.0 110036 840 tty1 Ss+ 12:08 0:00 /sbin/agetty --noclear tty1 linux
root 1091 0.0 0.0 110036 860 ttyS0 Ss+ 12:08 0:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
root 1125 0.0 0.7 682248 27580 ? Ssl 12:08 0:06 /opt/stackdriver/collectd/sbin/stackdriver-collectd -C /opt/stackdriver/collectd/etc/collectd.conf -P /var/run/
root 1179 0.0 0.0 82560 3572 ? Ss 12:08 0:00 /usr/sbin/sshd -D
root 10149 0.0 0.1 140788 5076 ? Ss 13:59 0:00 \_ sshd: gabrielscheffer [priv]
centos 10153 0.0 0.0 140788 2368 ? S 13:59 0:00 \_ sshd: gabrielscheffer@pts/0
centos 10154 0.0 0.3 127184 13960 pts/0 Ss 13:59 0:00 \_ -bash
centos 13878 0.0 0.0 151168 1960 pts/0 R+ 14:44 0:00 \_ ps auxwf
root 1262 0.0 0.2 203796 7580 ? S 12:08 0:00 python /opt/stackdriver/extractor/bin/extractd.py -c /opt/stackdriver/extractor/etc/extractor.conf -P /var/run/
root 1319 0.0 0.0 91140 2160 ? Ss 12:08 0:00 /usr/libexec/postfix/master -w
postfix 1321 0.0 0.1 91420 3932 ? S 12:08 0:00 \_ qmgr -l -t unix -u
postfix 9229 0.0 0.1 91244 3916 ? S 13:46 0:00 \_ pickup -l -t unix -u
root 1325 0.9 1.6 728192 60956 ? Ssl 12:08 1:25 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
root 1539 0.0 0.0 115244 1432 ? Ss 12:08 0:00 /bin/sh -c /usr/bin/docker-current daemon $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWOR
root 1541 0.5 1.0 822044 38596 ? Sl 12:08 0:56 \_ /usr/bin/docker-current daemon --selinux-enabled --insecure-registry=172.30.0.0/16 --log-opt max-size=10M -
root 1799 0.0 0.0 2972 780 ? Ssl 12:08 0:00 | \_ /pod
root 1828 0.0 0.0 2972 744 ? Ssl 12:08 0:00 | \_ /pod
root 1921 0.0 0.7 360752 27712 ? Ssl 12:08 0:01 | \_ /usr/bin/openshift-router
root 13544 0.0 0.1 52864 6700 ? Ss 14:39 0:00 | | \_ /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -sf 1
1000000+ 1934 0.0 0.3 35268 13032 ? Ssl 12:08 0:00 | \_ /dockerregistry /config.yml
root 1542 0.0 0.0 101728 2016 ? Sl 12:08 0:02 \_ /usr/bin/forward-journald -tag docker
Is that it?
Failed to get pwuid struct: user: unknown userid 4294967295 is the bz listed above
@ncdc
level=error msg="Error reading loginuid: open /proc/11976/loginuid: no such file or directory"
root 11976 1.7 0.1 833564 46292 ? Ssl 19:51 0:01 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
@lvthillo does /proc/11976 exist?
@ncdc
yes but no loginuid:
$ ls
auxv cmdline coredump_filter cwd exe fdinfo io limits mem mounts net numa_maps oom_score personality root stat status task
cgroup comm cpuset environ fd gid_map ipaddr maps mountinfo mountstats ns oom_adj oom_score_adj projid_map setgroups statm syscall uid_map
@gabrielscheffer How did you perform your new install? (I performed a new git clone but same issue)
@lvthillo So, as described above
@lvthillo can you grep loginuid /etc/pam.d/*?
This is not an OpenShift issue. The kernel needs to be compiled with audit support enabled, and you may need to have pam_loginuid configured. Once you do have /proc/pid/loginuid files in place for the OpenShift processes, they will be MAXINT, and you'll end up with the Failed to get pwuid struct: user: unknown userid 4294967295 "error" until a newer version of Docker is available that doesn't have this log spam.
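For context, 4294967295 is simply (uint32)-1, i.e. 0xFFFFFFFF — the sentinel the kernel reports when a process's loginuid was never stamped (e.g. by pam_loginuid). A one-liner confirms the arithmetic:

```shell
# The "unknown userid" in the error is the 32-bit all-ones sentinel that the
# kernel returns for a loginuid that was never set.
printf 'unset-loginuid sentinel: %u\n' "$(( 0xFFFFFFFF ))"
# prints: unset-loginuid sentinel: 4294967295
```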
@ncdc this has nothing to do with pam.
This is NOT a CentOS kernel. This is NOT a RHEL kernel. The kernel was built without CONFIG_AUDIT and/or CONFIG_AUDIT_SYSCALL.
This is a bug in docker. It should support such kernels.
Please file a BZ against 'docker' at bugzilla.redhat.com. Tell them that they need to support kernels without CONFIG_AUDIT.
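A quick way to verify whether a kernel was built with audit support (a sketch; it assumes the distro ships the kernel build config in /boot, as CentOS and RHEL do):

```shell
#!/bin/sh
# has_audit CONFIG_FILE -> exits 0 when the kernel config enables CONFIG_AUDIT
has_audit() {
    grep -q '^CONFIG_AUDIT=y' "$1"
}

# Typical usage on CentOS/RHEL (the config file location is an assumption):
#   has_audit "/boot/config-$(uname -r)" && echo "audit: yes" || echo "audit: no"
```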
it's not against upstream docker though, the issue should be filed under projectatomic/docker since it's our patch causing this
@lvthillo ^^
@runcom okay, it was an older issue which I wanted to close.
Thank you all for the support =)
@runcom Any idea when this will be fixed?
Is it possible to start an OpenShift environment with the 'broken' docker, and after the fix remove docker and reinstall it (will it remain 1.9.1)? Also, are there functional problems with this docker version, or only the 'error' bug? Sorry for the big number of questions, but we're a bit stuck here.
@lvthillo Hello mate, based on this patch
I released my own docker-1.9.1-40.
For me it's fixed, you can dig up my changes at my repo https://packagecloud.io/gabrielscheffer/centos7
Steps to install fixed docker:
1. curl -s https://packagecloud.io/install/repositories/gabrielscheffer/centos7/script.rpm.sh | sudo bash
2. sudo yum install docker docker-common docker-forward-journald docker-logrotate docker-selinux
3. sudo systemctl restart docker
Cheers