/kind bug
Description
Steps to reproduce the issue:

1. Create a Dockerfile:
FROM registry.access.redhat.com/ubi8/ubi-minimal:8.2
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
2. Create the entrypoint.sh script:

#!/bin/bash
OC_NAMESPACE_ENV="weird"
oc_namespace_file="/run/secrets/kubernetes.io/serviceaccount/namespace"
echo ""
if ! [[ -f ${oc_namespace_file} ]]; then
    echo " [ WRN ] the namespace file doesn't exist (${oc_namespace_file})," \
        "using the OC_NAMESPACE ENV instead" >&2
    if [[ -z ${OC_NAMESPACE_ENV} ]]; then
        echo -e " [ ERR ] could not use the namespace file (${oc_namespace_file})," \
            "nor the environment variable OC_NAMESPACE_ENV\n" >&2
        exit 1
    else
        oc_namespace=${OC_NAMESPACE_ENV}
    fi
else
    oc_namespace=$(cat "${oc_namespace_file}")
fi
echo " [ INF ] found namespace '${oc_namespace}'"
3. Note how the warning and error messages are redirected to stderr (>&2).
4. Build the image:

$ > podman build . -t weird:latest
5. Run the container several times and compare the output:

$ > podman run weird:latest
$ > podman run weird:latest
[ INF ] found namespace 'weird'
[ WRN ] the namespace file doesn't exist (/run/secrets/kubernetes.io/serviceaccount/namespace), using the OC_NAMESPACE ENV instead
$ > podman run weird:latest
[ WRN ] the namespace file doesn't exist (/run/secrets/kubernetes.io/serviceaccount/namespace), using the OC_NAMESPACE ENV instead
[ INF ] found namespace 'weird'
$ > podman run weird:latest
[ INF ] found namespace 'weird'
[ WRN ] the namespace file doesn't exist (/run/secrets/kubernetes.io/serviceaccount/namespace), using the OC_NAMESPACE ENV instead
$ > podman run weird:latest
[ INF ] found namespace 'weird'
[ WRN ] the namespace file doesn't exist (/run/secrets/kubernetes.io/serviceaccount/namespace), using the OC_NAMESPACE ENV instead
$ > podman run weird:latest
[ WRN ] the namespace file doesn't exist (/run/secrets/kubernetes.io/serviceaccount/namespace), using the OC_NAMESPACE ENV instead
[ INF ] found namespace 'weird'
Note that if I remove the stderr redirection (>&2) from the warning/error messages in entrypoint.sh, the messages are always printed in the right order (or at least I haven't been able to reproduce the issue).
As you can see from the output above, the order is not consistent, which to me indicates that either something weird is going on or I'm totally misunderstanding something here.
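If consistent ordering matters more than keeping the two streams apart, one possible workaround is to merge stderr into stdout once at the top of entrypoint.sh instead of stripping every single >&2. This is only a sketch of the idea, not anything podman itself requires or provides:

#!/bin/bash
# Sketch of a workaround (assumes nothing downstream needs to tell the two
# streams apart): point fd 2 at fd 1 once, so every later write to stderr
# travels over the same pipe as stdout and keeps its relative order.
exec 2>&1

With this in place the rest of the script, including the >&2 redirections, can stay as it is; the trade-off is that a log collector can no longer separate errors from normal output.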
Describe the results you received:
See point 5 above.
Describe the results you expected:
I expect the messages to be printed in a predictable order, not randomly; i.e. the "[ INF ]" line should always be printed last.
Additional information you deem important (e.g. issue happens only occasionally):
It seems to happen randomly, as if some sort of race is triggered somewhere.
Output of podman version:
$ > podman version
Version:      2.1.1
API Version:  2.0.0
Go Version:   go1.15.2
Built:        Wed Oct 7 15:22:27 2020
OS/Arch:      linux/amd64
Output of podman info --debug:
$ > podman info --debug
host:
  arch: amd64
  buildahVersion: 1.16.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.21-3.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 0f53fb68333bdead5fe4dc5175703e22cf9882ab'
  cpus: 1
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.8.13-300.fc33.x86_64
  linkmode: dynamic
  memFree: 612429824
  memTotal: 4119355392
  ociRuntime:
    name: crun
    package: crun-0.15-5.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.15
      commit: 56ca95e61639510c7dbd39ff512f80f626404969
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-4.dev.giteecccdb.fc33.x86_64
    version: |-
      slirp4netns version 1.1.4+dev
      commit: eecccdb96f587b11d7764556ffacfeaffe4b6e11
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 2054680576
  swapTotal: 2059399168
  uptime: 18h 18m 4.83s (Approximately 0.75 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/manpi/.config/containers/storage.conf
  containerStore:
    number: 166
    paused: 0
    running: 0
    stopped: 166
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-1.fc33.x86_64
      Version: |-
        fusermount3 version: 3.9.3
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.3
        using FUSE kernel interface version 7.31
  graphRoot: /home/manpi/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 410
  runRoot: /run/user/1000/containers
  volumePath: /home/manpi/.local/share/containers/storage/volumes
version:
  APIVersion: 2.0.0
  Built: 1602076947
  BuiltTime: Wed Oct 7 15:22:27 2020
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 2.1.1
Package info (e.g. output of rpm -q podman or apt list podman):
$ > rpm -q podman
podman-2.1.1-11.fc33.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
I've checked the troubleshooting guide but didn't find anything relevant.
I have not built and tested my own (latest) version of Podman, though. =\
Additional environment details (AWS, VirtualBox, physical, etc.):
I'm running this in VirtualBox, with Fedora 33 as guest OS.
My host OS is macOS Catalina 10.15.7 (19H2).
I'm just using a standard terminal (iTerm2), ssh'ing into my virtual Fedora machine and running podman there. I don't think that should be an issue; everything else is working as normal, and as stated above, I get the expected behaviour if I remove the redirection to stderr from within entrypoint.sh.
Thanks for an awesome project! :heart:
A friendly reminder that this issue had no activity for 30 days.
@baude @vrothberg @nalind @mtrmac @mheon @giuseppe
It looks like we are randomly getting the STDERR message back after the STDOUT message, even though STDERR happened first. This looks like an issue with Go threading, and I am not sure we can fix it.
Thoughts?
UNIX pipes/sockets don’t come with timing information attached to every byte, so if stdout/stderr is consumed by conmon, or whatever it is, from two separate file descriptors, it’s not possible to reconstruct the original order. If you want that, you have to use a single pipe/socket for both, like a terminal (and then it’s not possible to distinguish output directed to stdout vs. stderr).
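The effect is easy to reproduce without podman at all. The following is a rough sketch (messages and timings are made up) where each stream is drained by its own cat process through a separate pipe, much like a runtime reading a container's stdout and stderr from two file descriptors; run it a few times and the order of the two lines will occasionally flip:

#!/bin/bash
# Two independent readers, one per stream; which one gets scheduled first
# is up to the kernel, so the printed order is not deterministic.
for run in 1 2 3 4 5; do
    {
        echo " [ WRN ] written to stderr first" >&2
        echo " [ INF ] written to stdout second"
    } > >(cat) 2> >(cat)
    sleep 0.2   # let both readers drain before the next round
done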
Thanks for clearing it up. I guess it's not a huge issue; I was just very confused about the output order.
Basically, with a

while stdout_open && stderr_open:
    stdout_readable, stdout_open, stderr_readable, stderr_open = poll(stdout_fd, stderr_fd)
    if stdout_readable: forward_stdout()
    if stderr_readable: forward_stderr()

it’s possible for the producer to write stdout1, stderr, stdout2, in that order, and _only later_ for the consumer to wake, see both stderr_readable and stdout_readable, and find the stdout buffer containing (trailing parts of) stdout1 + (leading parts of) stdout2, with no way to reconstruct the timing.
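A small self-contained illustration of that scenario (messages and timings invented for the demo): the producer writes stdout1, stderr, stdout2 back to back into two pipes while both readers are still asleep, and by the time the readers drain their pipes the stdout data has already been concatenated, so the position of the stderr line in between is lost:

#!/bin/bash
# Producer writes three lines in quick succession, split across two pipes;
# the readers only wake up afterwards and can no longer tell that the
# stderr line was written between the two stdout lines.
{
    echo "stdout1"
    echo "stderr line" >&2
    echo "stdout2"
} > >(sleep 1; echo "--- stdout pipe, read later ---"; cat) \
  2> >(sleep 2; echo "--- stderr pipe, read later ---"; cat)
sleep 3   # crude wait for the two readers to finish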
Since this cannot be fixed easily and is really not a bug, closing.
Again, thanks for clearing it up and for the awesome work you're doing with podman. Really appreciate your hard work! 🙋🏼♂️