/kind bug
Steps to reproduce the issue:
1. Go to bed
2. Have the root partition fill up
3. Wake up and check /var/log/messages
Describe the results you received:
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 176K 3.9G 1% /dev/shm
tmpfs 3.9G 378M 3.5G 10% /run
/dev/mapper/fedora-root 15G 15G 20K 100% /
tmpfs 3.9G 4.0K 3.9G 1% /tmp
/dev/sdb2 932G 234G 698G 26% /media/exfat
/dev/sda1 976M 222M 688M 25% /boot
tmpfs 788M 164K 788M 1% /run/user/1000
overlay 15G 15G 20K 100% /var/lib/containers/storage/overlay/9b7b9772e9a67b8a1294acb02fb1109849056852bd344f2e8792af73d7286f41/merged
shm 63M 4.0K 63M 1% /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/shm
overlay 15G 15G 20K 100% /var/lib/containers/storage/overlay/2ecc9119b2a504eef7adbf116f2104aceb3e251ced2f324884cca37059b2fdc5/merged
$ cat /var/log/messages
...
Dec 19 02:06:03 jennycloud podman[456262]: unhealthy
Dec 19 02:06:03 jennycloud podman[456262]: Error: unable to update health check log /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log for 1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7: failed to unmarshal existing healthcheck results in /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log: readObjectStart: expect { or n, but found
Dec 19 02:06:03 jennycloud podman[456262]: , error found in #0 byte of ...||..., bigger context ...||...
$ podman ps
9d6c432b35e3 docker.io/plexinc/pms-docker:latest 5 days ago Up 5 days ago focused_benz
$ cat /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
$ ls -al /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
-rwx------. 1 root root 0 Dec 17 03:41 /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
$ podman version
Version: 1.7.0-dev
RemoteAPI Version: 1
Go Version: go1.13.5
OS/Arch: linux/amd64
Output of podman info --debug:
$ podman info --debug
debug:
compiler: gc
git commit: ""
go version: go1.13.5
podman version: 1.7.0-dev
host:
BuildahVersion: 1.11.6
CgroupVersion: v2
Conmon:
package: conmon-2.0.9-0.1.dev.gitc2e2e67.fc32.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.9-dev, commit: 6ebb63dda5223b6d086052ef692e9229bfcedb63'
Distribution:
distribution: fedora
version: "32"
MemFree: 2193379328
MemTotal: 8254341120
OCIRuntime:
name: crun
package: crun-0.10.6-1.fc32.x86_64
path: /usr/bin/crun
version: |-
crun version 0.10.6
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
SwapFree: 7948464128
SwapTotal: 8405381120
arch: amd64
cpus: 4
eventlogger: journald
hostname: jennycloud
kernel: 5.3.15-300.fc31.x86_64
os: linux
rootless: false
uptime: 129h 46m 45.63s (Approximately 5.38 days)
registries:
blocked: null
insecure: null
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- registry.centos.org
- docker.io
- quay.io
store:
ConfigFile: /etc/containers/storage.conf
ContainerStore:
number: 3
GraphDriverName: overlay
GraphOptions:
overlay.mountopt: nodev,metacopy=on
GraphRoot: /var/lib/containers/storage
GraphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
ImageStore:
number: 2
RunRoot: /var/run/containers/storage
VolumePath: /var/lib/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
$ rpm -q podman
podman-1.7.0-0.8.dev.git6c7b6d9.fc32.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box with a less unstable kernel from Fedora 31.
Linux jennycloud 5.3.15-300.fc31.x86_64 #1 SMP Thu Dec 5 15:04:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
So the healthcheck log filled up and used all of your disk space?
It was a combination of plex metadata and /var/log/messages filling up with that podman message that caused this little server to fill up its disk.
Thank you Dan.
Why do you think this is a podman issue? What could podman have done to prevent it?
I don't think that podman could have prevented it, but I guess a cleaner error message than the following might be better?
readObjectStart: expect { or n, but found
, error found in #0 byte of ...||..., bigger context ...||...
That is what the healthcheck is saying?
The error comes from here:
if err := json.Unmarshal(b, &healthCheck); err != nil {
return healthCheck, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s", c.healthCheckLogPath())
}
I suspect that the "last" update to the log fails because it cannot write the full entry, which leaves the file as invalid JSON. Upon the next read, it then throws this error. I don't think we want to specifically check for a full volume here, do we? I could add to the error message that this could be a possible cause.
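For what it's worth, a zero-byte log (like the one shown above for 9d6c432b35e3) reproduces exactly this message. A minimal sketch, assuming the json identifier in that file is json-iterator/go's drop-in API (the "readObjectStart" wording is jsoniter's), with an illustrative stand-in for the real results struct:

package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

// Illustrative stand-in for the struct the healthcheck log is decoded into;
// the real field names in podman may differ.
type healthCheckResults struct {
	Status        string `json:"Status"`
	FailingStreak int    `json:"FailingStreak"`
}

func main() {
	json := jsoniter.ConfigCompatibleWithStandardLibrary

	// A zero-byte healthcheck.log (what a failed write on a full disk can
	// leave behind) is not valid JSON, so decoding fails before any field
	// is read.
	var results healthCheckResults
	if err := json.Unmarshal([]byte{}, &results); err != nil {
		fmt.Println(err)
		// Prints something like:
		//   readObjectStart: expect { or n, but found ..., error found in #0 byte of ...||..., bigger context ...||...
		// The standard library's encoding/json would instead report
		// "unexpected end of JSON input".
	}
}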
A friendly reminder that this issue had no activity for 30 days.
@pgporada are you satisfied with the response? Can we close this issue?
I'll take a stab at it and close the issue. Improving this specific error seems like a symptomatic fix to me, as we cannot really guarantee proper/correct behavior if the disk is full. There are so many other commands that would fail in many different ways.
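For reference, the kind of symptomatic tweak discussed above stays small. This is only a sketch against the snippet quoted earlier (reusing the same b, healthCheck, and c.healthCheckLogPath() names), not the change that was actually made:

// Treat an empty log as "no previous results" instead of failing, and hint
// at truncation (e.g. a full disk) when the JSON cannot be decoded.
if len(b) == 0 {
	return healthCheck, nil
}
if err := json.Unmarshal(b, &healthCheck); err != nil {
	return healthCheck, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s (the file may be empty or truncated, e.g. after the disk filled up)", c.healthCheckLogPath())
}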