Podman: Full host disk causes readObjectStart error

Created on 19 Dec 2019 · 9 comments · Source: containers/podman

/kind bug

Steps to reproduce the issue:

  1. Go to bed

  2. Have the root partition fill up

  3. Wake up and check /var/log/messages

Describe the results you received:

$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 3.9G     0  3.9G   0% /dev
tmpfs                    3.9G  176K  3.9G   1% /dev/shm
tmpfs                    3.9G  378M  3.5G  10% /run
/dev/mapper/fedora-root   15G   15G   20K 100% /
tmpfs                    3.9G  4.0K  3.9G   1% /tmp
/dev/sdb2                932G  234G  698G  26% /media/exfat
/dev/sda1                976M  222M  688M  25% /boot
tmpfs                    788M  164K  788M   1% /run/user/1000
overlay                   15G   15G   20K 100% /var/lib/containers/storage/overlay/9b7b9772e9a67b8a1294acb02fb1109849056852bd344f2e8792af73d7286f41/merged
shm                       63M  4.0K   63M   1% /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/shm
overlay                   15G   15G   20K 100% /var/lib/containers/storage/overlay/2ecc9119b2a504eef7adbf116f2104aceb3e251ced2f324884cca37059b2fdc5/merged

$ cat /var/log/messages
...
Dec 19 02:06:03 jennycloud podman[456262]: unhealthy
Dec 19 02:06:03 jennycloud podman[456262]: Error: unable to update health check log /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log for 1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7: failed to unmarshal existing healthcheck results in /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log: readObjectStart: expect { or n, but found
Dec 19 02:06:03 jennycloud podman[456262]: , error found in #0 byte of ...||..., bigger context ...||...

$ podman ps
9d6c432b35e3  docker.io/plexinc/pms-docker:latest           5 days ago  Up 5 days ago         focused_benz

$ cat /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log

$ ls -al /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
-rwx------. 1 root root 0 Dec 17 03:41 /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
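
The readObjectStart wording matches the error json-iterator/go raises on empty input, so a zero-byte healthcheck.log on its own is enough to trigger it. A minimal sketch that reproduces the message, assuming the log is parsed with jsoniter (which the error text suggests); the struct here is a hypothetical stand-in, not podman's real type:

package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

func main() {
	// A zero-byte healthcheck.log, as left behind by the full disk,
	// is not valid JSON.
	var hc struct {
		Status string `json:"Status"`
	}
	if err := jsoniter.Unmarshal([]byte{}, &hc); err != nil {
		// Prints the same "readObjectStart: expect { or n, but found"
		// error seen in /var/log/messages above.
		fmt.Println(err)
	}
}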

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman version
Version:            1.7.0-dev
RemoteAPI Version:  1
Go Version:         go1.13.5
OS/Arch:            linux/amd64

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.5
  podman version: 1.7.0-dev
host:
  BuildahVersion: 1.11.6
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.9-0.1.dev.gitc2e2e67.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.9-dev, commit: 6ebb63dda5223b6d086052ef692e9229bfcedb63'
  Distribution:
    distribution: fedora
    version: "32"
  MemFree: 2193379328
  MemTotal: 8254341120
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 7948464128
  SwapTotal: 8405381120
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: jennycloud
  kernel: 5.3.15-300.fc31.x86_64
  os: linux
  rootless: false
  uptime: 129h 46m 45.63s (Approximately 5.38 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
  - quay.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mountopt: nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

$ rpm -q podman
podman-1.7.0-0.8.dev.git6c7b6d9.fc32.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box with a less unstable kernel from Fedora 31.

Linux jennycloud 5.3.15-300.fc31.x86_64 #1 SMP Thu Dec 5 15:04:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Labels: kind/bug, stale-issue

All 9 comments

So the healthcheck log filled up and used all of your disk space?

It was a combination of Plex metadata and /var/log/messages filling up with that podman message that filled this little server's disk.

Thank you, Dan.

Why do you think this is a podman issue? What could podman have done to prevent it?

I don't think that podman could have prevented it, but I guess a cleaner error message than the following might be better?

readObjectStart: expect { or n, but found
 , error found in #0 byte of ...||..., bigger context ...||...

Is that what the healthcheck is saying?

The error comes from here:

if err := json.Unmarshal(b, &healthCheck); err != nil {
	return healthCheck, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s", c.healthCheckLogPath())
}

I suspect that the "last" update to the log fails because it cannot write the full entry, which leaves the file behind as invalid (or empty) JSON. The next read then throws this error. I don't think we want to specifically check for a full volume here, do we? I could extend the error message to point out that a full disk is a possible cause. A hedged sketch of that guard follows below.
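
Something along these lines could make the read path tolerant of a truncated log. This is a hypothetical sketch, not the actual libpod code: healthCheckResults and readHealthCheckLog are simplified stand-ins. It treats a zero-byte log as having no prior results and mentions the full-disk possibility when unmarshalling fails:

package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"

	"github.com/pkg/errors"
)

// healthCheckResults is a simplified stand-in for podman's real results type.
type healthCheckResults struct {
	Status string `json:"Status"`
}

// readHealthCheckLog is a hypothetical helper. An empty log, e.g. one
// truncated by a failed write on a full disk, is treated as "no history"
// instead of as corrupt JSON.
func readHealthCheckLog(path string) (healthCheckResults, error) {
	var hc healthCheckResults
	b, err := ioutil.ReadFile(path)
	if err != nil {
		return hc, err
	}
	if len(b) == 0 {
		return hc, nil
	}
	if err := json.Unmarshal(b, &hc); err != nil {
		return hc, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s (the log may have been truncated, e.g. by a full disk)", path)
	}
	return hc, nil
}

func main() {
	hc, err := readHealthCheckLog("/var/lib/containers/storage/overlay-containers/.../userdata/healthcheck.log")
	fmt.Println(hc, err)
}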

A friendly reminder that this issue had no activity for 30 days.

@pgporada are you satisfied with the response? Can we close this issue?

I'll take a stab at it and close the issue. Improving this specific error seems like a symptomatic fix to me, as we cannot really guarantee proper/correct behavior when the disk is full; plenty of other commands would fail in many different ways.
