Podman: Full host disk causes readObjectStart error

Created on 19 Dec 2019 · 9 comments · Source: containers/podman

/kind bug

Steps to reproduce the issue:

  1. Go to bed

  2. Have the root partition fill up

  3. Wake up and check /var/log/messages

Describe the results you received:

$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 3.9G     0  3.9G   0% /dev
tmpfs                    3.9G  176K  3.9G   1% /dev/shm
tmpfs                    3.9G  378M  3.5G  10% /run
/dev/mapper/fedora-root   15G   15G   20K 100% /
tmpfs                    3.9G  4.0K  3.9G   1% /tmp
/dev/sdb2                932G  234G  698G  26% /media/exfat
/dev/sda1                976M  222M  688M  25% /boot
tmpfs                    788M  164K  788M   1% /run/user/1000
overlay                   15G   15G   20K 100% /var/lib/containers/storage/overlay/9b7b9772e9a67b8a1294acb02fb1109849056852bd344f2e8792af73d7286f41/merged
shm                       63M  4.0K   63M   1% /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/shm
overlay                   15G   15G   20K 100% /var/lib/containers/storage/overlay/2ecc9119b2a504eef7adbf116f2104aceb3e251ced2f324884cca37059b2fdc5/merged

$ cat /var/log/messages
...
Dec 19 02:06:03 jennycloud podman[456262]: unhealthy
Dec 19 02:06:03 jennycloud podman[456262]: Error: unable to update health check log /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log for 1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7: failed to unmarshal existing healthcheck results in /var/lib/containers/storage/overlay-containers/1b2e6aafb3b3ea801302aa99ba84fff8e21c34c0b623dd985a623db0da0ff7a7/userdata/healthcheck.log: readObjectStart: expect { or n, but found
Dec 19 02:06:03 jennycloud podman[456262]: , error found in #0 byte of ...||..., bigger context ...||...

$ podman ps
9d6c432b35e3  docker.io/plexinc/pms-docker:latest           5 days ago  Up 5 days ago         focused_benz

$ cat /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log

$ ls -al /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
-rwx------. 1 root root 0 Dec 17 03:41 /var/lib/containers/storage/overlay-containers/9d6c432b35e3b7fffe53c682998fe4a487e8a986d51d56fe5bf6e5c00ee1961e/userdata/healthcheck.log
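
The readObjectStart wording matches the error json-iterator/go raises on empty input, so a zero-byte healthcheck.log on its own is enough to trigger it. A minimal sketch that reproduces the message, assuming the log is parsed with jsoniter (which the error text suggests); the struct here is a hypothetical stand-in, not podman's real type:

package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

func main() {
	// A zero-byte healthcheck.log, as left behind by the full disk,
	// is not valid JSON.
	var hc struct {
		Status string `json:"Status"`
	}
	if err := jsoniter.Unmarshal([]byte{}, &hc); err != nil {
		// Prints the same "readObjectStart: expect { or n, but found"
		// error seen in /var/log/messages above.
		fmt.Println(err)
	}
}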

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman version
Version:            1.7.0-dev
RemoteAPI Version:  1
Go Version:         go1.13.5
OS/Arch:            linux/amd64

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.5
  podman version: 1.7.0-dev
host:
  BuildahVersion: 1.11.6
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.9-0.1.dev.gitc2e2e67.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.9-dev, commit: 6ebb63dda5223b6d086052ef692e9229bfcedb63'
  Distribution:
    distribution: fedora
    version: "32"
  MemFree: 2193379328
  MemTotal: 8254341120
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc32.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 7948464128
  SwapTotal: 8405381120
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: jennycloud
  kernel: 5.3.15-300.fc31.x86_64
  os: linux
  rootless: false
  uptime: 129h 46m 45.63s (Approximately 5.38 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
  - quay.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mountopt: nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

$ rpm -q podman
podman-1.7.0-0.8.dev.git6c7b6d9.fc32.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box with a less unstable kernel from Fedora 31.

Linux jennycloud 5.3.15-300.fc31.x86_64 #1 SMP Thu Dec 5 15:04:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Labels: kind/bug, stale-issue

All 9 comments

So the healthcheck log filled up and used all of your disk space?

It was a combination of Plex metadata and /var/log/messages filling up with that podman message that filled this little server's disk.

Thank you, Dan.

Why do you think this is a podman issue? What could podman have done to prevent it?

I don't think that podman could have prevented it, but I guess a cleaner error message than the following might be better?

readObjectStart: expect { or n, but found
 , error found in #0 byte of ...||..., bigger context ...||...

Is that what the healthcheck is saying?

The error comes from here:

if err := json.Unmarshal(b, &healthCheck); err != nil {
	return healthCheck, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s", c.healthCheckLogPath())
}

I suspect that the "last" update to the log fails because it cannot write the full entry, which leaves the file behind as invalid (or empty) JSON. The next read then throws this error. I don't think we want to specifically check for a full volume here, do we? I could extend the error message to point out that a full disk is a possible cause. A hedged sketch of that guard follows below.
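
Something along these lines could make the read path tolerant of a truncated log. This is a hypothetical sketch, not the actual libpod code: healthCheckResults and readHealthCheckLog are simplified stand-ins. It treats a zero-byte log as having no prior results and mentions the full-disk possibility when unmarshalling fails:

package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"

	"github.com/pkg/errors"
)

// healthCheckResults is a simplified stand-in for podman's real results type.
type healthCheckResults struct {
	Status string `json:"Status"`
}

// readHealthCheckLog is a hypothetical helper. An empty log, e.g. one
// truncated by a failed write on a full disk, is treated as "no history"
// instead of as corrupt JSON.
func readHealthCheckLog(path string) (healthCheckResults, error) {
	var hc healthCheckResults
	b, err := ioutil.ReadFile(path)
	if err != nil {
		return hc, err
	}
	if len(b) == 0 {
		return hc, nil
	}
	if err := json.Unmarshal(b, &hc); err != nil {
		return hc, errors.Wrapf(err, "failed to unmarshal existing healthcheck results in %s (the log may have been truncated, e.g. by a full disk)", path)
	}
	return hc, nil
}

func main() {
	hc, err := readHealthCheckLog("/var/lib/containers/storage/overlay-containers/.../userdata/healthcheck.log")
	fmt.Println(hc, err)
}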

A friendly reminder that this issue had no activity for 30 days.

@pgporada are you satisfied with the response? Can we close this issue?

I'll take a stab at it and close the issue. Improving this specific error seems like a symptomatic fix to me, as we cannot really guarantee proper/correct behavior when the disk is full; plenty of other commands would fail in many different ways.
