Kops: logrotate not good at log rotatation

Created on 24 Jul 2017 · 6Comments · Source: kubernetes/kops

logrotate often gets "stucks" when a disk is full (or near-full).

The debian issue seems to be this one:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734688

A major contributor to full disks seems to be /var/log/fluentd.log, which is not typically rotated.

An indicator of the problem is that you have file.log.1 and file.log.1.gz, with file.log.1.gz having zero size. logrotate then fails to take any further action because it stops when it hits file.log.1 and sees the already-existing file.log.1.gz.

Fixed in logrotate 3.11, available in stretch, but not backported to jessie.

lifecyclrotten

Source

justinsb

🎉1

Most helpful comment

Can we move to stretch? I think it fixes some cves we have as well

chrislovecnm on 24 Jul 2017

👍3

All 6 comments

Can we move to stretch? I think it fixes some cves we have as well

chrislovecnm on 24 Jul 2017

👍3

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot on 1 Jan 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot on 31 Jan 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 2 Mar 2018

/reopen

Looks like this is still an issue in kops Version 1.9.0 (git-cccd71e67), and with latest stable CoreOS 1745.6.0. It's probably still an issue in other distros too, as I'm not aware of a distro or logrotate version that has different behavior.

There are probably a couple cases where logrotate will fail, but at least one that I've seen is this:

rotating pattern: /var/log/kube-apiserver.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/kube-apiserver.log
  log needs rotating
rotating log /var/log/kube-apiserver.log, log->rotateCount is 5
dateext suffix '-20180612'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding logs to compress failed
destination /var/log/kube-apiserver.log-20180612 already exists, skipping rotation

Unfortunately, logrotate silently skips rotation with the current configuration used in kops and default configs in many other distros! This is probably due to very unintuitive configuration & expected behavior with maxsize in logrotate.

Problem

The default /usr/share/logrotate/logrotate.conf has dateext option enabled. With this option, all rotated log files will get suffix including the current date with day granularity.

Usually, this works fine if log rotation with minimum granularity of 1 day keeps log file size below max size & all combined logs + rotated files under disk full threshold. However, when logs grow faster than the threshold _within 1 day_, then logrotate refuses to do a subsequent (second, third, etc...) rotation in that day! Eventually if logs continue to grow at this higher rate, you end up with disk full, and a server in very bad state when /var/log is on an important partition.

Q: Why does this happen?
A: logrotate with dateext by default will _skip rotation_ when a file with dateext suffix already exists for that day!

This can be a problem when services generate a lot of logs. You might set maxsize in a logrotate config, and set up hourly rotation expecting that this will prevent logs from growing over maxsize, and rotate multiple times per day on the hour. Unfortunately, logrotate does not behave this way by default when a previous rotated log for that day already exists. You can see by the verbose output above from logrotate -v that it tells you why it's skipping in this case.

trinitronx on 12 Jun 2018

@trinitronx: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen

Looks like this is still an issue in kops Version 1.9.0 (git-cccd71e67), and with latest stable CoreOS 1745.6.0. It's probably still an issue in other distros too, as I'm not aware of a distro or logrotate version that has different behavior.

There are probably a couple cases where logrotate will fail, but at least one that I've seen is this:
rotating pattern: /var/log/kube-apiserver.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/kube-apiserver.log
 log needs rotating
rotating log /var/log/kube-apiserver.log, log->rotateCount is 5
dateext suffix '-20180612'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding logs to compress failed
destination /var/log/kube-apiserver.log-20180612 already exists, skipping rotation
Unfortunately, logrotate silently skips rotation with the current configuration used in kops and default configs in many other distros! This is probably due to very unintuitive configuration & expected behavior with maxsize in logrotate.

Problem

The default /usr/share/logrotate/logrotate.conf has dateext option enabled. With this option, all rotated log files will get suffix including the current date with day granularity.

Usually, this works fine if log rotation with minimum granularity of 1 day keeps log file size below max size & all combined logs + rotated files under disk full threshold. However, when logs grow faster than the threshold _within 1 day_, then logrotate refuses to do a subsequent (second, third, etc...) rotation in that day! Eventually if logs continue to grow at this higher rate, you end up with disk full, and a server in very bad state when /var/log is on an important partition.

Q: Why does this happen?
A: logrotate with dateext by default will _skip rotation_ when a file with dateext suffix already exists for that day!

This can be a problem when services generate a lot of logs. You might set maxsize in a logrotate config, and set up hourly rotation expecting that this will prevent logs from growing over maxsize, and rotate multiple times per day on the hour. Unfortunately, logrotate does not behave this way by default when a previous rotated log for that day already exists. You can see by the verbose output above from logrotate -v that it tells you why it's skipping in this case.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.