Systemd: RFE: One-time random delay (offset) for timers

Created on 10 Oct 2018  ·  3 comments  ·  Source: systemd/systemd

Is your feature request related to a problem? Please describe

In our product we have a lot of scheduled tasks, many of which involve communication with remote services, that are meant to run around a certain wall-clock time, but we don't want them all to run at exactly that time, because that can cause load spikes and so on. Even for tasks that don't involve remote communication, it would be convenient to let systemd pick a random offset for timer firings to keep jobs from clustering together. I think AccuracySec= may be meant to address this in part (?), but if it is, it isn't sufficient in my experience.

Describe the solution you'd like

I would like an option for timer units, similar to RandomizedDelaySec=, but with the delay determined only once when the timer is started, and then used as an offset for subsequent iterations.

For example, given OnCalendar=*:00/3 and (some hypothetical) RandomizedOffsetSec=2m, systemd might pick a delay of 65 seconds, and then the job will fire at 00:04:05, 00:07:05, 00:10:05, and so on, keeping the interval between each at 3 minutes as configured, just offset by 65 seconds from :00 each time.
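
The arithmetic in the example above can be checked with a short Python sketch. It assumes the offset has already been chosen once (65 seconds here) and adds it to each wall-clock slot that OnCalendar=*:00/3 produces. `fixed_offset_schedule` is a hypothetical helper for illustration, not part of systemd.

```python
from datetime import datetime, timedelta

def fixed_offset_schedule(first_slot, interval, offset, count):
    """Return `count` firing times, each calendar slot shifted by the same offset.

    first_slot: first wall-clock time matched by the OnCalendar= expression
    interval:   spacing between calendar slots (3 minutes for *:00/3)
    offset:     the once-chosen delay (hypothetical RandomizedOffsetSec=)
    """
    return [first_slot + i * interval + offset for i in range(count)]

# Slots of OnCalendar=*:00/3 starting at 00:03:00, each shifted by 65 seconds.
times = fixed_offset_schedule(
    datetime(2018, 10, 10, 0, 3, 0),
    timedelta(minutes=3),
    timedelta(seconds=65),
    3,
)
print([t.strftime("%H:%M:%S") for t in times])  # ['00:04:05', '00:07:05', '00:10:05']
```

Because the same offset is added to every slot, consecutive firings stay exactly 3 minutes apart.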

Another example would be something like OnCalendar=*:00 + RandomizedOffsetSec=59m, which would run a job hourly at a randomly selected but repeatable number of minutes past the hour.
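
For the hourly case, working out the next firing needs only the current time and the fixed offset. A minimal sketch, assuming naive local datetimes (no DST handling) and an offset already chosen below one hour; `next_hourly_firing` is a hypothetical name.

```python
from datetime import datetime, timedelta

def next_hourly_firing(now, offset):
    """Next time of the form HH:00:00 + offset that is strictly after `now`.

    Models OnCalendar=*:00 combined with a fixed (hypothetical)
    RandomizedOffsetSec= value, assuming 0 <= offset < 1 hour.
    Naive datetimes only: DST transitions are ignored.
    """
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    candidate = top_of_hour + offset
    if candidate <= now:
        # This hour's slot has already passed; use the next hour's.
        candidate += timedelta(hours=1)
    return candidate

offset = timedelta(minutes=17)  # stand-in for a value picked once at random
print(next_hourly_firing(datetime(2018, 10, 10, 10, 30), offset))  # 2018-10-10 11:17:00
print(next_hourly_firing(datetime(2018, 10, 10, 10, 5), offset))   # 2018-10-10 10:17:00
```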

Describe alternatives you've considered

The obvious alternative I considered was RandomizedDelaySec=, and my initial reading of the documentation actually suggested that it did what I wanted; it doesn't. RandomizedDelaySec= is more or less OK for jobs that run once or twice a day and only need a comparatively short delay, but it is inadequate for jobs that run repeatedly throughout the day and/or have comparatively long random delays, like in my example above. Using RandomizedDelaySec= in these situations makes the interval between timer firings vary wildly, sometimes dipping well below the configured wall-clock interval.
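
A quick simulation illustrates the interval problem. It models RandomizedDelaySec= as an independent, evenly distributed delay in [0, max] drawn for every firing, and compares that with a single delay reused for every firing. It is a model under that assumption, not a run of systemd itself; `firing_gaps` is a hypothetical helper.

```python
import random

def firing_gaps(slots, delays):
    """Seconds between consecutive firings when slot i fires at slots[i] + delays[i]."""
    fire_times = [s + d for s, d in zip(slots, delays)]
    return [b - a for a, b in zip(fire_times, fire_times[1:])]

INTERVAL = 180   # OnCalendar=*:00/3 -> one calendar slot every 180 s
MAX_DELAY = 120  # RandomizedDelaySec=2m
N = 1000

slots = [i * INTERVAL for i in range(N)]
rng = random.Random(42)  # fixed seed so the run is reproducible

# Per-firing randomization: a fresh delay for every firing.
per_firing = firing_gaps(slots, [rng.uniform(0, MAX_DELAY) for _ in range(N)])

# Requested behaviour: one delay drawn once, then reused as a constant offset.
offset = rng.uniform(0, MAX_DELAY)
fixed = firing_gaps(slots, [offset] * N)

print(f"per-firing delay: gaps {min(per_firing):.0f}s to {max(per_firing):.0f}s")
print(f"fixed offset:     gaps {min(fixed):.0f}s to {max(fixed):.0f}s")
```

With these numbers, a gap under per-firing randomization can land anywhere between 60 s and 300 s (INTERVAL ± MAX_DELAY), whereas the fixed offset keeps every gap at 180 s.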

Another alternative, if all you're worried about is preventing machines from hitting your servers all at once, is to use the monotonic timer options, like the combination of OnBootSec= and OnUnitActiveSec= suggested in the manual. But this obviously isn't suitable if you want your jobs to occur around some wall-clock time. And even if that's not a concern, in order to ensure a consistent interval between iterations whilst preventing local jobs from clustering together, you'd have to micro-manage OnBootSec= (which acts as the offset in this scenario) for each timer.

The solution we've used with cron is to have each job run through a script that ensures a random-ish but repeatable delay... but it's not great.
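
The cron workaround can be sketched in Python. This is not the script from the issue (which isn't shown); it's a hypothetical wrapper that derives the delay from a SHA-256 hash of the host name and a job name, so each job gets a stable, scattered offset without storing any state. `repeatable_delay` and `run_delayed` are illustrative names.

```python
import hashlib
import socket
import subprocess
import time

def repeatable_delay(job_name, max_delay, host=None):
    """Delay in [0, max_delay) seconds: scattered across hosts and jobs,
    but identical on every run of the same job on the same host."""
    host = host if host is not None else socket.gethostname()
    digest = hashlib.sha256(f"{host}:{job_name}".encode()).digest()
    # First 8 bytes as an integer; modulo bias is negligible for realistic delays.
    return int.from_bytes(digest[:8], "big") % max_delay

def run_delayed(job_name, argv, max_delay, sleep=time.sleep):
    """Sleep for the job's repeatable delay, then run argv and return its exit code."""
    sleep(repeatable_delay(job_name, max_delay))
    return subprocess.run(argv).returncode
```

hashlib is used rather than the built-in hash(), whose output for strings changes between interpreter runs. A crontab entry firing at :00 would then call something like run_delayed("backup", ["/usr/local/bin/backup"], 3600) from a small wrapper script (job name and path are hypothetical).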

Please let me know if I've overlooked something else that could do what I want.

Labels: RFE 🎁, pid1

All 3 comments

@coretemp Please refrain from personal attacks in your comments on this project.

@coretemp as I said in https://github.com/systemd/systemd/issues/6432#issuecomment-433694333, comments like this aren't acceptable here. If you don't like the RFE, could you be more specific about what it is that you don't like?

Thanks @okdana for the submission, I've submitted a pull request for this feature. It can be enabled by adding 'RandomizeFirstRunOnly=true' to the [Timer] section of the .timer file.

/offtopic/
I find this feature very important for automating systems where a random delay is needed to distribute network load at boot time; however, constant re-randomization prevents timely warning of a down/error state in the system.
If a system doesn't detect a run within the specified interval, it can easily flag a failure, but if that interval varies with each run, the failure-detection window has to accommodate the uncertainty of the RandomizedDelaySec= value.
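
To make the off-topic point concrete: with a delay re-drawn on every firing, the gap between two runs can stretch to interval + RandomizedDelaySec=, so a monitor has to widen its deadline by the full delay; with a fixed per-timer offset it doesn't. (Later systemd releases added a FixedRandomDelay= boolean to timer units that reuses the RandomizedDelaySec= offset for every firing, which corresponds to zero jitter below.) `overdue_after` is a hypothetical sketch, and the 59-minute jitter mirrors the hourly example earlier in the issue.

```python
from datetime import datetime, timedelta

def overdue_after(last_run, interval, max_jitter, grace):
    """Latest start time for the next run before a monitor should alert.

    last_run:   when the job last started
    interval:   configured calendar interval
    max_jitter: the random delay bound if it is re-drawn every firing,
                or zero if a fixed per-timer offset is used
    grace:      extra slack for the job's own start-up latency
    """
    return last_run + interval + max_jitter + grace

last = datetime(2018, 10, 10, 12, 0, 30)
hourly = timedelta(hours=1)
grace = timedelta(minutes=2)
print(overdue_after(last, hourly, timedelta(minutes=59), grace))  # 2018-10-10 14:01:30
print(overdue_after(last, hourly, timedelta(0), grace))           # 2018-10-10 13:02:30
```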
