Problem: Many BCL components use DateTime.UtcNow to compute timeouts and time deltas. Running "Find Usages" on DateTime.UtcNow in the .NET Framework assemblies brings a lot of usage sites to light, for example in ADO.NET, transactions, caches, WebRequest, remoting, WCF, SignalR, and security.
All of these usages are bugs because the current system time can change significantly, forwards and backwards, at any time. This means that whenever the system clock is changed, many .NET components will spuriously fail.
Common symptoms would be timeouts firing too early or too late (possibly never). Perf counters might also show values that are too high or negative.
Sample scenario:
Then, the cache item will essentially stay around forever if eviction is based on DateTime.UtcNow. I believe this bug exists in the BCL.
User code also has the same problems.
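The failure mode can be reproduced without touching the real clock by injecting a fake time source. This is only a sketch: `WallClockTimeout` and `FakeClock` are hypothetical names made up for illustration, not BCL types, and the dates are arbitrary.

``` c#
using System;

// Hypothetical timeout that derives its deadline from a wall-clock reading,
// the same way code based on DateTime.UtcNow does.
sealed class WallClockTimeout
{
    private readonly Func<DateTime> _utcNow;
    private readonly DateTime _deadlineUtc;

    public WallClockTimeout(Func<DateTime> utcNow, TimeSpan delay)
    {
        _utcNow = utcNow;
        _deadlineUtc = utcNow() + delay;
    }

    // Compares the current wall-clock reading against the stored deadline.
    public bool IsElapsed => _utcNow() >= _deadlineUtc;
}

// Hypothetical settable clock, so the example never changes the real system time.
sealed class FakeClock
{
    public DateTime UtcNow { get; set; } =
        new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);
}
```

If a 5 second timeout is created from `() => clock.UtcNow`, the fake clock is then set back by an hour, and 10 seconds pass, `IsElapsed` is still false: the timeout fires almost an hour late. Setting the clock forward instead makes it fire immediately.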
Solution: Add some kind of monotonic clock to the BCL. The main property of such a clock would be that its time advances linearly. It should never jump when the system clock changes. It should not exhibit split second jumps. It should behave reasonably in case the system sleeps or hibernates or in case the VM is paused.
This monotonic clock class should be fast and convenient to use so that it is a no-brainer to switch all DateTime.UtcNow usages over to the new model.
Maybe we can add Environment.TickCount64 as well. Environment.TickCount64 alone would not be good enough because it is awkward to use.
I'm not sure if Stopwatch would cover all these requirements. It is a reference type which might be too heavy for hot code path in the BCL.
Maybe we can add a new value type that is made exactly for this purpose. A sketch:
``` c#
struct MonotonicTime {
    long Ticks;
    MonotonicTime operator + (MonotonicTime, TimeSpan);
    TimeSpan operator - (MonotonicTime, MonotonicTime);
    bool IsNegative;
    static MonotonicTime Current;
    ...
}
```
To summarize, I request:
Environment.TickCount64.
> It is a reference type which might be too heavy for hot code path in the BCL.
Just responding to this one point, Stopwatch provides the static GetTimestamp() method, so if the timing support provided by Stopwatch is what you want, you can get at the same mechanism without instantiating a Stopwatch instance.
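A minimal sketch of that suggestion follows. `TimestampHelper.ElapsedSince` is a hypothetical helper name, not an existing API; it only uses `Stopwatch.GetTimestamp()` and `Stopwatch.Frequency`, which do exist.

``` c#
using System;
using System.Diagnostics;

static class TimestampHelper
{
    // Hypothetical helper: converts a raw Stopwatch timestamp delta to a TimeSpan
    // without allocating a Stopwatch instance.
    public static TimeSpan ElapsedSince(long startTimestamp)
    {
        long delta = Stopwatch.GetTimestamp() - startTimestamp;
        // Stopwatch ticks are 1/Stopwatch.Frequency seconds long, which is not
        // necessarily the 100 ns tick used by TimeSpan, so scale explicitly.
        long timeSpanTicks = (long)(delta * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency);
        return TimeSpan.FromTicks(timeSpanTicks);
    }
}
```

Typical use is to store `long start = Stopwatch.GetTimestamp();` before the operation and call `TimestampHelper.ElapsedSince(start)` afterwards.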
> static MonotonicTime ForDateTime(DateTime utcNow);
> DateTime AsDateTime();
How would these work? If these functions existed, it'd mean that MonotonicTime is equivalent to DateTime and UtcNow, which clearly contradicts the requirements of a monotonic clock.
I can imagine this implementation (minus synchronization):
``` c#
struct MonotonicTime {
    static long maxTicks;
    static MonotonicTime ForDateTime(DateTime utcNow) {
        if (utcNow.Ticks > maxTicks) {
            maxTicks = utcNow.Ticks;
        }
        return new MonotonicTime(maxTicks);
    }
}
```
@CodesInChaos true, I have edited those away.
@KrzysztofCwalina that's not good enough because it can cause monotonic time to stand still for arbitrary periods of wall-clock time. Monotonic time must advance with the wall clock. That's the point of timeouts. Make users and other systems wait x seconds on the clock.
I think MonotonicTime should be based on Environment.TickCount64 or Stopwatch.GetTimestamp or whatever it is called. The latter is nice because it has high precision. That could be useful for logging, when you want to tell apart actions that happened within microseconds of each other. I always found the limited 15ms precision of DateTime.UtcNow and Environment.TickCount quite ugly.
@GSPP, I thought monotonic means never decreasing. Also, I am not sure what the semantics/implementation could be if you wanted it to be always increasing. Given that the system clock has a lower resolution than the processor clock, it's not clear how it would be implemented.
@KrzysztofCwalina The way I understand it, monotonic _does_ mean never decreasing, but that's not the only requirement. It's okay for the value to not increase for, say, 366 ns (that's what the resolution of Stopwatch seems to be for me). It's definitely not okay for it to not increase for an arbitrary amount of time.
Also, your implementation is not really monotonic (at least not with this interface). Consider something like:
``` c#
var dt1 = DateTime.UtcNow;
Thread.Sleep(aWhile);
var dt2 = DateTime.UtcNow;
Thread.Sleep(someMore);
var dt3 = DateTime.UtcNow;
var monotonic2 = MonotonicTime.ForDateTime(dt2);
var monotonic3 = MonotonicTime.ForDateTime(dt3);
var monotonic1 = MonotonicTime.ForDateTime(dt1);
```
Here, monotonic1 > monotonic2, which is wrong.
@svick correct, it's supposed to be like TickCount. If it's not suitable for timeouts or relative time measurement it's no good. The more resolution and the cheaper to compute the better.
@GSPP, @svick, Can you then propose an implementation? I am not sure how to get the semantics you want.
@KrzysztofCwalina I think Stopwatch timestamps should work. According to search engines they even continue advancing when the system is hibernated.
@GSPP, ah, ok. I got confused. I thought you did not want to use Stopwatch/QueryPerformanceCounter for some reason.
One note is that Stopwatch actually falls back on DateTime.UtcNow.Ticks if performance counters are not available, so even then it's not a robust solution.
@scalablecory good point but maybe there are no supported platforms anymore where that fallback is necessary. If this is rare enough one could even make the monotonic time struct throw in case good stopwatch support is not there.
The kernel team should know some stats about high frequency counter support.
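As a rough sketch of the "throw if good stopwatch support is not there" idea: `MonotonicSource.GetTimestampOrThrow` is a hypothetical name, and the choice of `PlatformNotSupportedException` is an assumption, not something settled in this thread.

``` c#
using System;
using System.Diagnostics;

static class MonotonicSource
{
    // Hypothetical guard: refuses to hand out timestamps when Stopwatch would
    // fall back to the (non-monotonic) system clock.
    public static long GetTimestampOrThrow()
    {
        if (!Stopwatch.IsHighResolution)
        {
            throw new PlatformNotSupportedException(
                "No high-resolution performance counter is available.");
        }
        return Stopwatch.GetTimestamp();
    }
}
```

Failing loudly trades availability for correctness: callers on such a platform get an exception instead of silently inheriting system clock jumps.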
Just stumbled across this issue and I'll add:
- Stopwatch is indeed the most appropriate timer for profiling code (deltas). It uses QueryPerformanceCounter, which you can read about here.
- Stopwatch.GetTimestamp() is indeed the way to go when profiling _large quantities_ of code. It prevents the additional heap allocations and GC you'd have with lots of Stopwatch instances. Be sure to take the Frequency property into account when computing the result.
- Stopwatch or DateTime.UtcNow are the most appropriate sources for this (nor are binding them together).
- GetSystemTimePreciseAsFileTime. I can see a .Net wrapper around that being useful for certain scenarios.
- clock_gettime).
- GetSystemTimePreciseAsFileTime summarize the usage scenarios quite well:
> Note This function is best suited for high-resolution time-of-day measurements, or time stamps that are synchronized to UTC. For high-resolution interval measurements, use QueryPerformanceCounter...
And as mentioned previously, QPC is already used by Stopwatch.
WRT the original scenarios mentioned, many are indeed just fine with UtcNow. For example, expiring a cache in an hour is not an operation that requires high precision. In fact, trying to provide such high precision can be extremely difficult when you consider distributed environments. Eventually you reach the limitations of NTP and w32tm, and would have to have specialized hardware to really obtain such accurate time coordination across machines.
GetSystemTimePreciseAsFileTime seems like a highly useful API. Can we switch DateTime.UtcNow over to it? Needs a benchmark. Otherwise this seems like a pure win. I see people failing all the time because of the default 60 Hz precision of DateTime.UtcNow.
But GetSystemTimePreciseAsFileTime is not monotonic because the clock can be changed.
The idea of a monotonic clock does not include providing "legible" time as DateTime.UtcNow provides it. This notion is _incompatible_ with having a monotonic clock. A monotonic clock can only meaningfully provide time deltas. Any absolute time stamp would be a meaningless number with a system dependent unit exactly like the Stopwatch timestamps.
This ticket is _not_ about high precision so much. It's about a clock source that cannot jump arbitrarily like the system clock.
Re: GetSystemTimePreciseAsFileTime, given that it has an identical API to GetSystemTimeAsFileTime, I suspect it has some performance implications so we'd be unnecessarily slowing a lot of apps down for precision they don't need if we used it for DateTime.UtcNow.
.NET does already expose all the monotonic clocks most would ever need, between Stopwatch (which should only be used if IsHighResolution is true) and Environment.TickCount.
The API is new to Windows 8. It did not exist back then. I have created a ticket to split off that discussion. It has nothing to do with monotonic time.
Environment.TickCount is unusable because it is a 32-bit millisecond counter: it overflows to negative values after about 24.9 days of uptime, so the code will break on long-running machines. Stopwatch appears to be a suitable source of monotonic time, as discussed above. My API request stands. It's a convenience API.
A monotonic (i.e. non-decreasing) clock I know of (and that is, as a bonus, high precision) is QueryPerformanceCounter.
> Is the performance counter monotonic (non-decreasing)?
> Yes
It can 'leap' forward* (in some circumstances, AFAIK, correct me if I'm wrong) but never back.
I'm not sure I would rely on the Stopwatch since it _currently_ uses QPC (and/or GetTickCount / GetTickCount64), but that's an implementation detail and may change at any time, I guess?
* There used to be a KB article (KB274323) about this (see archive.org) but it, currently, redirects to Acquiring high-resolution time stamps
@GSPP proper long-term use of Environment.TickCount would be to query it at least every 30 days and maintain a total.
Of course a TickCount64 would be better, and could work as far back as XP SP2 (i know the docs for GetTickCount64 say Vista, but GetTickCount actually returns a 64-bit value in XP SP2).
I think ideally Stopwatch.GetTimestamp would just be updated to fall back on GetTickCount64, so it would be the single robust monotonic solution.
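The "query it regularly and maintain a total" approach could look roughly like this. `TickAccumulator` is a hypothetical helper, and the sketch assumes `Update` is called at least once per wrap period of the 32-bit counter (about 49.7 days), which nothing in the code can enforce.

``` c#
using System;

// Hypothetical helper: extends a wrapping 32-bit millisecond counter
// (such as Environment.TickCount) to a 64-bit running total.
sealed class TickAccumulator
{
    private readonly object _gate = new object();
    private uint _last;
    private long _totalMilliseconds;

    public TickAccumulator(int initialTickCount)
    {
        _last = unchecked((uint)initialTickCount);
    }

    // Feeds the current counter value and returns the milliseconds
    // accumulated since construction.
    public long Update(int tickCount)
    {
        lock (_gate)
        {
            uint now = unchecked((uint)tickCount);
            // Unsigned subtraction yields the correct delta across one wrap-around.
            _totalMilliseconds += unchecked(now - _last);
            _last = now;
            return _totalMilliseconds;
        }
    }
}
```

A caller would construct it with `Environment.TickCount` and pass `Environment.TickCount` to `Update` from a periodic job. If more than one wrap period passes between calls, the total silently loses time.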
Right, Stopwatch should just fall back to the 64 bit counter. This is another change request.
> proper long-term use of Environment.TickCount would be to query it at least every 30 days and maintain a total.
To nitpick, your code is not guaranteed to be scheduled once every 30 days. It could be a suspended VM or a sleeping physical machine. Since the system clock can change, solutions that detect wrapping that way also fail.
To summarize:
Environment.TickCount64.
It is worth noting that I submitted a bug to MS ~6 years ago and was denied with "wont fix" regarding Stopwatch using DateTime due to compatibility reasons. It sure would be nice if their reasoning changed. https://connect.microsoft.com/VisualStudio/feedback/details/622002/low-resolution-stopwatch-behaves-incorrectly
@RobThree - My understanding is that the issues described in KB274323, and in this blog post relating to QPC being unreliable are no longer accurate because 1) The hardware that is known to exhibit this bug is very old and no longer in common usage, and 2) There were some workarounds added to detect this hardware bug and use an alternate timing source. See the notes under TSC Register at the bottom of the Acquiring hi-res timestamps article.
@GSPP - Agreed with 1, 2 & 4. Not sure what else is necessary for 3 that Stopwatch doesn't provide. 5 we'll talk about in the other item. :smile:
Provide a convenient, high-performance monotonic clock.
An additional benefit of a monotonic clock could be correct handling of leap seconds, which DateTime currently can't handle. So any code that currently takes two results of DateTime.UtcNow and compares them may be off by up to a second every few years which is unfortunate for checking short timeouts.
> An additional benefit of a monotonic clock could be correct handling of leap seconds, which DateTime currently can't handle. So any code that currently takes two results of DateTime.UtcNow and compares them may be off by up to a second every few years which is unfortunate for checking short timeouts.
Computers/Programs have a bad time with leap seconds in general. A monotonic clock doesn't actually handle leap seconds so much as it wouldn't notice there was anything different. If you tried to translate it back to a readable date/time value, you'd still get translation errors (unless there was some sort of mapping). DateTime has no ability to deal with leap seconds, though, so unless we add a new full date/time library, I'd recommend just ignoring it.
> If you tried to translate it back to a readable date/time value, you'd still get translation errors
Exactly! The proposed API by @GSPP doesn't include a conversion to/from DateTime and so the type system will prevent you from doing just that. So comparing two instances of MonotonicTime could tell you how much metric time elapsed between them independent of how the system clock setting or even UTC behaves. If i'm not mistaken, resetting the clock is how leap seconds are handled on many systems and that is exactly the scenario @GSPP started with - just with a second instead of a year.
And yes, i might be the only one caring about leap seconds..
> The proposed API by @GSPP doesn't include a conversion to/from DateTime and so the type system will prevent you from doing just that.
Heh. StartTime + Elapsed still has its own problems, though. We should definitely make use of this internally, and expose it for these types of scenarios, but it's going to need some warnings about blindly using the deltas it generates. When you log, for example, you're almost certainly going to want to take current (UTC) time, and ignore this.
> If i'm not mistaken, resetting the clock is how leap seconds are handled on many systems
Not quite sure where you're going with this. Ignoring NTP syncing problems, OS/runtime clocks do any or all of the following:
@KrzysztofCwalina what are the next steps here? Are we waiting for a formal API proposal?
I (kind of) made an API proposal. Regardless of whether this ends up being a community contribution this should make it into the framework. The point of this ticket was to make the team aware of a need and a solution. I did not try to apply for implementing this.
@GSPP thanks for pointing out the problem and offering the start of the API shape!
I think we need to think about this some more and come up with the final API proposal that we can submit for review.
Anyone interested in picking this up?
Well, if it's going to be a new API, it should be part of System.Time in CoreFXLab, IMHO. So I guess that means me. 😉
I'm still not clear exactly on the proposed API. Sure - subtracting two instances of MonotonicTime should give a TimeSpan that is as accurate and precise as possible with regards to elapsed time. But what does a single instance of MonotonicTime actually represent? How should it be displayed? Should there be any conversions when interacting with DateTime or DateTimeOffset? Does it need a .Now or similar property or method that is somehow related to the system clock? Or does it just track ticks?
Ultimately, are we just putting a value-type wrapper around Stopwatch ticks? Or are we somehow tying it back to UTC?
If your intention was to track something without leap seconds, like TAI, we would actually need to maintain a table of leap seconds, updating the nuget package whenever new leap seconds were announced. This is possible, but how useful is that? IDK.
@mj1856 in my mind there is no relation to the wall clock at all. This class would never be exposed to the user.
Any "debug" formatting would suffice. Strawman: "12345678 ticks (likely 2017-01-23 18:32:12.345 local time)". This does have a relationship to the wallclock but it's only for the developer's convenience.
I don't see the need to convert to wallclock time except as a debugging aid. This is conceptually impossible.
It is possible to approximately convert to wallclock time by storing the current tick count and current DateTime.UtcNow in a static field before the first instance is created (in the cctor). That way a UTC date can be obtained relative to the time the cctor ran. Again, this is a debugging aid. The method could be called MonotonicTime.GetApproximateDateTimeUTC/Local().
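A sketch of that baseline trick, written as a standalone static class so it compiles on its own. `MonotonicDebug` is a hypothetical name, and taking a raw `Stopwatch` timestamp as the parameter is an assumption made for illustration.

``` c#
using System;
using System.Diagnostics;

static class MonotonicDebug
{
    // Baseline pair captured once, when the static constructor runs.
    private static readonly long BaseTimestamp;
    private static readonly DateTime BaseUtc;

    static MonotonicDebug()
    {
        BaseTimestamp = Stopwatch.GetTimestamp();
        BaseUtc = DateTime.UtcNow;
    }

    // Debugging aid only: the result is relative to the wall clock at the time
    // the baseline was captured and ignores any later system clock adjustment.
    public static DateTime GetApproximateDateTimeUtc(long stopwatchTimestamp)
    {
        long delta = stopwatchTimestamp - BaseTimestamp;
        long dateTimeTicks = (long)(delta * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency);
        return BaseUtc.AddTicks(dateTimeTicks);
    }
}
```

The value is good enough for log output and debugger display, but it should never be fed back into timeout logic.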
I think this would indeed just be a value-type wrapper around Stopwatch ticks, yes. This is a convenience API for a few very common scenarios outlined in the opening post.
Regarding Now, I propose static MonotonicTime Current { get; }.
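To make the shape concrete, here is one possible sketch of such a wrapper. It is not a finished proposal: the member set, the private field, and the tick conversion are assumptions, and only `Stopwatch.GetTimestamp()` and `Stopwatch.Frequency` are existing APIs.

``` c#
using System;
using System.Diagnostics;

// Sketch only: a value-type wrapper around raw Stopwatch timestamps.
public readonly struct MonotonicTime : IComparable<MonotonicTime>
{
    // Raw Stopwatch ticks (1/Stopwatch.Frequency seconds), not TimeSpan ticks.
    private readonly long _timestamp;

    private MonotonicTime(long timestamp) => _timestamp = timestamp;

    public static MonotonicTime Current => new MonotonicTime(Stopwatch.GetTimestamp());

    // The difference of two instants is an ordinary TimeSpan.
    public static TimeSpan operator -(MonotonicTime left, MonotonicTime right) =>
        TimeSpan.FromTicks((long)((left._timestamp - right._timestamp)
            * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency));

    // Adding a TimeSpan yields a later instant, e.g. a deadline.
    public static MonotonicTime operator +(MonotonicTime time, TimeSpan delta) =>
        new MonotonicTime(time._timestamp + (long)(delta.TotalSeconds * Stopwatch.Frequency));

    public static bool operator <(MonotonicTime a, MonotonicTime b) => a._timestamp < b._timestamp;
    public static bool operator >(MonotonicTime a, MonotonicTime b) => a._timestamp > b._timestamp;

    public int CompareTo(MonotonicTime other) => _timestamp.CompareTo(other._timestamp);
}
```

A timeout check then reads `var deadline = MonotonicTime.Current + TimeSpan.FromSeconds(5);` followed later by `MonotonicTime.Current > deadline`. There is deliberately no conversion to or from DateTime.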
@GSPP - Oh, so you're just proposing a more strongly typed tick? I suppose that would help with people forgetting to take frequency into account with Stopwatch ticks, but other than that, is it just an ease of use issue? I mean, we do have Stopwatch.GetTimestamp already.
@mj1856 yes, strongly typed ticks to improve usability to the point where people actually do this instead of hacking it with DateTime.UtcNow (which is broken) and Environment.TickCount which is not handy and overflows (also broken therefore).
I agree with an earlier post that said DateTime.UtcNow is appropriate for the overwhelming majority of callers. Most applications are resilient to the system clock changing by a few seconds here and there. If a system administrator has applications that are not resilient to this, then the sysadmin should disable the system time syncing service.
I don't think we need to worry about the case where a misconfiguration sends the system clock forward or backward by years. That would represent a system-wide misconfiguration and would hopefully be caught very quickly. And the recovery procedure is simple enough: fix the clock and reboot the OS.
IMO this API is niche enough where it doesn't need to be in-box. Anybody who consumes it probably has their own opinionated beliefs regarding how it should behave.
Not necessarily arguing against the niche, but reducing this to the system administrator and a "just reboot" attitude is a bit short sighted.
I'm coming over from https://github.com/dotnet/aspnetcore/issues/13628 where an embedded device has an unreliable system clock due to design limitations. Changing it manually or automatically on the order of O(months) is not uncommon and results in unexpected error conditions due to the usage of DateTime.UtcNow within ASP.NET Core. I agree with the OP's sentiment: All of these usages are bugs.
While .UtcNow is orders of magnitude better compared to .Now, I have seen countless real-life issues with programs using the (equivalent of the) latter. Our test engineer has pretty much given up on running any kind of measurements over weekends with a DST change…
If the monotonic clock API is too niche, I would argue that having a dedicated API for timeouts (based on a monotonic time source) is a valuable addition to the framework. Instead of pointing users of .Now to the still-not-fully-correct .UtcNow, this would provide an actual solution to implement timeout correctly under all circumstances.
Here are two alternative designs:
``` c#
class Environment
{
    public static TimeSpan TimeSinceProcessStart { get; }
}
```
TimeSinceProcessStart would internally be based on stopwatch time stamps. It would be a very simple mechanism. This time value could be useful for other diagnostic purposes. If we want to initialize this lazily then it needs to be renamed ContinuousTime because it no longer starts when the process starts.
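A minimal sketch of the first design, placed in a hypothetical `ProcessClock` class because members cannot be added to `System.Environment` from user code. Note that the baseline is captured when the type is first used, so this is really the lazily initialized variant described above.

``` c#
using System;
using System.Diagnostics;

static class ProcessClock
{
    // Captured on first use of the type, which only approximates process start.
    private static readonly long StartTimestamp = Stopwatch.GetTimestamp();

    // Elapsed time since the baseline, based on Stopwatch timestamps
    // rather than on the system clock.
    public static TimeSpan TimeSinceProcessStart
    {
        get
        {
            long delta = Stopwatch.GetTimestamp() - StartTimestamp;
            return TimeSpan.FromTicks(
                (long)(delta * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency));
        }
    }
}
```

Because the value is a plain TimeSpan, a timeout can be expressed as `ProcessClock.TimeSinceProcessStart + delay` and compared later with ordinary TimeSpan operators.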
``` c#
class Timeout
{
    static Timeout CreateFromDelay(TimeSpan delay);
    static Timeout CreateFromDeadline(DateTime elapseTargetDateTimeUtc);
    public TimeSpan TimeRemaining { get; }
    public bool IsElapsed { get; }
    public DateTime ElapseTargetDateTimeUtc { get; } //Computed, can change with clock changes
    public IDisposable WhenElapsed(Action action);
    public Task WhenElapsedAsync();
    public void PropagateTo(CancellationTokenSource cts);
}
```
I'm personally not convinced that these should be added to the framework. I'm logging the ideas here to collect them.
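For reference, the delay-based core of the second design could be sketched as follows. The class is named `MonotonicTimeout` here (a hypothetical name) to avoid clashing with `System.Threading.Timeout`, and only `CreateFromDelay`, `TimeRemaining` and `IsElapsed` are covered; the deadline, callback and cancellation members are left out.

``` c#
using System;
using System.Diagnostics;

public sealed class MonotonicTimeout
{
    // Deadline expressed in raw Stopwatch ticks, never in wall-clock time.
    private readonly long _deadlineTimestamp;

    private MonotonicTimeout(long deadlineTimestamp) => _deadlineTimestamp = deadlineTimestamp;

    public static MonotonicTimeout CreateFromDelay(TimeSpan delay)
    {
        long delayTicks = (long)(delay.TotalSeconds * Stopwatch.Frequency);
        return new MonotonicTimeout(Stopwatch.GetTimestamp() + delayTicks);
    }

    // Remaining time, clamped at zero once the deadline has passed.
    public TimeSpan TimeRemaining
    {
        get
        {
            long remaining = _deadlineTimestamp - Stopwatch.GetTimestamp();
            if (remaining <= 0) return TimeSpan.Zero;
            return TimeSpan.FromTicks(
                (long)(remaining * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency));
        }
    }

    public bool IsElapsed => Stopwatch.GetTimestamp() >= _deadlineTimestamp;
}
```

Since no DateTime is involved, changing the system clock has no effect on when `IsElapsed` turns true.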
> public DateTime ElapseDateTimeUtc { get; }
Strictly speaking, a date/time instant can't be elapsed, because it's not a duration. You can have a "target" instant (the estimated instant the event will fire), and you can have the instant it _actually_ fired; these are, however, not guaranteed to be the same value. Which of them did you mean?
Things get even crazier in the (admittedly rare) case of the computer clock being manually adjusted for whatever reason (kernels tend to be better behaved about the adjustments they perform on the clocks). For this reason, low-level timeout functionality should almost never make reference to the "current" ("real world") time, but should instead be based and report solely on elapsed process/system time.
I meant the elapse target. The elapse target could change measured in the currently configured PC clock time. It's meant as a supplementary property. Not sure about the use case, I just added it as an idea. This proposal was done rather quickly...
The purpose of a timeout usually is to elapse after a certain absolute wall-clock duration has elapsed. "We have 5 seconds to make this HTTP call or else the user is going to be delayed for too long.". This type should encapsulate such semantics.
> The purpose of a timeout usually is to elapse after a certain absolute wall-clock duration has elapsed. "We have 5 seconds to make this HTTP call or else the user is going to be delayed for too long.". This type should encapsulate such semantics.
Right, then I recommend removing it. Referencing wall-clock time is more the domain of a scheduling process, which is a way higher level concern.
How often do your production systems adjust their system clocks so significantly that apps require special handling of this condition?
Edit: this question is in relation to exposing new API. Let's assume for now that we can make Timer and other types more resilient to system clock changes without exposing any new API surface.
> How often do your production systems adjust their system clocks so significantly that apps require special handling of this condition?
> Edit: this question is in relation to exposing new API. Let's assume for now that we can make Timer and other types more resilient to system clock changes without exposing any new API surface.
Strictly speaking, every piece of hardware will have the clock significantly updated at least once - usually during install, when the clock is first set. In most cases this is going to be the only time such an update happens (barring failure of the clock battery when powered off). Updates from NTP and similar services are usually better behaved, specifically to avoid some of these problems. Changing the _time zone_ of the system does not update the clock (or should not, in a modern OS).
QPC and similar system (or process) time clocks should be immune to time changes, because they measure the relative ticks since a start, and aren't based around an external absolute instant (ie, UTC). Outside of some oddball situations for fallbacks, then, Stopwatch should already be immune, and exposes an Elapsed property.
The question, then, is more whether we want a system/process Elapsed property, which can be based off the same (and is more convenient than having to deal with new-ing something up), like @GSPP has proposed:
``` c#
class Environment
{
    public static TimeSpan TimeSinceProcessStart { get; }
}
```
Just to note on the frequency. That is actually happening quite often.
In the cloud, you can "pause" instances and there is physical host migration as well. That can cause the system to freeze for a while.
It used to be that we would see clock drift that exceeded 15 minutes over a period of a few months (which caused sync issues with APIs).
If you go to https://time.gov/, you can see your system's offset from the official time. My laptop is off by +0.040 s, for example.
Adjusting the clock is a rare event. But when it happens we don't want random malfunctions. OK, I don't want to overstate the significance of this problem. But it's the kind of totally unexpected bug that takes down, say, Azure in the middle of the night.
It would be nice to have a correct-by-default solution that is clean and convenient.
(I updated the class Timeout slightly.)
> Just to note on the frequency. That is actually happening quite often.
> In the cloud, you can "pause" instances and there is physical host migration as well. That can cause the system to freeze for a while.
> It used to be that we would see clock drift that exceeded 15 minutes over a period of a few months (which caused sync issues with APIs).
> If you go to https://time.gov/, you can see your system's offset from the official time. My laptop is off by +0.040 s, for example.
These are all instances of absolute time differences. Although normally I would have expected things like host migration to be "invisible", because I would have expected the host clock to be correct due to management requirements. Clock drift over months tells me there's a configuration issue, because NTP updates way more often than that by default. Also, that's a spectacularly bad accuracy - your common wristwatch is something like +-30 seconds a month (and at 15 minutes off, most OSs will start squawking at you for SSL connections).
> It would be nice to have a correct-by-default solution that is clean and convenient.
> (I updated the class Timeout slightly.)
@GSPP - in what way are the existing Timer classes insufficient?
I have misgivings about this method, regardless:
static Timeout CreateFromDeadline(DateTime elapseTargetDateTimeUtc);
... because it's going to be based on a lie (that the timeout is actually based on absolute time).