cockroach-cerulean-0001 was recently brought up as the new host after the previous one went down for some reason. maybe maintenance or failure, nothing in the Azure logs so far.
The node quickly logged fatal on clock skew. ntpd is running, and I tried stopping it to run ntpdate, both against the pool we're currently using (ntp.ubuntu.com) and against time.windows.com. Both show ~430ms offset. ntpdate seems to correctly set the hwclock, but it gets reset shortly after.
Here's the graph from the last hour:

The dips down to 0 match the times when I can ntpdate manually.
Not sure there's much we can do other than escale to support, kill the vm, and log this for the continuous saga that is "time sucks".
It is very curious that the ntpdate invocations only resulted in a temporary resetting of the clock offset. I would expect the clock to manually drift upward again after the ntpdate invocation, but instead it jumped right back to the offset it was at previously.
both ntpdate and ntpd are supposed to set the system clock, but they clearly aren't.
Something else is switching the time from under us (beyond the not-so-slow drift).
The last 24h (the machine's been up for about 23) show:

I'll file a support ticket and swap out the machine with another while keeping it around.
Also found the migration in the logs. Well, not the actual logs, it's in "Virtual Machine -> Resource Health -> View History"
This looks like a simple replacement after the existing node went down.
|Start Time | End Time | Status | Description |
|---|---|---|---
|2/26, 9:12 AM | Ongoing | Available | There aren鈥檛 any known Azure platform problems affecting this virtual machine |
|2/26, 8:59 AM | 2/26, 9:12 AM | Unavailable | We're sorry, your virtual machine isn't available because of an unhealthy host |
|2/23, 11:20 AM | 2/26, 8:59 AM | Available | There aren鈥檛 any known Azure platform problems affecting this virtual machine |
|> 14 days ago | 2/23, 11:20 AM | Unknown | We are currently unable to determine the health of this resource |
the last 24h are even more fun:

Quick summary: when running as a guest of hyper-v, it seems we're forced onto the hyperv clock source which resets the system clock every time the host sends a clock message (5 seconds).
Asking for a tsc clock just plain doesn't work, and asking for acpi_pm for some reason still get overridden.
It seems the host time may just become a possible source of time in the future. The pr message seems to indicate that hyperv clocksource and ntp indeed to not play well together: https://patchwork.kernel.org/patch/9525945/
Same thing is currently happening on cockroach-denim-0006. It was also rescheduled due to a bad host:
start time: 3/6, 6:25 AM
end time: 3/6, 6:42 AM
status: Unavailable
description: We're sorry, your virtual machine isn't available because of an unhealthy host
Clock skew over the last 20h (monitoring was broken for a while before that):

cerulean-0002 was rebooted while I was trying to tweak the clocksource. Since then, its clock has been way off:

cerulean-0001 remains bad.
After a bit of back-and-forth with Azure support folks, I've been told the following:
we can unbind the hyperv Time Synchronization device by doing the following:
# Find the ID of the device:
$ curl -O https://raw.githubusercontent.com/torvalds/linux/master/tools/hv/lsvmbus
$ python lsvmbus -vv | grep -w "Time Synchronization" -A 3
VMBUS ID 11: Class_ID = {9527e630-d0ae-497b-adce-e80ab0175caf} - [Time Synchronization]
Device_ID = {2dd1ce17-079e-403c-b352-a1921ee207ee}
Sysfs path: /sys/bus/vmbus/devices/2dd1ce17-079e-403c-b352-a1921ee207ee
Rel_ID=11, target_cpu=0
# Unbind the device, using the device ID displayed above.
$ echo 2dd1ce17-079e-403c-b352-a1921ee207ee | sudo tee /sys/bus/vmbus/drivers/hv_util/unbind
After that, the clock seems to be settable again, manually (eg: date -s ...), and through ntpdate/ntpd.
However, ntp still seems to be drifting more than expected. Will keep checking.
Lovely. Seems like we might need per-cloud provider docs on configuring time synchronization. Cc @jseldess
yeah, this one's weird. It also seems to require a reboot (??) which is odd since that reactivates the driver, and you need to disable it again.
I'm also still waiting on Azure folks to tell me why the host clock seems off, or at least why it's setting a time that's constantly drifting (back and forth, sometimes by as much as +/- 500ms)
cerulean is back up, keeping cockroach-cerulean-000{1,2} which look back in sync, but only time will tell (ah!)
Ditto for cockroach-indigo-0007, it was rescheduled >14 days go (because timestamps are hard!)

time-drift graph shows madness starting around 2017-02-23 17:01
Azure support is saying this is a bug in LIS (Linux integration service for Hyper-v) with a fix in the next release, but not exact ETA on the rollout. Until then, they recommend disabling timesync. I've been doing that on a per-node basis, but I may move this to our node init scripts.
A kernel patch making the hyper-v time sync a separate clock source is merged, and making its way into ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1676635
This should allow the guest VM to use ntp as usual, or use the host time as reference.
@mberhault Is this still a problem with the hyper-v time sync fix?
That fix is more of a work-around. I'm still waiting for the new LIS to roll out. It's supposed to fix this (see a few comments above).
Just added this guidance in https://github.com/cockroachdb/docs/pull/2295.
Thanks for the pointer @mberhault. I'm having this same issue on a Debian 9 machine running Linux 4.9.0-6. Looks like Vitaly's patch made it in too late for me, in 4.10.
I'm running the smallest burst series VM, and am hitting CPU limits due to a cascade effect caused by the time changing every 5 seconds: it gets logged, then several services note the time changed, do their own thing and log it, and then the Azure Linux Agent runs rsyslog to offload the log files for diagnostics, but so much logging is happening that it never catches up and the VM ends up at 100% CPU, and the host eventually kills it. :(
Thanks for the work-around :)
Is there anything else to do here? I'm optimistically closing but @mberhault please check.
Most helpful comment
After a bit of back-and-forth with Azure support folks, I've been told the following:
we can unbind the hyperv Time Synchronization device by doing the following:
After that, the clock seems to be settable again, manually (eg:
date -s ...), and through ntpdate/ntpd.However, ntp still seems to be drifting more than expected. Will keep checking.