Similar to the other "Substantial drift" issues, I am seeing the following:
11:57:32 flower.1 | [W 140118 11:57:32 state:163] Substantial drift from [email protected] may mean clocks are out of sync. Current drift is
11:57:32 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:32.070727 recv: 2014-01-18 12:57:32.069728]
11:57:32 flower.1 |
11:57:34 flower.1 | [W 140118 11:57:34 state:163] Substantial drift from [email protected] may mean clocks are out of sync. Current drift is
11:57:34 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:34.657175 recv: 2014-01-18 12:57:34.656147]
11:57:34 flower.1 |
11:57:37 flower.1 | [W 140118 11:57:37 state:163] Substantial drift from [email protected] may mean clocks are out of sync. Current drift is
11:57:37 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:37.047058 recv: 2014-01-18 12:57:37.045609]
11:57:37 flower.1 |
11:57:39 flower.1 | [W 140118 11:57:39 state:163] Substantial drift from [email protected] may mean clocks are out of sync. Current drift is
11:57:39 flower.1 | 3600 seconds. [orig: 2014-01-18 11:57:39.054193 recv: 2014-01-18 12:57:39.052635]
As you can see, the tasks are being created at the correct time, but the log says they were received an hour later. I suspect it's a DST issue. Do we think this is a problem on Flower's end or a Celery issue?
Are you running the latest version? If someone changed the timezone of the machine you may have to restart the worker/flower.
[vagrant@vagrant wave_drop]$ pip freeze
...
celery==3.1.8
flower==0.6.0
...
I am running the latest celery. The timezone of the box has not been changed. This happens every time flower is run against celery.
You should find out if celery.utils.timeutils.utcoffset() returns the correct value for each node
This is currently running on my local machine. When I run
celery.utils.timeutils.utcoffset() it returns 0.
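To make that comparison concrete, here is a minimal stdlib sketch (the function name is mine, not Celery's; it mirrors what celery 3.1's utcoffset() computes, judging by the snippet quoted later in this thread) that you can run on each node and compare:

```python
import time

def local_utcoffset_hours():
    # Mirrors what celery 3.1's celery.utils.timeutils.utcoffset()
    # computes (assumption: based on the snippet quoted in this thread).
    if time.daylight:
        return time.altzone // 3600
    return time.timezone // 3600

# Run this on every node (workers and the flower host) and compare.
print(local_utcoffset_hours())
```

If the printed value differs between nodes, that difference shows up as the reported drift.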
I'm having a similar issue. I'm passing in my whole Django settings.
Celery == 3.1.8
Django == 1.6.X
Flower == 0.6.0
I'm in PST.
[1] % date
Wed Jan 22 15:00:48 PST 2014
In my Django settings, which I'm passing to celery with --config='mysettings', I have:
TIME_ZONE = "America/New_York"
I start flower with the same config:
Current drift is
10800 seconds. [orig: 2014-01-22 15:03:12.572711 recv: 2014-01-22 12:03:12.571017]
Even when I try to set:
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
The drift exists. However, if I set the TIME_ZONE in my Django settings to America/Los_Angeles, the drift is gone. It seems that the celery worker, rather than Flower, is the one sensitive to the changing TIME_ZONE setting.
I've pasted the relevant info from my settings file.
TIME_ZONE = 'UTC'
USE_TZ = True
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
I am in Toronto ("America/Toronto").
I wonder if maybe Django modifies the timezone in the time module.
Note that the event timestamps do not use TIME_ZONE, CELERY_ENABLE_UTC, or any of the datetime-related settings; they use the local timezone of the machine only.
We're having this problem as well.
Oh, and it absolutely _spams_ logging output to Loggly, and a ton of bandwidth out from Rackspace. :( Does this really need to be logged on every single task, rather than at worker startup?
Please just give us a way to turn this message off.
I have the same problem.
Running two servers at Digital Ocean with Ubuntu 12.04.3 x64
The turbo machine runs Redis; the archiver machine runs just workers and is connected over an SSH tunnel with port forwarding (to see the Redis port on localhost). The tasks are rather long running, at least 300 seconds each, and some run multiple times longer; no task runs longer than 90 minutes.
The "substantial drift" is reported only on the archiver machine.
Using redis 2.2.12 (so a bit older - using the package which comes with my Ubuntu).
Load on both machines is about 8 to 10, running 3 workers on turbo and 4-5 workers on archiver. Each worker has --autoscale=5,3. The tasks are I/O intensive, reading data from AWS S3 buckets and posting it back (so generally HTTP traffic), sometimes posting a 1 GB file (in smaller chunks).
I also get a Substantial drift issue with the following setup:
celery==3.1.10, flower==0.6.0
My machine is on the "Europe/Paris" timezone.
On a regular python shell:
import time
def utcoffset():
if time.daylight:
return time.altzone // 3600
return time.timezone // 3600
returns -2 (we are indeed at UTC+2 at the moment).
With a celery shell, and the parameters CELERY_ENABLE_UTC = True and CELERY_TIMEZONE = 'UTC', running from celery.utils.timeutils import utcoffset; utcoffset() gives 1.
Events in celery events -d report utcoffset=1.
But Flower reports:
Substantial drift from celery@hostname may mean clocks are out of sync. Current drift is
3600 seconds. [orig: 2014-04-10 14:07:51.240687 recv: 2014-04-10 15:07:51.238797]
There does not seem to be any effect if I switch CELERY_TIMEZONE to something else.
I am not particularly annoyed by this; I'm just trying to understand!
@NotSqrt Are you using Django? I seem to remember seeing Django having some side effect on timezones; maybe it does some patching of sorts?
I use Django, so yes, that might be it.
Okay: if I set the Django TIME_ZONE setting to the timezone of my machine, the Substantial drift message disappears. If I set it to something else (UTC, Pacific/Tahiti, or anywhere else), Flower reports a current drift of 3600 seconds all the time.
It's definitely Django that is doing this:
$ python manage.py shell
>>> from django.conf import settings
>>> settings.TIME_ZONE
'America/Chicago'
>>> import time, datetime
>>> time.timezone
21600
>>> time.altzone
18000
>>> str(datetime.datetime.fromtimestamp(time.time()))
'2014-05-30 07:48:39.947606'
Without Django:
$ python
>>> import time, datetime
>>> time.timezone
0
>>> time.altzone
-3600
>>> str(datetime.datetime.fromtimestamp(time.time()))
'2014-05-30 13:49:38.935562'
So Django is changing these somehow.
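One way to check that theory (an assumption on my part: Django has historically set the TZ environment variable from settings.TIME_ZONE and called time.tzset() on POSIX systems) is to reproduce the side effect without Django at all:

```python
import os
import time

# Reproduce the suspected Django side effect: pinning the process
# timezone to the configured TIME_ZONE (POSIX-only; no-op on Windows).
os.environ['TZ'] = 'America/Chicago'
time.tzset()

print(time.timezone)  # 21600, matching the Django shell above
print(time.altzone)   # 18000, also matching
```

If this matches what `python manage.py shell` shows, the TZ/tzset mechanism would explain the difference from a plain Python shell.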
I should note (for future people who find this issue by googling) that it can also come up when using 3.1.9 and 3.1.10 together:
>>> import celery
>>> celery.__version__
'3.1.9'
>>> from celery.utils import timeutils
>>> timeutils.utcoffset()
-4
Versus:
>>> import celery
>>> celery.__version__
'3.1.10'
>>> from celery.utils import timeutils
>>> timeutils.utcoffset()
4
@wolever, you need to upgrade to a more recent version, I think; it was probably fixed later than 3.1.9 at least.
But right, the fact that you are mixing versions may not be immediately obvious, so it's a good idea to remind people to verify that they have upgraded all the hosts :)
I still see this
USE_TZ = True
TIME_ZONE = 'Asia/Seoul'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = 'Asia/Seoul'
celery version 3.1.18
flower 0.8.2
When I run date on my Ubuntu box, it shows the time in UTC, and I see the substantial drift message.
Another Ubuntu box with timezone KST shows no substantial drift message.
@pcompassion Is ntpd configured correctly?
They look correct.
I'm still seeing this as well:
WARNING 2015-07-29 15:50:26,590 state 20212 140561769498432 Substantial drift from celery@together-dev may mean clocks are out of sync. Current drift is
7200 seconds. [orig: 2015-07-29 15:50:26.504807 recv: 2015-07-29 13:50:26.503507]
Server is UTC with these settings:
CELERY_ENABLE_UTC = True
TIME_ZONE = 'Europe/Stockholm'
USE_TZ = True
Please include the celery version; I fixed an issue with this a long time ago.
I am running Django==1.8.1 and celery==3.1.18 with NTP on 7 machines.
Each machine is running ~10 workers consuming from different queues. One of the workers is causing drift, and all the workers for all the queues on every machine keep complaining about drift and stop consuming tasks.
Substantial drift from celery@CELERY_ADMINMGRQ_appv may mean clocks are out of sync. Current drift is 17 seconds
Can I exclude one worker and not bring the whole system down?
I believe that the following line is the offender:
According to the Python documentation, time.daylight is non-zero if a DST rule is defined:
https://docs.python.org/2/library/time.html#time.daylight
However, Celery is treating that value as if it indicates whether the local time is currently in daylight saving time or standard time. I have a machine set to the America/New_York time zone right now, and time.daylight returns 1 even though we are currently in EST, not EDT. This causes the utcoffset() method to be off by one in the winter here, which causes adjust_timestamp to be wrong by one hour.
To fix this, I suggest dropping the usage of local time everywhere in Celery and use UTC only for all timestamps. I believe this will greatly simplify things!
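To illustrate the difference (a sketch with names of my own invention, not Celery's actual code), compare the pattern described above against one that consults tm_isdst, which reflects whether DST is in effect at a given moment:

```python
import time

def offset_using_daylight_flag():
    # The pattern described above: wrong whenever a DST rule exists
    # but DST is not currently in effect (e.g. America/New_York in winter).
    if time.daylight:
        return time.altzone // 3600
    return time.timezone // 3600

def offset_using_isdst(ts=None):
    # tm_isdst says whether DST is in effect at the given timestamp
    # (default: now), so winter and summer get the right offsets.
    if time.localtime(ts).tm_isdst > 0:
        return time.altzone // 3600
    return time.timezone // 3600
```

With TZ=America/New_York, the first function returns 4 year-round, while the second returns 5 for a January timestamp and 4 for a July one.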
Nice catch @rbarlow.
I completely agree with using only UTC. It helps to have something constant when dealing with moving targets like time (even when it is only a timezone offset). I am dealing with traffic information being created and processed in multiple timezones, and without keeping timestamps in UTC it would be one big mess.
On the other hand, someone has to dive into the code to do that, and it can be a challenging amount of work (probably resulting in a better-working solution written in fewer lines of code).
I believe this was fixed by https://github.com/celery/celery/commit/85fbe12567eeb6e675c4ca0af963c199cdc2793e
And this patch is part of 3.1.20, so please try upgrading :)
The events don't actually use timestamps for anything but displaying the human time of the event and as a fallback ordering, so I just tried to find the simplest backward-compatible solution I could.
Using pytz/datetime added lots of overhead, which was a problem since we have to process four times as many events as there are tasks in the cluster.
On 3.1.20 I sometimes get e.g. Substantial drift from celery@saver may mean clocks are out of sync. Current drift is 17 seconds. [orig: 2016-03-02 20:31:10.494412 recv: 2016-03-02 20:30:53.695555]
too (as on 3.1.19). The host timezone is UTC, ntp is installed, and ntpdate -q pool.ntp.org shows an offset of 0.002794 sec.
Me too! @fillest +1
@ask This won't go away eh?
Adding
USE_TZ = False
TIME_ZONE = 'Asia/Kolkata'
to my Django settings, adding
CELERY_TIMEZONE = 'Asia/Kolkata'
and then restarting celery and uwsgi solved the issue for me.
It doesn't matter if the clocks are synced by NTP, actually. The clock drift is based on the timestamp in the message and the time we received it.
So if the worker is not able to keep up with the events and processes them too late, the drift warning will be produced.
You should set the CELERY_EVENT_QUEUE_TTL and CELERY_EVENT_QUEUE_EXPIRES settings to avoid this (these will be enabled by default in 4.0).
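For reference, a sketch of those settings in a Celery 3.1 config module (the values are illustrative, not recommendations):

```python
# Celery 3.1 settings module sketch: let stale event messages expire
# instead of piling up while the monitor is down or falling behind.
CELERY_EVENT_QUEUE_TTL = 5          # seconds a message may wait in the event queue
CELERY_EVENT_QUEUE_EXPIRES = 60.0   # delete an unused event queue after 60 seconds
```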
We also noticed this happening today with celery version 3.1.23!
celery (3.1.23)
2017-01-25 11:36:07,506: WARNING/MainProcess] Substantial drift from [email protected] may mean clocks are out of sync. Current drift is
842 seconds. [orig: 2017-01-25 11:36:07.015840 recv: 2017-01-25 11:22:05.133577]
Did you try upgrading to 4.x?
I had the same issue, but changing $ celery flower
to $ celery proj flower
fixed it.
Hi,
I have the same issue unfortunately...
Celery version: 4.0.2
Django version: 1.10.6
USE_TZ = True
TIME_ZONE = 'Israel'
enable_utc = True
timezone = 'Israel'
[2017-06-21 14:52:59,483: WARNING/MainProcess] Substantial drift from celery@2fcd73b4-6bf3-4f39-8e8b-f157d4548bf5 may mean clocks are out of sync. Current drift is 10800 seconds. [orig: 2017-06-21 14:52:59.483377 recv: 2017-06-21 17:52:59.470257]
@maxandron The information provided isn't enough to debug this issue. Please open a new issue with the necessary information for us to debug this problem.
@thedrow Thank you for the reply, what kind of information will be helpful?
@maxandron See the checklist when opening a new issue.
Also, please include any ntpd configuration you have.
I got this warning and couldn't fix it by setting the timezone; finally I found that I had only restarted the worker but not the web app...
It doesn't matter if the clocks are synced by NTP, actually. The clock drift is based on the timestamp in the message and the time we received it.
So if the worker is not able to keep up with the events and processes them too late, the drift warning will be produced.
You should set the CELERY_EVENT_QUEUE_TTL and CELERY_EVENT_QUEUE_EXPIRES settings to avoid this (these will be enabled by default in 4.0).
@ask How do I ensure that the worker can keep up with the events? This happened to me when I scaled my workers to 100 and up. Should I increase the resources of the RabbitMQ instance that serves as the broker for my celery workers?
I am using celery 4.2.0
Please use the mailing list, IRC or stack overflow for such questions.