Prophet: TypeError: Timestamp subtraction must have the same timezones or no timezones

Created on 28 Jan 2019 · 6 Comments · Source: facebook/prophet

pandas 0.24 changed how timezone offsets are handled (I think it's related to this: http://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#parsing-datetime-strings-with-timezone-offsets), and now I'm getting errors from this line.

I've verified that downgrading to pandas 0.23 fixes the issue.

Labels: bug, ready

All 6 comments

Thanks for bringing this up, and especially for the pointer to the pandas changelog. Do you already have a timezone specified in the dataframe that you pass to Prophet? I'm just trying to figure out the scope of this issue. It seems that with pandas 0.24 the quickstart example still works:

>>> import pandas as pd
>>> pd.__version__
'0.24.0'
>>> df = pd.read_csv('example_wp_log_peyton_manning.csv')
>>> df.head()
           ds         y
0  2007-12-10  9.590761
1  2007-12-11  8.519590
2  2007-12-12  8.183677
3  2007-12-13  8.072467
4  2007-12-14  7.893572
>>> from fbprophet import Prophet
>>> m = Prophet().fit(df)  # works

When I look at m.history (which is the processed dataframe), the timestamps in the ds column have no tzinfo. I'm guessing we may just need to strip the timezone from the input, if present, to resolve this issue. But if you have a simple dataframe that reproduces the issue, that'd be really useful to me.

Looking at the df['ds'].dtype, right before I give it to prophet, I get <M8[ns]. I don't know if that contains time zone info or not. The original data, before I reshape it, comes in with an unnecessary UTC appended like this: 2018-12-18 16:47:10 UTC.
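For what it's worth, a minimal sketch of how one could check whether a datetime column carries timezone info. The two series here are made up for illustration, and the tz-aware one is built with utc=True rather than by parsing a "UTC" suffix, so it doesn't depend on how a given pandas version parses such strings. A plain datetime64 dtype (which NumPy prints as <M8[...]) is tz-naive; a tz-aware column shows the zone in its dtype and has a non-None .dt.tz.

```python
import pandas as pd

# Illustrative data only: one tz-naive column and one tz-aware column.
naive = pd.Series(pd.to_datetime(['2018-12-18 16:47:10']))
aware = pd.Series(pd.to_datetime(['2018-12-18 16:47:10'], utc=True))

# .dt.tz is None for a tz-naive column and a tzinfo object otherwise.
print(naive.dtype, naive.dt.tz)
print(aware.dtype, aware.dt.tz)

naive_has_tz = naive.dt.tz is not None
aware_has_tz = aware.dt.tz is not None
```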

Thanks for the help.
In pandas 0.24, this code works:

import pandas as pd
from fbprophet import Prophet

# 'ds' is parsed without a timezone (tz-naive)
df = pd.DataFrame({
    'ds': pd.to_datetime([
        '2001-01-01 00:01:10',
        '2001-01-01 00:02:30',
        '2001-01-01 00:03:40',
    ]),
    'y': [1., 2., 1.],
})
m = Prophet().fit(df)
future = m.make_future_dataframe(periods=1)
m.predict(future)

while this code fails with the error that you reported:

# Same data, but the 'UTC' suffix makes 'ds' tz-aware
df = pd.DataFrame({
    'ds': pd.to_datetime([
        '2001-01-01 00:01:10 UTC',
        '2001-01-01 00:02:30 UTC',
        '2001-01-01 00:03:40 UTC',
    ]),
    'y': [1., 2., 1.],
})
m = Prophet().fit(df)
future = m.make_future_dataframe(periods=1)
m.predict(future)  # raises the TypeError

The difference is whether or not the timezone is specified in df['ds'].

The issue is that make_future_dataframe drops the timezone when it puts the dates in an np array, right here: https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py#L1460

Internally, the model maps time to [0, 1] by tracking the start and end of the history. If those have time zones but the future dates do not, subtracting one from the other raises the error.
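To make the failure concrete, here is a small sketch of that kind of scaling in plain pandas. scale_time is a hypothetical helper written for this illustration, not Prophet's actual code; the point is only that subtracting a tz-naive timestamp from a tz-aware one raises a TypeError (the exact message varies between pandas versions).

```python
import pandas as pd

# History bounds are tz-aware, as they would be for input with a UTC suffix.
start = pd.Timestamp('2001-01-01 00:01:10', tz='UTC')
end = pd.Timestamp('2001-01-01 00:03:40', tz='UTC')


def scale_time(ts, start, end):
    """Hypothetical helper: map ts linearly so that start -> 0 and end -> 1."""
    return (ts - start) / (end - start)


aware_future = pd.Timestamp('2001-01-01 00:02:25', tz='UTC')
naive_future = pd.Timestamp('2001-01-01 00:02:25')

# Works: all three timestamps share the same timezone.
scaled = scale_time(aware_future, start, end)

# Fails: a tz-naive timestamp cannot be subtracted from a tz-aware one.
try:
    scale_time(naive_future, start, end)
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```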

We'll get this fixed, but in the meantime there are two options for avoiding this issue.

(1) Remove the timezone in df before giving it to Prophet.
If you put

df['ds'] = df['ds'].dt.tz_convert(None)

before fitting the model, everything will work. Make sure that every row in df has the same timezone first.
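One thing to be aware of with option (1): tz_convert(None) converts to UTC before dropping the timezone, while tz_localize(None) drops it and keeps the local wall-clock time. A quick sketch with made-up data at a fixed +05:00 offset shows the difference; which one you want depends on how you plan to read the forecast dates.

```python
import pandas as pd

# Illustrative data: strings with a fixed +05:00 offset parse to a tz-aware column.
df = pd.DataFrame({
    'ds': pd.to_datetime([
        '2001-01-01 00:01:10+05:00',
        '2001-01-01 00:02:30+05:00',
    ]),
    'y': [1., 2.],
})

# Convert to UTC, then drop the timezone (wall time shifts by 5 hours).
naive_utc = df['ds'].dt.tz_convert(None)

# Drop the timezone and keep the local wall-clock time unchanged.
naive_wall = df['ds'].dt.tz_localize(None)

print(naive_utc.iloc[0])
print(naive_wall.iloc[0])
```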

(2) Create a future dataframe that has the timezone specified.
If instead of using make_future_dataframe you manually construct the future dataframe like

future = pd.DataFrame({
    'ds': pd.date_range(start='2000-01-01', periods=20, freq='H', tz='UTC')
})

and pass that to m.predict, it will work.
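For option (2), a rough sketch of building the future dates so that they continue from the history and carry the same timezone. make_tz_aware_future is a hypothetical helper, not part of Prophet, and it assumes every row of the history shares one timezone. It returns only the dates after the history.

```python
import pandas as pd


def make_tz_aware_future(history_ds, periods, freq='D'):
    """Hypothetical helper (not part of Prophet): future dates in the history's tz."""
    last = history_ds.max()
    # date_range picks up the timezone from the tz-aware start timestamp.
    # Ask for one extra period and drop the first, which is the last history date.
    dates = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]
    return pd.DataFrame({'ds': dates})


# Illustrative tz-aware history.
history_ds = pd.Series(pd.to_datetime(
    ['2001-01-01', '2001-01-02', '2001-01-03'], utc=True))

future = make_tz_aware_future(history_ds, periods=2)
print(future)
```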

Can confirm that removing the timezone in df before giving it to Prophet works with pandas 0.24.

https://github.com/facebook/prophet/commit/f660264e23135b241cdc4b1e50db71224bc146a3 fixes this by not allowing timezones in the ds column.

I ended up doing this rather than converting them to UTC because it could be confusing to people to put in data with a particular timezone, but then get forecasts in UTC. Better to leave timezone handling to the user.
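For anyone who wants the same behavior in their own preprocessing, a guard along these lines rejects tz-aware input up front. This is only a sketch of the idea: check_ds_is_naive and its error message are made up for illustration and are not copied from the commit.

```python
import pandas as pd


def check_ds_is_naive(df):
    """Hypothetical guard: refuse a 'ds' column that carries a timezone."""
    if df['ds'].dt.tz is not None:
        raise ValueError(
            "Column 'ds' has a timezone; remove it before fitting.")
    return df


naive_df = pd.DataFrame({'ds': pd.to_datetime(['2001-01-01', '2001-01-02'])})
aware_df = pd.DataFrame(
    {'ds': pd.to_datetime(['2001-01-01', '2001-01-02'], utc=True)})

check_ds_is_naive(naive_df)  # passes silently

try:
    check_ds_is_naive(aware_df)
    rejected = False
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)
```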

Pushed to PyPI
