Prophet: Automatically detect if it makes sense to create weekly seasonality or not

Created on 3 Mar 2017 · 7 Comments · Source: facebook/prophet

Since weekly and yearly seasonality are created by default, it would be good to check whether the data under consideration actually supports weekly seasonality. All data aggregated at the weekly or monthly level will never support weekly and monthly seasonality.

So it's important to check whether the data is aggregated at the weekly or monthly level before setting self.weekly_seasonality = True. Agreed, this can't be done in the class constructor, but we can reset the value once we are in the fit method, which is pretty easy to do. I give a solution below, after describing the error that creeps in because of this.

On weekly or monthly aggregated data, if I call m.plot_components(forecast), the error is KeyError: u'the label [Monday] is not in the [index]'. This error occurs in the plot_components() method while executing the code block below:

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
y_weekly = [df_s.loc[d]['weekly'] for d in days]
y_weekly_l = [df_s.loc[d]['weekly_lower'] for d in days]
y_weekly_u = [df_s.loc[d]['weekly_upper'] for d in days]

Since weekly or monthly aggregated data will never have all of these days, the plot will always fail when called with default parameters.
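The lookup failure can be reproduced outside Prophet with plain pandas. This is only a minimal sketch: the frame `df_s` and its values are made up for illustration, and it assumes the component frame is indexed by weekday name, as the snippet above suggests.

```python
import pandas as pd

# Hypothetical stand-in for the frame used inside plot_components():
# weekly data that always falls on a Sunday has a single weekday in its index.
df_s = pd.DataFrame(
    {'weekly': [0.5], 'weekly_lower': [0.4], 'weekly_upper': [0.6]},
    index=['Sunday'],
)

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

try:
    y_weekly = [df_s.loc[d]['weekly'] for d in days]
    lookup_failed = False
    message = ''
except KeyError as err:
    # 'Monday' is the first requested label that is missing from the index
    lookup_failed = True
    message = str(err)

print(lookup_failed, message)
```

The exact wording of the KeyError differs between pandas versions, but the missing label is named in each case.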

Potential fix:
Step 1: Sort the input dataframe by date, which is already being done
Step 2: Get the difference between consecutive date values
Step 3: If the minimum difference is 7 days or more, the data is not daily and we should set self.weekly_seasonality = False

A sample code snippet with data, showing how it works:

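import pandas as pd

df = pd.DataFrame({'ds': pd.date_range('2017-03-01', periods=10, freq='W'),
                   'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df = df.sort_values('ds')

# Inside fit(): if the smallest gap between consecutive dates is
# 7 days or more, the data is not daily, so turn weekly seasonality off.
if df['ds'].diff().min().days >= 7:
    self.weekly_seasonality = False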

In the above case, since the data is weekly, we will not create weekly seasonality.

I will create a pull request for this if the problem seems worth addressing.

enhancement

All 7 comments

I agree, and I also just posted a comment to that effect:
https://github.com/facebookincubator/prophet/issues/76

Awesome! Thanks for the patch @datafool. Keeping this issue open until we can add an R implementation.

I think there are two issues here.

The first is a bug in the plotting which makes it fail if weekly_seasonality=True and there is a weekday that is entirely missing from the data. This should be fixed.
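One possible way to make the weekday lookup tolerate missing days is to reindex the component frame instead of calling .loc once per day. This is a sketch under assumptions, not the fix that was adopted in Prophet; the frame `df_s` and its values are hypothetical.

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# Hypothetical component frame in which only two weekdays occur
df_s = pd.DataFrame(
    {'weekly': [0.5, -0.2],
     'weekly_lower': [0.4, -0.3],
     'weekly_upper': [0.6, -0.1]},
    index=['Sunday', 'Wednesday'],
)

# reindex() returns NaN rows for labels that are absent instead of raising KeyError
df_plot = df_s.reindex(days)
y_weekly = df_plot['weekly'].tolist()
y_weekly_l = df_plot['weekly_lower'].tolist()
y_weekly_u = df_plot['weekly_upper'].tolist()

print(y_weekly)
```

Plotting libraries such as matplotlib typically skip NaN points, so missing weekdays would show up as gaps rather than as an exception.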

The second is that by default we use both yearly and weekly seasonality, but depending on the data it may not make sense to use them. @datafool thanks for the PR for this. This works by detecting if we have < 1 data point per week, and if so turns weekly seasonality off. I'd like to take a different approach though.

I don't think we should ever override a seasonality that was specifically requested by the user. Weekly seasonality might make sense even when there is < 1 data point per week. They may not be aggregates, and so if they fall on different days each week there could be day-of-week effects that the user wants to include. What we should do is automatically select seasonalities only if the user didn't specifically select which ones to use. I'm not sure exactly what the interface should look like, but maybe instead of defaults yearly_seasonality=True, weekly_seasonality=True we would have yearly_seasonality='auto', weekly_seasonality='auto' and then would have a method that determines for each seasonality that is 'auto' if it should be included or not. For yearly seasonality for instance we should leave it off by default if there is < 1 year of history (maybe even < 2 years).
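The 'auto' idea could look roughly like the following. This is a sketch, not Prophet's actual interface: the helper name `resolve_auto_seasonalities` and both thresholds (one year of history for yearly, some spacing under 7 days for weekly) are assumptions chosen for illustration.

```python
import pandas as pd


def resolve_auto_seasonalities(df, yearly_seasonality='auto', weekly_seasonality='auto'):
    """Hypothetical helper: decide only the seasonalities left as 'auto'.

    Explicit True/False values passed by the user are returned unchanged.
    """
    dates = df['ds'].sort_values()
    span = dates.max() - dates.min()  # total length of the history
    min_gap = dates.diff().min()      # smallest spacing between observations

    yearly = yearly_seasonality
    weekly = weekly_seasonality
    if yearly == 'auto':
        # Assumed threshold: require at least one year of history
        yearly = bool(span >= pd.Timedelta(days=365))
    if weekly == 'auto':
        # Assumed rule: require some observations less than 7 days apart
        weekly = bool(min_gap < pd.Timedelta(days=7))
    return yearly, weekly


weekly_df = pd.DataFrame({
    'ds': pd.date_range('2017-03-01', periods=10, freq='W'),
    'y': list(range(10)),
})

# Nothing requested explicitly: both seasonalities are decided from the data
auto_choice = resolve_auto_seasonalities(weekly_df)

# Weekly seasonality requested explicitly: the request is kept as-is
forced_choice = resolve_auto_seasonalities(weekly_df, weekly_seasonality=True)

print(auto_choice, forced_choice)
```

The point of this design is that the data-driven check only ever runs for settings the user left as 'auto', so an explicit request is never overridden.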

@bletham I agree with your line of reasoning and that we should not overwrite user input, and I can certainly modify the code to work the way you have requested. However, I made those changes after a lot of thought about modeling weekly seasonality, and my implementation overwrites user input only when no two records are within 7 days of each other. The case where we have days from two different weeks is therefore covered, since the difference in days between any two weekdays will be less than or equal to 7, so your concern is taken care of.

Now, from a modeling perspective, if we do not have all days of the week, I don't think we should conclude anything about weekly seasonality. So setting weekly_seasonality=False also solves the bug we observe in plot_components.

However, I will let @seanjtaylor decide how he and his team want to take it ahead!

@datafool I think what you proposed is generally a good approach; I would just want to leave users an option to override the behavior. For instance, if all of the observations are 8 days apart and there are 7 weeks of data, then all weekdays will be present. As a more concrete example, the data here are monthly data, but all weekdays are present:
https://github.com/facebookincubator/prophet/blob/master/examples/example_retail_sales.csv
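The 8-day example can be checked with a few lines of pandas. This is a small sketch with made-up dates: it counts the distinct weekdays in such a series and compares that against the minimum-gap rule (>= 7 days) from the original proposal.

```python
import pandas as pd

# Seven observations spaced 8 days apart (dates chosen arbitrarily)
ds = pd.Series(pd.date_range('2017-03-01', periods=7, freq='8D'))

# Each step of 8 days moves the weekday forward by one
weekdays_present = ds.dt.day_name().nunique()

# Smallest spacing between consecutive observations, in days
min_gap_days = ds.diff().min().days

# The minimum-gap rule would still switch weekly seasonality off here
would_disable_weekly = min_gap_days >= 7

print(weekdays_present, min_gap_days, would_disable_weekly)
```

So a spacing-based check alone cannot tell whether all weekdays are represented, which is why a user override is useful.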

@bletham Got your point. I will work on improving this and raise another pull request when it is ready. But before I raise another PR, I think it would be better to discuss the architecture of v0.2, so that I don't code it in a bad fashion!
