Prophet: Prophet with weekly aggregate data

Created on 23 May 2019 · 6 Comments · Source: facebook/prophet

Hi,

I'm using Prophet to predict weekly sales demand and I have a number of factory closure events that influence this demand. These factory closures last a week and are aligned with my weekly sales data. When defining these factory closures in the dataframe that will be passed to my holidays argument in the Prophet constructor:

  • Do I specify the first date of each weekly closure in the ds column of the dataframe?
  • Do I then specify, in the lower_window and upper_window columns, the number of days (or weeks, since I'm working with weekly data) either side of the ds date over which the closure is expected to influence demand?

For example, if I have a 7-day factory closure starting on '2019-07-07' and I suspect that it influences demand for 14 days before the beginning and 14 days after the end of the closure, should I specify:
ds='2019-07-07', lower_window=-14, upper_window=21, OR
ds='2019-07-07', lower_window=-2, upper_window=3 (since I'm working with weekly data)?

Thanks

enhancement

All 6 comments

That's a good question.

The model is continuous-time, and your assumption is correct: for the holiday effect to land on the correct point, you want to give the holiday the date of the data point that it should affect.

As an example, if sales from 5/20/19 through 5/26/19 are rolled up into a single datapoint for the week that is given the date 5/26/19, then any holiday events in that week should be given the date of 5/26/19.
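A minimal sketch of that date alignment, assuming the weekly data points are stamped with the Sunday that ends each week (as in the 5/26/19 example). The helper name `snap_to_week_ending` is hypothetical and not part of Prophet.

```python
from datetime import date, timedelta

def snap_to_week_ending(d: date, week_end_weekday: int = 6) -> date:
    """Return the date of the weekly data point whose week contains d.

    week_end_weekday follows date.weekday(): Monday=0 ... Sunday=6.
    """
    days_ahead = (week_end_weekday - d.weekday()) % 7
    return d + timedelta(days=days_ahead)

# An event on Wednesday 2019-05-22 falls in the week 2019-05-20..2019-05-26,
# so it is given the week-ending date used by the aggregated sales data.
event_day = date(2019, 5, 22)
holiday_ds = snap_to_week_ending(event_day)
print(holiday_ds)  # 2019-05-26
```

If the data is stamped with the first day of the week instead, the same idea applies with the offset subtracted rather than added.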

The lower_window and upper_window parameters are a little more complicated. Basically, they create additional holidays for the dates at those offsets. As an example, if "Christmas" is specified as a holiday with lower_window=-2 and upper_window=1, then under the hood it will create four holidays: Christmas, Christmas_-1, Christmas_-2, and Christmas_+1. Christmas_-1 is a holiday whose dates are the day before each Christmas, and so forth. A separate effect size is fit for each of these holidays.
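The expansion described above can be illustrated in plain Python. This is only a sketch of the idea, not Prophet's implementation: `expand_window` is a hypothetical helper, and the labels follow the naming used in this comment rather than Prophet's internal column names.

```python
from datetime import date, timedelta

def expand_window(name, ds, lower_window, upper_window):
    """List the (label, date) pairs implied by a daily holiday window."""
    rows = []
    for offset in range(lower_window, upper_window + 1):
        # Offset 0 is the holiday itself; other offsets get a signed suffix.
        label = name if offset == 0 else f"{name}_{offset:+d}"
        rows.append((label, ds + timedelta(days=offset)))
    return rows

christmas_rows = expand_window("Christmas", date(2019, 12, 25), -2, 1)
for label, day in christmas_rows:
    print(label, day)
```

Each label stands for one separately fitted effect, and every offset is one calendar day, whatever the frequency of the data.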

So with weekly data, lower_window=-14 will add a holiday for every day working backwards for 14 days. That's clearly not what you want; you want just holiday_-7 and holiday_-14. You'll have to do this manually. For instance, suppose you have a holiday "factory_closure" with dates [3/10/19, 4/14/19]. You could then create a second holiday "factory_closure_+1wk" with dates [3/17/19, 4/21/19], which gives the effect of a one-week upper window.
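A sketch of building those manual weekly offsets, using the closure dates from the example. The helper `weekly_offset_holidays` and the `_-2wk` / `_-1wk` labels are illustrative assumptions; only "factory_closure" and "factory_closure_+1wk" come from the comment.

```python
from datetime import date, timedelta

# Closure weeks from the example above (dates of the weekly data points).
closure_weeks = [date(2019, 3, 10), date(2019, 4, 14)]

def weekly_offset_holidays(name, dates, weeks_before, weeks_after):
    """Build holiday rows with one separately named holiday per weekly offset."""
    rows = []
    for k in range(-weeks_before, weeks_after + 1):
        label = name if k == 0 else f"{name}_{k:+d}wk"
        for d in dates:
            rows.append({"holiday": label, "ds": d + timedelta(weeks=k)})
    return rows

rows = weekly_offset_holidays("factory_closure", closure_weeks,
                              weeks_before=2, weeks_after=1)
for row in rows:
    print(row["holiday"], row["ds"])
```

The resulting rows have the `holiday` and `ds` columns that the holidays dataframe expects, so they could be turned into one with `pandas.DataFrame(rows)`, leaving lower_window and upper_window unset so that no daily offsets are added.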

As a side note, if you had different numbers of closures in a week and wanted to somehow account for that, you could define an extra regressor instead of using the holidays interface.
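One way such a regressor column might be prepared is sketched below. The closure days and the column name `n_closures` are made up for illustration, and weeks are assumed to be stamped with their ending Sunday. In Prophet the column would then be registered with `add_regressor('n_closures')` before fitting, and it has to be present in both the history and the future dataframes.

```python
from collections import Counter
from datetime import date, timedelta

def week_ending(d, week_end_weekday=6):
    """Date of the weekly data point (week ending Sunday) that contains d."""
    return d + timedelta(days=(week_end_weekday - d.weekday()) % 7)

# Hypothetical individual closure days, not taken from the thread.
closure_days = [date(2019, 3, 4), date(2019, 3, 5), date(2019, 3, 13)]

# Weekly data points: four consecutive Sundays starting 2019-03-10.
weekly_points = [date(2019, 3, 10) + timedelta(weeks=i) for i in range(4)]

# Count closure days per week; weeks without closures get 0.
counts = Counter(week_ending(d) for d in closure_days)
regressor = [{"ds": ds, "n_closures": counts.get(ds, 0)} for ds in weekly_points]

for row in regressor:
    print(row["ds"], row["n_closures"])
```

Unlike a holiday, which is either on or off for a given data point, the regressor lets the fitted effect scale with the count.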

Thanks for explaining this so clearly Ben. Makes perfect sense.

Am I correct in saying the same principle applies when adding a country's public holidays to a model working with weekly data? i.e. you can't simply include these holidays with the add_country_holidays(country_name='UK') method, they'd need to be manually created as holidays in the holidays dataframe with their dates corresponding to the commencement date of the weeks you expect them to impact?

For example, if I expect Easter 2019 public holidays (Good Friday = 19/4/2019, Easter Monday=22/4/2019) to impact sales for the 2 weeks commencing Sunday, April 14th and Sunday, April 21st, I'd need to create a holidays dataframe with the rows below to ensure these country holidays are considered as model features?

  • ds='2019-04-14', holiday='easter'
  • ds='2019-04-21', holiday='easter'

Thanks.

Yes, that's correct with the built-in holidays. You'd need to extract them and snap the dates to the aggregation points. This comment gives code to extract the built-in holiday dates: https://github.com/facebook/prophet/issues/806#issuecomment-454484451
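Snapping could look like the sketch below, assuming weeks commence on Sunday as in the question. The two Easter 2019 dates are hard-coded from the question; extracting them from the built-in holidays is not shown here, and `week_commencing_sunday` is a hypothetical helper.

```python
from datetime import date, timedelta

def week_commencing_sunday(d):
    """Return the Sunday that starts the week containing d."""
    # date.weekday(): Monday=0 ... Sunday=6, so a Sunday moves back 0 days.
    days_back = (d.weekday() + 1) % 7
    return d - timedelta(days=days_back)

# UK Easter 2019 public holidays, as given in the question.
uk_easter_2019 = {
    "Good Friday": date(2019, 4, 19),
    "Easter Monday": date(2019, 4, 22),
}

snapped = [{"holiday": "easter", "ds": week_commencing_sunday(d)}
           for d in uk_easter_2019.values()]
for row in snapped:
    print(row["holiday"], row["ds"])
```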

That should probably be noted in the documentation (actually it'd be good to have a page generally about forecasting on aggregated data that describes these sorts of caveats). I'll leave that open as an enhancement.

Thanks Ben. When working with this weekly aggregate data and floating holidays (such as Easter), I've snapped the Easter dates back to my aggregation points, as shown below for the Easter 2019 dates: Good Friday is snapped from April 19th to the week commencing Sunday, April 14th, and Easter Monday from April 22nd to Sunday, April 21st. I've also added separate 'holidays' 1, 2 and 3 weeks before and after these 2 Easter weeks so that Prophet can learn the change in ordering patterns either side of them. Each week has a different holiday name so that Prophet can differentiate the impact of, for example, 3 weeks before Easter vs. 2 weeks before Easter. Do you agree with this approach?

 ds                    holiday
 2019-03-24            easter_3wb
 2019-03-31            easter_2wb
 2019-04-07            easter_1wb
 2019-04-14            good_Friday
 2019-04-21            easter_Monday
 2019-04-28            easter_1wa
 2019-05-05            easter_2wa
 2019-05-12            easter_3wa
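The table above could be generated rather than typed by hand. The sketch below reproduces it from the two snapped Easter 2019 weeks; `easter_rows` is a hypothetical helper, and the holiday names are the ones used in the table.

```python
from datetime import date, timedelta

def easter_rows(good_friday_week, easter_monday_week, n_weeks=3):
    """Build one named holiday row per week around the two Easter weeks."""
    rows = []
    # Weeks before the Good Friday week, furthest first (easter_3wb ... easter_1wb).
    for k in range(n_weeks, 0, -1):
        rows.append({"ds": good_friday_week - timedelta(weeks=k),
                     "holiday": f"easter_{k}wb"})
    rows.append({"ds": good_friday_week, "holiday": "good_Friday"})
    rows.append({"ds": easter_monday_week, "holiday": "easter_Monday"})
    # Weeks after the Easter Monday week (easter_1wa ... easter_3wa).
    for k in range(1, n_weeks + 1):
        rows.append({"ds": easter_monday_week + timedelta(weeks=k),
                     "holiday": f"easter_{k}wa"})
    return rows

table = easter_rows(date(2019, 4, 14), date(2019, 4, 21))
for row in table:
    print(row["ds"], row["holiday"])
```

Calling the helper once per year with that year's snapped dates, and concatenating the results, would keep the holiday names identical across years so that each name corresponds to one effect.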

Yep that makes sense to me!
The only additional thing I would note is that if there are holidays that always occur in the same week, there is no way to differentiate their effects, so they could be combined into a single holiday. That isn't the case here, but it would be for something like Christmas and Christmas Eve over certain year ranges.
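A quick way to check whether two fixed-date holidays always share a week over the years in the data is sketched below, again assuming weeks that commence on Sunday. Both helper names are hypothetical.

```python
from datetime import date, timedelta

def week_commencing_sunday(d):
    """Return the Sunday that starts the week containing d."""
    return d - timedelta(days=(d.weekday() + 1) % 7)

def always_same_week(month_day_a, month_day_b, years):
    """True if both (month, day) holidays fall in the same week in every year."""
    return all(
        week_commencing_sunday(date(y, *month_day_a))
        == week_commencing_sunday(date(y, *month_day_b))
        for y in years
    )

# Christmas Eve and Christmas share a week in every year from 2017 to 2021 ...
print(always_same_week((12, 24), (12, 25), range(2017, 2022)))  # True
# ... but not in 2022, when Christmas falls on a Sunday and starts a new week.
print(always_same_week((12, 24), (12, 25), range(2017, 2023)))  # False
```

When the check returns True for the whole history, the two holidays are indistinguishable at weekly resolution and a single combined holiday is enough.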

This is now described in the documentation: https://facebook.github.io/prophet/docs/non-daily_data.html#holidays-with-aggregated-data
