Prophet: How to aggregate daily results to weekly

Created on 29 Jan 2018  ·  8 Comments  ·  Source: facebook/prophet

Dear developers,

I was wondering if you could please advise me on how I could best roll-up the daily predictions to a weekly prediction using fbprophet?

I understand that it's not as easy as taking the sum of the daily values for yhat, yhat_upper and yhat_lower.

I also tried the following (where yhat and yhat_lower are pandas.Series):

from math import sqrt
sqrt((yhat - yhat_lower)**2)

but this gives me quite large ranges for yhat. I'm guessing the main issue here is the fact that yhat(t-1) depends on yhat(t-2).

Thank you!

All 8 comments

You can sum the mean estimates (yhat) to get a mean estimate for the week.
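As a minimal sketch of summing the point estimates, assuming a `forecast` DataFrame shaped like the output of `m.predict` (the toy values below are made up for illustration), pandas `resample` can do the weekly sum:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the output of m.predict(future):
# 14 daily rows starting on a Monday, with made-up yhat values 0..13.
forecast = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=14, freq="D"),
    "yhat": np.arange(14, dtype=float),
})

# Sum the daily mean estimates into weekly totals.
# "W" bins are weeks ending on Sunday; the index is the week-end date.
weekly_yhat = forecast.set_index("ds")["yhat"].resample("W").sum()
print(weekly_yhat)
```

This only covers yhat. The interval columns need the sample-based approach described next.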

For the uncertainty bounds yhat_lower and yhat_upper, these are percentiles, and as you note it wouldn't be correct to sum them because of the covariance across time. The right thing to do is:
1) Get predictive samples. You can do this with samples = m.predictive_samples(future), called the same way as m.predict. It returns a dictionary, and samples['yhat'] holds the predictive samples of yhat. By default it has 1000 samples for each date in future.
2) Roll up the weeks for each sample. You now have 1000 samples for each week.
3) Take the desired percentile for each week from those samples. By default prophet uses an 80% interval, so that'd be the 10th percentile for yhat_lower and the 90th percentile for yhat_upper.

Does that make sense?
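To illustrate why summing the daily percentiles is not the same as taking percentiles of the weekly sums, here is a small sketch on synthetic data. The `daily_samples` array is a hypothetical stand-in for samples['yhat'] with independent daily noise. Real Prophet samples are correlated across days, so the size of the gap will differ in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for samples["yhat"]: 7 days x 1000 samples,
# independent noise around a daily level of 100 (an assumption for illustration).
daily_samples = 100 + rng.normal(0, 10, size=(7, 1000))

# Naive roll-up: sum the daily 10th and 90th percentiles.
naive_lower = np.percentile(daily_samples, 10, axis=1).sum()
naive_upper = np.percentile(daily_samples, 90, axis=1).sum()

# Sample-based roll-up: sum the 7 days within each sample,
# then take percentiles across the 1000 weekly totals.
weekly_samples = daily_samples.sum(axis=0)
lower, upper = np.percentile(weekly_samples, [10, 90])

naive_width = naive_upper - naive_lower
sample_width = upper - lower
print(f"naive width: {naive_width:.1f}, sample-based width: {sample_width:.1f}")
```

With independent days the naive interval comes out noticeably wider, because daily errors partly cancel within a week and summing percentiles ignores that.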

Hi @bletham thanks a lot for your answer! What you say definitely makes sense, I will try it and see how good the results will be. I will update this thread asap

@aalloul please do update this thread once you have more information. I'm definitely interested in this topic as well. I'm pretty new to this Python/statistics world so examples are most definitely appreciated.

@bletham @bronsonelliott I just did a quick try and it seems like the uncertainty is not bad at all.

import numpy as np
import pandas as pd
from fbprophet import Prophet

# Get forecast (assumes df is the training DataFrame with 'ds' and 'y' columns)
opts = {"daily_seasonality": False, "yearly_seasonality": True, "weekly_seasonality": True}
m = Prophet(**opts)
m.fit(df)
future = m.make_future_dataframe(periods=33)
forecast = m.predict(future)

# Roll-up to 1 week
samples = m.predictive_samples(future)

# Creates a DF where each row is one day and each column is one predictive sample
samples_df = pd.DataFrame(samples["yhat"])
samples_df['date'] = future['ds'].values
samples_df['week_of_year'] = samples_df.date.dt.strftime("%Y%W").astype(int)

# Sum the days of each week within every sample (one weekly total per sample)
weekly_samples = samples_df.drop(columns="date").groupby("week_of_year").sum()

# The mean across samples is thus our weekly yhat
weekly_predict = weekly_samples.mean(axis=1).rename("yhat").reset_index()
weekly_predict['start_of_week'] = samples_df.groupby("week_of_year").date.min().tolist()

# Upper and lower values of yhat are computed following fbprophet's approach
# (percentiles over the sample columns only, not the week_of_year key)
weekly_predict['yhat_lower'] = np.percentile(weekly_samples, 10, axis=1)
weekly_predict['yhat_upper'] = np.percentile(weekly_samples, 90, axis=1)

The error is around 2% for most of the weeks. This is really cool, as the error at the day level is just as small. I will use it for the next couple of weeks to see if it continues as smoothly.

My last request to you, if you don't mind, is whether you agree with the approach. If yes, we can definitely close this ticket.

Looks good to me, and glad to hear it is behaving nicely.

@aalloul Hi, thank you very much. Can this be extended to months as well?

With this approach, are we returning a prediction interval instead of an uncertainty interval? Or is the process below analogous to interval.width=.8?

weekly_predict['yhat_lower'] = upper_lower.apply(lambda x: np.percentile(x, 10), axis=1).tolist()
weekly_predict['yhat_upper'] = upper_lower.apply(lambda x: np.percentile(x, 90), axis=1).tolist()

> @aalloul Hi, thank you very much. Can this be extended to months as well?

I think yes, you just need to group by month instead of week.
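A minimal sketch of the monthly variant, using synthetic data in place of the real model output. Here `dates` and `yhat_samples` are hypothetical stand-ins for future['ds'] and samples['yhat'], and the only change from the weekly version is the grouping key:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for future['ds'] and samples['yhat'] (days x samples).
dates = pd.date_range("2018-01-01", periods=90, freq="D")
yhat_samples = 100 + rng.normal(0, 10, size=(len(dates), 1000))

# One row per day, one column per predictive sample, plus a month key.
samples_df = pd.DataFrame(yhat_samples)
samples_df["month"] = dates.to_period("M")

# Sum the days of each month within every sample.
monthly_samples = samples_df.groupby("month").sum()

# Mean and 80% interval across the samples for each month.
monthly_predict = pd.DataFrame({
    "yhat": monthly_samples.mean(axis=1),
    "yhat_lower": np.percentile(monthly_samples, 10, axis=1),
    "yhat_upper": np.percentile(monthly_samples, 90, axis=1),
})
print(monthly_predict)
```

Note that a partial month at either end of the forecast horizon gives a smaller total, so it may be worth dropping incomplete months before comparing against actuals.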
