Prophet: How to Calculate & Plot the cross_validation Results?

Created on 31 Mar 2018 · 6 Comments · Source: facebook/prophet

Can someone provide some example code to show how we can manipulate the "cross_validation" results to calculate the MAPE and plot it against other algorithms? Please see the attached screenshot from the Prophet paper:

[screenshot: MAPE vs. forecast horizon plot from the Prophet paper]

All 6 comments

Basically you just need to run cross_validation in a loop over a range of horizon values, and then compute the MAPE of the results for each horizon. Here is some code that computes MAPE vs. horizon for the example time series in the documentation:

from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation
import pandas as pd
import numpy as np


df = pd.read_csv('../examples/example_wp_peyton_manning.csv')
df['y'] = np.log(df['y'])
m = Prophet()
m.fit(df)

# Compute cross-validation y and yhat for the range of horizons in the figure
df_cv = pd.DataFrame()
for h in [30, 60, 90, 120, 150, 180]:
    df_cv_h = cross_validation(m, horizon='{} days'.format(h), period='100 days', initial='730 days')
    df_cv_h['horizon'] = h
    df_cv = pd.concat((df_cv, df_cv_h))

# Compute absolute percent error for each prediction
df_cv['mape'] = np.abs((df_cv['y'] - df_cv['yhat']) / df_cv['y'])
# mean absolute percent error, by horizon
mape = df_cv.groupby('horizon', as_index=False).aggregate({'mape': 'mean'})

mape.head()
   horizon      mape
0       30  0.049973
1       60  0.055482
2       90  0.057295
3      120  0.058952
4      150  0.060597
5      180  0.063632
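To actually draw the MAPE-vs-horizon curve like the figure in the paper, the `mape` frame computed above can be plotted with matplotlib. This is a minimal sketch (the numbers are copied from the output table above so the snippet is self-contained; swap in your own `mape` frame):

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Per-horizon MAPE, values copied from the output table above
mape = pd.DataFrame({
    'horizon': [30, 60, 90, 120, 150, 180],
    'mape': [0.049973, 0.055482, 0.057295, 0.058952, 0.060597, 0.063632],
})

fig, ax = plt.subplots()
ax.plot(mape['horizon'], mape['mape'], marker='o', label='Prophet')
# Additional algorithms can be overlaid here with further ax.plot() calls
ax.set_xlabel('Forecast horizon (days)')
ax.set_ylabel('MAPE')
ax.legend()
fig.savefig('mape_vs_horizon.png')
```

Each extra forecasting method just needs its own per-horizon MAPE series plotted on the same axes.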

The v0.3 branch has a utility for generating these plots (https://github.com/facebook/prophet/blob/v0.3/notebooks/diagnostics.ipynb) and will be pushed out soon. I'm going to close this issue and leave #194 as the issue for visualizing model diagnostics.

@bletham: why does the cross_validation() function take a model fit on the entire time series? Doesn't this leak information from the validation set into training?

It takes a fitted model just because that was the cleanest way to get all of the information needed to specify the model and do the cross validation. That includes things like the history, specified custom seasonalities or extra regressors. One thing to note is that for seasonalities, cross validation will use whatever seasonalities are used in the final model. Suppose that we have a five year history and yearly seasonality is set to 'auto'. With 5 years of history it will be turned on. Yearly seasonality will then be used for all of the cross validations, even any segments that have <2 years of data and so would typically have yearly seasonality turned off. Basically this makes sure the model features are fixed throughout cross validation.

This does raise a possibility of information leakage, but inside cross validation we use a newly instantiated model that just copies over the model fit settings, from here: https://github.com/facebook/prophet/blob/master/python/fbprophet/diagnostics.py#L126 . We've been careful there to be sure that nothing is being leaked.
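To make the no-leakage point concrete, here is a toy sketch of what "copy over the model fit settings into a newly instantiated model" means: configuration is carried over, fitted parameters are not. The class and attribute names here are illustrative only, not Prophet's internals (the real code is at the link above):

```python
# Illustrative sketch: copy model *configuration* into a fresh, unfitted
# model, so nothing learned from the full history leaks into CV folds.
class TinyModel:
    def __init__(self, yearly_seasonality='auto', changepoint_prior_scale=0.05):
        self.yearly_seasonality = yearly_seasonality
        self.changepoint_prior_scale = changepoint_prior_scale
        self.params = None  # set by fit(); never copied

    def fit(self, data):
        self.params = {'fitted_on': len(data)}
        return self

def clone_settings(model):
    """Return a new, unfitted model with the same configuration."""
    return TinyModel(
        yearly_seasonality=model.yearly_seasonality,
        changepoint_prior_scale=model.changepoint_prior_scale,
    )

m = TinyModel(yearly_seasonality=True).fit([1.0] * 100)
m2 = clone_settings(m)
# m2 has the same settings as m, but no fitted parameters carry over
```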

@bletham The OP asked for a plot of cross-validation results (MAPE, in the example they posted) from Prophet against other algorithms. Since v0.3 we can plot Prophet's cross-validation results. I am wondering whether there is any plan to include cross-validation results from other algorithms/methodologies (e.g. ARIMA & co.) in future releases.

We don't intend to add additional methods to the package. It's really an implementation of this particular model, not a platform for time series forecasting methods in general. (That would be valuable, but it's not on our roadmap.)
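That said, nothing stops you from computing the same per-horizon MAPE for any other forecaster yourself and overlaying it on the Prophet curve. A hedged sketch with a seasonal-naive baseline on synthetic daily data (everything here is illustrative, not part of Prophet):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic daily series with weekly seasonality
n = 500
y = 10 + 2 * np.sin(2 * np.pi * np.arange(n) / 7) + rng.normal(0, 0.3, n)

def naive_forecast(history, horizon):
    """Seasonal-naive: predict the value from the same weekday last week."""
    return history[-7 + (horizon - 1) % 7]

rows = []
for cutoff in range(365, n - 30, 30):       # rolling-origin cutoffs
    for h in range(1, 31):                  # horizons of 1..30 days
        yhat = naive_forecast(y[:cutoff], h)
        actual = y[cutoff + h - 1]
        rows.append({'horizon': h, 'ape': abs((actual - yhat) / actual)})

baseline = pd.DataFrame(rows).groupby('horizon', as_index=False)['ape'].mean()
# `baseline` can now be plotted on the same axes as Prophet's per-horizon MAPE
```

The same loop works for ARIMA or any other model: swap `naive_forecast` for the model's fit-and-predict step at each cutoff.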
