Prophet: Hyperparameter Tuning Snippet

Created on 6 Mar 2020 · 2 Comments · Source: facebook/prophet

I'm happy to share my hyperparameter tuning process with you:

Requirements: _fbprophet and tqdm_

Imports

import logging 
logging.getLogger('fbprophet').setLevel(logging.ERROR)

from itertools import product

import pandas as pd  # needed for the holidays DataFrames below
from fbprophet import Prophet

from fbprophet.diagnostics import cross_validation
from fbprophet.diagnostics import performance_metrics

from tqdm import tqdm

Holidays

def holidays():
    especial = pd.DataFrame({
        'holiday': 'especial',
        'ds': pd.to_datetime([
            '2019-01-01', '2019-12-25',
            '2020-01-01', '2020-12-25',
        ]),
        'lower_window': 0,
        'upper_window': 0,
    })

    alta = pd.DataFrame({
        'holiday': 'alta',
        'ds': pd.to_datetime([
            '2019-03-01', '2019-03-06', '2019-03-07',
            '2019-05-11', '2019-06-12', '2019-08-10',
            '2019-11-29',
        ]),
        'lower_window': 0,
        'upper_window': 0,
    })

    return pd.concat((especial, alta))
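As a quick sanity check, the combined frame has one row per date and the columns Prophet expects (`holiday`, `ds`, `lower_window`, `upper_window`). A minimal sketch using a subset of the dates above:

```python
import pandas as pd

# Subset of the dates above, just to show the structure of the combined frame.
especial = pd.DataFrame({
    'holiday': 'especial',
    'ds': pd.to_datetime(['2019-01-01', '2019-12-25']),
    'lower_window': 0,
    'upper_window': 0,
})
alta = pd.DataFrame({
    'holiday': 'alta',
    'ds': pd.to_datetime(['2019-03-01']),
    'lower_window': 0,
    'upper_window': 0,
})
combined = pd.concat((especial, alta))
print(list(combined.columns))  # ['holiday', 'ds', 'lower_window', 'upper_window']
print(len(combined))           # 3
```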

List of Params
Build the cartesian product of these key-values to produce a list of every parameter combination for iteration.

param_grid = {  'growth': ["linear"], 
                'changepoints': [None], 
                'n_changepoints': [25, 50, 75], 
                'changepoint_range': [0.25, 0.5, 0.75],
                'yearly_seasonality': ["auto"],
                'weekly_seasonality': ["auto"],
                'daily_seasonality': [False],
                'holidays': [holidays],  # the function itself; it is called when building the model
                'seasonality_mode': ["additive"],
                'seasonality_prior_scale': [10, 50, 100],
                'holidays_prior_scale': [10, 50, 100],
                'changepoint_prior_scale': [0.1, 0.33, 0.66],
                'mcmc_samples': [0],
                'interval_width': [0.25, 0.5, 0.75],
                'uncertainty_samples': [0]
              }

args = list(product(*param_grid.values()))
args
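For intuition, `product(*param_grid.values())` enumerates every combination of the listed values, and zipping a combination back with the keys recovers keyword arguments. A toy sketch (hypothetical values, same mechanics) — with six parameters taking 3 values each in the grid above, the full sweep is 3**6 = 729 runs:

```python
from itertools import product

# Toy grid, just to show the mechanics of the sweep above.
toy_grid = {
    'n_changepoints': [25, 50, 75],
    'changepoint_prior_scale': [0.1, 0.33, 0.66],
    'seasonality_mode': ['additive'],
}
combos = list(product(*toy_grid.values()))
print(len(combos))                          # 3 * 3 * 1 = 9
print(combos[0])                            # (25, 0.1, 'additive')
# Pairing keys with a combination yields keyword arguments for Prophet(**...):
print(dict(zip(toy_grid.keys(), combos[0])))
```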

Cross-Validation and Performance
Produce a performance report for each parameter combination.

results = []

for arg in tqdm(args):
    # Zip the grid keys back with each combination so the values are passed
    # by keyword rather than relying on Prophet's positional argument order.
    params = dict(zip(param_grid.keys(), arg))
    params['holidays'] = params['holidays']()  # call the holidays() function defined above
    m = Prophet(**params).fit(df)
    df_cv = cross_validation(m, initial='1000 days', period='30 days', horizon='30 days')
    df_p = performance_metrics(df_cv, rolling_window=1)
    df_p['params'] = str(arg)
    results.append(df_p)

df_ps = pd.concat(results)

df_ps['mae+rmse'] = df_ps['mae']+df_ps['rmse']
df_ps = df_ps.sort_values(['mae+rmse'])
df_ps

> 100%|██████████| 729/729 [2:36:24<00:00, 12.87s/it]

horizon | mse | rmse | mae | mape | coverage | params | mae+rmse
-- | -- | -- | -- | -- | -- | -- | --
30 days | 0.027732 | 0.166531 | 0.130853 | 0.010886 | 0.0 | ('linear', None, 25, 0.75, 'auto', 'auto', Fal... | 0.297384


Export a .csv
df_ps.to_csv("search_auto.csv")
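To pick winners afterwards, the saved report can be reloaded and ranked on the combined metric. A sketch with hypothetical metric values standing in for a real `search_auto.csv`:

```python
import pandas as pd

# Hypothetical results standing in for the saved report.
df_ps = pd.DataFrame({
    'params': ['A', 'B', 'C'],
    'mae':  [0.13, 0.11, 0.20],
    'rmse': [0.17, 0.15, 0.25],
})
df_ps['mae+rmse'] = df_ps['mae'] + df_ps['rmse']
best = df_ps.nsmallest(1, 'mae+rmse')
print(best['params'].iloc[0])  # B
```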

Most helpful comment

This is great, thanks for sharing!

I did want to comment on a few of the hyperparameters:

  • changepoint_range is the % of history in which trend changepoints are allowed. It defaults to 80%, meaning there are no trend changes allowed in the last 20% of the time series. This is a heuristic to avoid the situation where there has been a trend change right at the very end of the history without much data past it, where it's easy for the model to overfit to small changes, which are then projected out as being the forecast. So this basically adds some regularization to the final trend, by requiring it to be a value that works well for the entire last 20% of the history. The default of 80% is already fairly conservative (meaning, the 20% held out from changepoints is already a lot). So exploring values less than that probably isn't the best choice, and especially a value like 0.25 seems way too low (that's making it so the trend slope has to be constant for the last 75% of the history). I'd maybe consider something like [0.8, 0.9].
  • The prior scales: Internally, the time series is normalized to have max value of 1, and the priors are applied in that normalized space. This means that the values for these hyperparameters will typically be less than 1 (and probably more like 0.1). For instance, in the example in the Quickstart, the maximum (absolute) value of the trend change parameters is 0.8, and the maximum absolute value of the seasonality parameters is 0.05. The prior scale is the standard deviation of the Normal prior on those parameters. For seasonality and holiday parameters, the default is 10. This applies essentially no regularization, since they actually take on values in the range of +/- 1. So you likely won't want to explore larger values than that in the sweep. I prefer to use a log spacing for tuning this parameter, and so would use something more like [0.1, 1, 10]. For the trend change prior scale, the default is 0.05, which does apply some regularization. Having some regularization on the trend changes is important to force the model to avoid fitting yearly seasonality with trend changes. So here I'd use something like [0.005, 0.05, 0.5].
  • interval_width is the width of the uncertainty interval, which defaults to 0.8 (an 80% interval). This does not affect model fitting at all, just prediction; and it does not affect the prediction of the main estimate yhat at all, just the uncertainty yhat_lower and yhat_upper. It changes those by setting the nominal coverage. So unless you are optimizing for coverage in some way, this shouldn't be tuned at all. For instance in this example of MAE+RMSE, this will not be affected by interval_width.
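Putting the three recommendations above together, a trimmed grid might look like the following sketch (values taken from the suggestions above; everything else left at Prophet's defaults, and `interval_width` dropped since it cannot affect MAE or RMSE):

```python
from itertools import product

# Sketch of a revised grid following the comments above.
param_grid = {
    'changepoint_range': [0.8, 0.9],                # stay near the 0.8 default
    'seasonality_prior_scale': [0.1, 1, 10],        # log-spaced; 10 is the default
    'holidays_prior_scale': [0.1, 1, 10],           # same reasoning as seasonality
    'changepoint_prior_scale': [0.005, 0.05, 0.5],  # log-spaced around the 0.05 default
    # interval_width is omitted: it only changes yhat_lower/yhat_upper,
    # so tuning it cannot move MAE or RMSE.
}
n_runs = len(list(product(*param_grid.values())))
print(n_runs)  # 2 * 3 * 3 * 3 = 54 runs instead of 729
```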

All 2 comments

Neat! How about a gist or github notebook for this? It would help make it more accessible.

