I'm happy to share my hyperparameter tuning process:
Requirements: _pandas, fbprophet and tqdm_
Imports
```python
import logging
logging.getLogger('fbprophet').setLevel(logging.ERROR)

from itertools import product

import pandas as pd  # needed for the DataFrames below
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation, performance_metrics
from tqdm import tqdm
```
Holidays
```python
def holidays():
    # One-off special dates (New Year's Day, Christmas)
    especial = pd.DataFrame({
        'holiday': 'especial',
        'ds': pd.to_datetime([
            '2019-01-01', '2019-12-25',
            '2020-01-01', '2020-12-25'
        ]),
        'lower_window': 0,
        'upper_window': 0,
    })
    # High-season dates
    alta = pd.DataFrame({
        'holiday': 'alta',
        'ds': pd.to_datetime([
            '2019-03-01', '2019-03-06', '2019-03-07',
            '2019-05-11', '2019-06-12', '2019-08-10',
            '2019-11-29'
        ]),
        'lower_window': 0,
        'upper_window': 0,
    })
    return pd.concat((especial, alta))
```
List of Params
Take the Cartesian product of these key-value lists to generate every parameter combination, and produce a list for iteration.
```python
param_grid = {
    'growth': ["linear"],
    'changepoints': [None],
    'n_changepoints': [25, 50, 75],
    'changepoint_range': [0.25, 0.5, 0.75],
    'yearly_seasonality': ["auto"],
    'weekly_seasonality': ["auto"],
    'daily_seasonality': [False],
    'holidays': [holidays],  # the function itself; called once per run below
    'seasonality_mode': ["additive"],
    'seasonality_prior_scale': [10, 50, 100],
    'holidays_prior_scale': [10, 50, 100],
    'changepoint_prior_scale': [0.1, 0.33, 0.66],
    'mcmc_samples': [0],
    'interval_width': [0.25, 0.5, 0.75],
    'uncertainty_samples': [0]
}

args = list(product(*param_grid.values()))
args
```
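Since the tuples in `args` are later unpacked positionally into `Prophet(...)`, the grid's key order matters. A less fragile alternative is to zip each tuple back with the keys and pass keyword arguments. Here is a minimal sketch with a deliberately tiny stand-in grid (the two parameters and their values are illustrative only):

```python
from itertools import product

# Tiny stand-in for the full param_grid above (illustrative values only)
param_grid = {
    'n_changepoints': [25, 50],
    'changepoint_prior_scale': [0.1, 0.33],
}

# Cartesian product over the value lists: 2 x 2 = 4 combinations
args = list(product(*param_grid.values()))

# Map each tuple back to keyword arguments by zipping with the keys,
# so the model could be built as Prophet(**kwargs) without relying on order
kwargs_list = [dict(zip(param_grid.keys(), arg)) for arg in args]

print(len(args))       # 4
print(kwargs_list[0])  # {'n_changepoints': 25, 'changepoint_prior_scale': 0.1}
```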
Cross-Validation and Performance
This produces a performance report for each parameter combination.
```python
df_ps = pd.DataFrame()
for arg in tqdm(args):
    # arg[7] is the holidays() function; call it to build the holidays DataFrame
    m = Prophet(*arg[:7], arg[7](), *arg[8:]).fit(df)
    df_cv = cross_validation(m, initial='1000 days', period='30 days', horizon='30 days')
    df_p = performance_metrics(df_cv, rolling_window=1)
    df_p['params'] = str(arg)
    df_ps = df_ps.append(df_p)

df_ps['mae+rmse'] = df_ps['mae'] + df_ps['rmse']
df_ps = df_ps.sort_values(['mae+rmse'])
df_ps
```
horizon | mse | rmse | mae | mape | coverage | params | mae+rmse
-- | -- | -- | -- | -- | -- | -- | --
30 days | 0.027732 | 0.166531 | 0.130853 | 0.010886 | 0.0 | ('linear', None, 25, 0.75, 'auto', 'auto', Fal... | 0.297384
Publish a .csv
```python
df_ps.to_csv("search_auto.csv")
```
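Once the report is saved, picking the winning combination is just a sort and a first-row lookup on `mae+rmse`. A self-contained sketch, using a synthetic stand-in for `df_ps` (the numbers and `combo_*` labels are made up for illustration, not real results):

```python
import pandas as pd

# Synthetic stand-in for the real df_ps produced by the loop above
df_ps = pd.DataFrame({
    'mae':    [0.13, 0.20, 0.11],
    'rmse':   [0.17, 0.25, 0.16],
    'params': ['combo_a', 'combo_b', 'combo_c'],
})
df_ps['mae+rmse'] = df_ps['mae'] + df_ps['rmse']

# After sorting ascending, the first row holds the lowest combined error
best = df_ps.sort_values('mae+rmse').iloc[0]
print(best['params'])  # combo_c
```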
Neat! How about a gist or github notebook for this? It would help make it more accessible.
This is great, thanks for sharing!
I did want to comment on a few of the hyperparameters:
- `changepoint_range` is the % of history in which trend changepoints are allowed. It defaults to 80%, meaning there are no trend changes allowed in the last 20% of the time series. This is a heuristic to avoid the situation where there has been a trend change right at the very end of the history without much data past it, where it's easy for the model to overfit to small changes, which are then projected out as being the forecast. So this basically adds some regularization to the final trend, by requiring it to be a value that works well for the entire last 20% of the history. The default of 80% is already fairly conservative (meaning, the 20% held out from changepoints is already a lot). So exploring values less than that probably isn't the best choice, and especially a value like 0.25 seems way too low (that's making it so the trend slope has to be constant for the last 75% of the history). I'd maybe consider something like [0.8, 0.9].
- `interval_width` is the width of the uncertainty interval, which defaults to 0.8 (an 80% interval). This does not affect model fitting at all, just prediction; and it does not affect the prediction of the main estimate `yhat` at all, just the uncertainty `yhat_lower` and `yhat_upper`. It changes those by setting the nominal coverage. So unless you are optimizing for coverage in some way, this shouldn't be tuned at all. For instance in this example of MAE+RMSE, this will not be affected by `interval_width`.
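Following this advice, the grid could be trimmed accordingly: explore `changepoint_range` at or above the 0.8 default, and drop `interval_width` from the search since it cannot move MAE or RMSE. A sketch of the adjusted search space (restricted here to the tuned parameters only; the fixed single-value entries from the original grid are omitted for brevity):

```python
from itertools import product

# Only the parameters actually being searched, adjusted per the comments above:
# changepoint_range now explores values at or above its 0.8 default, and
# interval_width is left at its default instead of being tuned.
param_grid = {
    'n_changepoints': [25, 50, 75],
    'changepoint_range': [0.8, 0.9],
    'seasonality_prior_scale': [10, 50, 100],
    'holidays_prior_scale': [10, 50, 100],
    'changepoint_prior_scale': [0.1, 0.33, 0.66],
}

args = list(product(*param_grid.values()))
print(len(args))  # 3 * 2 * 3 * 3 * 3 = 162 combinations
```

This also shrinks the search from the original grid's 729 tuned combinations to 162, cutting the cross-validation runtime proportionally.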