Hi,
In one of the last releases you added a feature for external regressors. Is there any way to estimate their impact on the target variable, other than training models with and without them and comparing prediction accuracy on a holdout set?
Thanks.
That certainly sounds like a good approach to me. You could fit with and without, and then use the cross_validate function to get an estimate of the prediction errors under each model.
Besides that... if you are doing MCMC, then you could check whether the predictive distribution of the component for the additional regressor contains 0. The `_lower` and `_upper` endpoints of its interval will be columns in the output of predict. If the interval does contain 0, that is evidence that the regressor is not useful. But note that a regressor could still be useless even while being a non-zero component of the model, if it adds something that could instead have been captured by another component. For instance, an additional regressor that is a straight line could be used in a significant way while still being redundant with the trend model, so removing it wouldn't affect model performance.
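To make the interval check concrete, here is a small pandas sketch. The DataFrame is a made-up stand-in for the output of `predict` from a model fit with `mcmc_samples > 0` and `add_regressor('temperature')`; the regressor name and the numbers are hypothetical, but the `_lower`/`_upper` column naming follows the convention described above.

```python
import pandas as pd

# Toy stand-in for Prophet's predict() output; values are made up.
forecast = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=3),
    "temperature": [0.8, 1.1, 0.9],
    "temperature_lower": [0.2, 0.4, 0.1],
    "temperature_upper": [1.5, 1.9, 1.6],
})

# The regressor's effect is credibly non-zero on a date when the posterior
# interval for its component excludes 0.
contains_zero = (forecast["temperature_lower"] <= 0) & (forecast["temperature_upper"] >= 0)
print(contains_zero.any())  # False here: every interval excludes 0
```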
If you're not doing MCMC, then it would be cheaper to just fit the model twice. Is there a particular reason for not wanting to remove the regressor for estimating its importance?
Thanks for the quick reply. I got your points about CV and MCMC usage. I was kinda interested in something like an interpretation of the dependence between the target and the regressors. I'm not asking for OLS-style exact effects, but something like a positive/negative impact indicator would be cool, or, as you mentioned, some importance scale like in the caret package. (I suppose my lack of understanding of GAM models may play a role in desiring impossible features, but still.)
That makes sense. In that case, just looking at that component in the output of predict will show what portion of yhat is coming from that regressor. With MCMC you can also see the uncertainty in that; otherwise you still get a point estimate, which can be qualitatively useful.
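A minimal sketch of that point estimate: in an additive Prophet model, `yhat` is the sum of the trend and the component columns, so a regressor's share of the prediction can be read off directly. The component names and numbers below are hypothetical stand-ins for a real forecast frame.

```python
import pandas as pd

# Hypothetical slice of a Prophet forecast: yhat is the sum of the trend
# and the additive components (one seasonality, one extra regressor here).
forecast = pd.DataFrame({
    "trend": [10.0, 10.5, 11.0],
    "weekly": [1.0, -0.5, 0.2],
    "temperature": [2.0, 1.5, 1.8],
})
forecast["yhat"] = forecast[["trend", "weekly", "temperature"]].sum(axis=1)

# Point estimate of the portion of yhat coming from the extra regressor.
share = forecast["temperature"] / forecast["yhat"]
print(share.round(3).tolist())
```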
This is a very interesting discussion. I am cross posting this question from another thread I had posted originally on. Is there a way to use spike and slab prior for variable selection using Prophet? Similar to the Bayesian structural time series approach (paper) for getting a parsimonious model?
@bletham: The MCMC approach that you have suggested above seems to follow a similar approach. On reading the docs, it seems like a normal prior is used by default for regressors. But would using a prior that is designed to make a model parsimonious (like a spike and slab) be better? And if so, is it easy to code in? Would love to hear your thoughts.
The bulk of the linear component of the model (the X*beta piece) is handling seasonalities. Seasonalities are modeled using Fourier series: each column of X is a frequency in the Fourier series, and the betas are the coefficients. A sparse prior on this component wouldn't be appropriate, since it would correspond to the seasonality having a sparse frequency representation. There's no reason to believe that would be the case, and I'd expect it to hurt performance.
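For intuition, here is a numpy sketch of how those seasonality columns of X are built: for a period P and Fourier order N, you get a sin/cos pair at each of the N frequencies, i.e. 2*N columns. (This is a sketch; Prophet's actual implementation may order or scale the columns differently.)

```python
import numpy as np

def fourier_series(t, period, order):
    """Seasonality columns of X: a sin/cos pair at each frequency.

    t      -- time in days since some reference date
    period -- seasonality period in days (7 for weekly, 365.25 for yearly)
    order  -- number of Fourier terms; yields 2 * order columns
    """
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])

t = np.arange(10.0)
X_weekly = fourier_series(t, period=7, order=3)
print(X_weekly.shape)  # (10, 6): default weekly seasonality uses 6 betas
```

This is also why weekly seasonality has 6 coefficients by default: the default Fourier order for it is 3, and each frequency contributes one sin and one cos term.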
However, for the columns of X that do correspond to extra regressors, a sparse prior on their betas like spike and slab could make sense. If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it. It would be pretty easy to test out since the prior is contained entirely in the stan code. Basically the prior here:
https://github.com/facebook/prophet/blob/master/R/inst/stan/prophet_linear_growth.stan#L36
would be swapped with a sparse prior, but only for the columns corresponding to extra regressors.
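As a sketch of what that swap might look like (not Prophet's actual code): a true spike-and-slab prior has a discrete mixture component that Stan cannot sample directly, so a continuous sparse prior such as the horseshoe is the usual stand-in. The indicator `s_r` marking which columns of X are extra regressors is a hypothetical name, and `sigma` stands for whatever fixed scale the original normal prior uses.

```stan
// Hypothetical sketch: sparse (horseshoe) prior on extra-regressor betas,
// keeping the original normal prior on the seasonality betas.
parameters {
  vector[K] beta;
  vector<lower=0>[K] lambda;  // local shrinkage scales
  real<lower=0> tau;          // global shrinkage scale
}
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (k in 1:K) {
    if (s_r[k] == 1)
      beta[k] ~ normal(0, lambda[k] * tau);  // sparse prior, extra regressors
    else
      beta[k] ~ normal(0, sigma);            // original prior, seasonalities
  }
}
```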
In our specific use case, we have several time series that can roll up into hierarchies. We believe the outcome of one time-series can affect the other. To capture some of the interaction effects across them we want to test several versions of lagged variables of different time series for the target one. Additionally, we have some regressors for each of the time series, which we want to test and see if it interacts with other time series. This results in a lot of variables and we believe that only a few regressors should really trigger for each time series.
I want to add that vector autoregression or a hierarchical time series approach also seemed promising, but a simple time series approach with regressors seemed worth a shot as well (especially given the robust implementations that are available for the latter :)).
@bletham : If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it.
I'm working on a marketing project that tries to identify which of many activities (represented as separate time series) have a significant effect on sales. I'm looking to de-trend y, see which extra regressors have a significant effect, and then use their coefficients to tweak the marketing budget a bit.
Feature importance of regressors, e.g. via regressor coefficients, would be a great addition to Prophet :)
I'm doing the exact same thing @timvink is, previously used the BRMS approach in R https://cran.r-project.org/web/packages/brms/brms.pdf
Can someone suggest how I can do vector autoregression with Prophet?
@nithints Prophet isn't an autoregressive model, so vector autoregression would not be possible. If you mean more generally just fitting a multivariate time series, this also isn't currently supported but there is an open issue for it in #49 you could follow along in.
Hello, I've read through multiple threads on this topic and have gotten some great insights. How can I observe the actual beta coefficients? I am working in R, and [insert model name here]$params$beta will provide the beta coefficients, but there are no names, just column numbers [1-n]. Is there a way to see which name corresponds to each beta coefficient? Thanks!
@nhernandez05 I think m$train.component.cols should give you what you want. It is a dataframe: each row is an entry in beta, and each column is the name of a component. A 1 indicates that the beta is involved in that component, a 0 that it is not. For example, by default weekly seasonality uses 6 coefficients, so there will be 6 rows with a 1 in the 'weekly' column, and those are its coefficients.
@bletham Would you be able to provide what the Python syntax would be to find these column names for the beta dataframe?
@MaxBirdChemEng it's in m.train_component_cols. It's a pandas dataframe whose columns are the names of the seasonal components and whose rows correspond to the entries in m.params['beta']: row i, column j of m.train_component_cols is 1 if beta_i is used in seasonality component j.
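To make that mapping concrete, here is a sketch using toy stand-ins for `m.train_component_cols` and `m.params['beta']` (the component and regressor names and all values are made up; in a real model they come from the fitted Prophet object):

```python
import numpy as np
import pandas as pd

# Toy stand-in for m.train_component_cols: default weekly seasonality
# (6 Fourier coefficients) plus one extra regressor.
train_component_cols = pd.DataFrame({
    "weekly":      [1, 1, 1, 1, 1, 1, 0],
    "temperature": [0, 0, 0, 0, 0, 0, 1],
})
# Toy stand-in for one draw of m.params['beta'].
beta = np.array([0.3, -0.1, 0.05, 0.2, -0.4, 0.1, 1.7])

# Group the betas by the component they belong to.
coef_by_component = {
    comp: beta[train_component_cols[comp].values == 1]
    for comp in train_component_cols.columns
}
print({k: v.tolist() for k, v in coef_by_component.items()})
```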
@bletham: Why does weekly seasonality have 6 coefficients?
@bletham: Also, how should one handle an outcome variable with high fluctuations?