Hi,
I really love your package and am happy that the add_regressor feature was added. However, I ran into a strange warning, and I also found it a bit uncomfortable to add several regressors through multiple calls to add_regressor.
Here are two reproducible examples.
The first one is based on the tutorial example data:
library(prophet)
df <- read.csv("https://raw.githubusercontent.com/facebookincubator/prophet/master/examples/example_wp_peyton_manning.csv")
set.seed(42)
# Add random zeros and ones (the code from the original add_regressor example somehow returns only zeros)
df$nfl_sunday <- sample(c(0, 1), nrow(df), replace = TRUE)
# Fit model
m <- prophet()
m <- add_regressor(m, 'nfl_sunday')
m <- fit.prophet(m, df)
# Make future data.frame
future <- make_future_dataframe(m, periods = 365)
# I know this is not the correct way to add future values when the data is randomly generated
future$nfl_sunday <- sample(c(0, 1), nrow(future), replace = TRUE)
This runs without any trouble, and the model is fitted correctly:
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Initial log joint probability = -4.27118
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Now let me try an example with a continuous regressor:
library(prophet)
df <- read.csv("https://raw.githubusercontent.com/facebookincubator/prophet/master/examples/example_wp_peyton_manning.csv")
set.seed(42)
# Add random normal values
df$nfl_sunday <- rnorm(nrow(df))
# Fit model
m <- prophet()
m <- add_regressor(m, 'nfl_sunday')
m <- fit.prophet(m, df)
# Make future data.frame
future <- make_future_dataframe(m, periods = 365)
# I know this is not the correct way to add future values when the data is randomly generated
future$nfl_sunday <- rnorm(nrow(future))
After this I get a warning message, though the model seems to have been fitted:
Initial log joint probability = -4.27118
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
Warning message:
In sort(unique(df[[name]])) == c(0, 1) :
longer object length is not a multiple of shorter object length
add_regressor(). Thanks in advance!
Thanks for pointing this out, and for the really clean repro steps. You can safely ignore this warning message. It is in some code checking whether the regressor should be standardized or not, and despite the warning it is doing the right thing. It should however be fixed to not raise the warning :-)
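To illustrate where the warning comes from, here is a minimal base-R sketch of the comparison quoted in the warning message. It assumes the check runs on the regressor's sorted unique values, as the message suggests. The helper name is_binary and the use of setequal() are illustrative assumptions, not necessarily how the package fixed it. With a 0/1 regressor there are exactly two unique values, so the comparison with c(0, 1) lines up; with 2905 distinct continuous values, R recycles c(0, 1) and warns because 2905 is not a multiple of 2.

```r
set.seed(42)
x <- rnorm(2905)  # continuous regressor with 2905 distinct values

# Element-wise comparison: c(0, 1) is recycled across all unique values.
# Because 2905 is odd, R raises the length-mismatch warning, which is
# captured here as a string instead of being printed.
res <- tryCatch(
  all(sort(unique(x)) == c(0, 1)),
  warning = function(w) conditionMessage(w)
)
print(res)

# Hypothetical length-safe alternative: setequal() compares the two sets
# as a whole, so no recycling happens and no warning is raised.
is_binary <- function(x) setequal(x, c(0, 1))

print(is_binary(c(0, 1, 1, 0)))  # a 0/1 indicator
print(is_binary(x))              # the continuous regressor
```

Either form gives the same answer for the two examples in this issue; the second one simply avoids comparing vectors of different lengths.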
Adding multiple regressors at a time seems nice but the interface might be challenging, since then we'd have to potentially specify the other inputs to add_regressor (prior_scale and standardize) separately for each regressor.
The fix has been pushed to CRAN in v0.2.1.
To address the second concern, adding multiple regressors with one call: could this work the way pandas sort_values does? There you pass a list of fields to sort by, and the ascending parameter also takes a list, so the direction can be set for each field independently:
df.sort_values(by=['col1','col2'], ascending=[True,False])
This may seem like overkill, since the same thing can be achieved by looping through a list of regressors. But maybe there would be a benefit to pulling that into this function?
That's a reasonable approach, and I think I see how it would work. I am still inclined to prefer doing it one at a time to keep the interface simple, at the cost of a few more lines of code to set up the model. For sorting, there is a dependency (ordering) among the values that I think makes it better to do it in one shot, but that isn't really the case for extra regressors.
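For anyone who wants the one-call convenience in their own code, the loop mentioned above could be wrapped roughly like this. This is only a sketch: add_regressors is a hypothetical user-side helper and not part of prophet, the "temperature" column is made up for illustration, and the argument names prior.scale and standardize are assumed to match the R version of add_regressor.

```r
library(prophet)

# Hypothetical helper (not part of prophet): calls add_regressor() once per
# name. prior.scale and standardize may be a single value or one per regressor.
add_regressors <- function(m, regressors, prior.scale = NULL, standardize = "auto") {
  n <- length(regressors)
  if (!is.null(prior.scale)) {
    prior.scale <- rep_len(prior.scale, n)
  }
  # standardize can mix "auto", TRUE and FALSE, so keep it as a list
  standardize <- rep_len(as.list(standardize), n)
  for (i in seq_len(n)) {
    scale_i <- if (is.null(prior.scale)) NULL else prior.scale[i]
    m <- add_regressor(m, regressors[i],
                       prior.scale = scale_i,
                       standardize = standardize[[i]])
  }
  m
}

# Usage: two regressors, each with its own settings
m <- prophet()
m <- add_regressors(m, c("nfl_sunday", "temperature"),
                    prior.scale = c(10, 5),
                    standardize = list("auto", FALSE))
```

As with the one-at-a-time calls, the regressors have to be added before the model is fitted.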