Hi, we regularly run into cases where Prophet stalls while fitting the model, and the only remedy seems to be killing the entire R process. This is quite inconvenient in large automated runs, as everything gets blocked and lost.
I'm aware that the reprex shown below is not necessarily the most suitable use of Prophet, as the data is very short and low-count. But my question is more general:
Is it possible to modify Prophet such that the fit aborts with an error instead of stalling indefinitely?
But as it presumably happens within Stan, I'm not sure how much you can do about it.
It seems to be some exotic, non-stochastic (i.e. reproducible) numeric singularity. It happens very rarely, but rare event × many opportunities = a significant effect overall.
Minimal Reprex
library(prophet)
xreg_train <- c(
-0.76537379485893120012,
-0.74533153677032704109,
-0.62494542934172692127,
-0.39778106465657125934,
-0.06976829815746909968,
0.34070346369915599504,
0.80422096949408772292,
1.28317620368430906907,
1.73567381412022903042,
2.12006230186317567998
)
df_train <- data.frame(ds = seq.Date(from = as.Date("2018-09-10"),
by = "weeks",
length.out = 10),
y = c(1, 1, 0, 1, 0, 1, 0, 2, 2, 0),
xreg = xreg_train
)
prophet_model <- prophet(seasonality.mode = "multiplicative",
n.changepoints = 2)
prophet_model <- add_regressor(prophet_model, "xreg")
prophet_model <- fit.prophet(prophet_model, df_train)
The same call to fit.prophet _doesn't_ stall if we change one of the following very minor things:
- xreg_train <- round(xreg_train, 15) (rounding to 16 digits still stalls)
- y = c(1, 1, 0, 1, 0, 1, 0, 2, 2, 1)
- n.changepoints from 2 to 1, 3, 4, 5 or 6
- seasonality.mode to "additive"

SessionInfo
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] prophet_0.4 rlang_0.3.1 Rcpp_0.12.17
loaded via a namespace (and not attached):
[1] pillar_1.2.3 compiler_3.5.0 plyr_1.8.4 bindr_0.1.1 R.methodsS3_1.7.1 R.utils_2.6.0 tools_3.5.0
[8] jsonlite_1.5 tibble_1.4.2 gtable_0.2.0 pkgconfig_2.0.1 bigrquery_1.0.0 DBI_1.0.0 yaml_2.1.19
[15] parallel_3.5.0 bindrcpp_0.2.2 gridExtra_2.3 dplyr_0.7.5 httr_1.3.1 stats4_3.5.0 grid_3.5.0
[22] tidyselect_0.2.4 glue_1.2.0 inline_0.3.15 R6_2.2.2 rstan_2.17.3 purrr_0.2.5 ggplot2_2.2.1
[29] magrittr_1.5 scales_0.5.0 codetools_0.2-15 StanHeaders_2.17.2 assertthat_0.2.0 colorspace_1.3-2 config_0.3
[36] lazyeval_0.2.1 munsell_0.5.0 R.oo_1.22.0
Thanks!
This is great, thanks for the really clean repro.
We ran into a few issues like this in v0.2 where it gets stuck on the Stan side of things. It's always with very small datasets and I believe is happening because the model parameters are not very well specified by so little data. We made a few adjustments in v0.3 that got rid of all of the cases I knew of at the time, so it's great to have a new example so I can see what adjustments we can make to avoid this in the future. It is a bit of a challenge as you note since it's happening inside Stan so we're not really able to catch it, but we can see how we can adjust the default settings and how we pass data into Stan to avoid it.
In v0.2 when we saw this, it could always be avoided by using Stan's Newton solver instead of the L-BFGS. That is the case here too: Just use
prophet_model <- fit.prophet(prophet_model, df_train, algorithm='Newton')
For big datasets Newton is quite a bit slower, hence we use L-BFGS by default, but it does seem to be more robust. I'm leaning now towards making an update to default to Newton if the number of datapoints is below some threshold.
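Such a fallback can already be implemented on the caller's side. The sketch below shows one way to do it; the helper names and the 100-point threshold are assumptions for illustration, not anything built into Prophet:

```r
# Hypothetical helper: pick the Stan optimizer based on series length.
# The 100-point threshold is an assumption, not a Prophet default.
choose_algorithm <- function(n_points, threshold = 100) {
  if (n_points < threshold) "Newton" else "LBFGS"
}

# Sketch of a fit wrapper using the rule above; assumes prophet is loaded.
fit_with_fallback <- function(model, df, threshold = 100) {
  fit.prophet(model, df, algorithm = choose_algorithm(nrow(df), threshold))
}

# For the 10-row reprex above this would select Newton:
# prophet_model <- fit_with_fallback(prophet_model, df_train)
```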
Thanks for your reply, Ben!
I've changed my calls to Prophet such that it uses the Newton solver for small datasets. I'll test this modification "in the wild" and get back to you in a couple of days with a conclusion on the impact on the stall-rate. Maybe this will help your decision-making regarding the update of defaults.
To conclude: We're working with weekly data. I've changed the call to Prophet such that it uses the Newton solver for time-series shorter than 1 year (i.e. < 52 data points) instead of the default L-BFGS.
So far it is looking very good: no more stalling has occurred, and there isn't any noticeable impact on performance. Of course this is not proof positive that it won't ever happen again, but it certainly indicates that it reduces the problem a lot (and probably even eliminates it completely).
Thanks!
Great, thanks for the update!
I'm going to go ahead and leave this open until the issue is fixed.
Hi, I am running into the same problem. I can't share a reproducible example due to proprietary data, but for me it seems to be an arbitrary combination of model tunings and datapoints that triggers it. It is not always the same model tuning that causes the problem, nor always the same number of data points (small vs. big). So I am very much in need of a fix.
@mikkelkrogsholm does algorithm='Newton' fix the issue as described above?
@bletham I have tried it. It solves some, but not all, of the issues. I still run into problems. I am running a pipeline that (for this test) builds 400K models, using the furrr package to parallelize the work across a lot of cores. The problem I am facing is that when I hit this issue, the "future" from the furrr package never resolves and just hangs forever, even if I change to Newton.
And besides that, since I don't know which model/data combination causes this hiccup, I would need to run Newton on all models in the pipeline, which slows it down considerably.
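One way to keep a single stalled fit from hanging the whole pipeline is to run each fit in a disposable child process with a hard timeout: in-process approaches such as R.utils::withTimeout generally cannot interrupt compiled Stan code, whereas a child process can simply be killed. This is only a sketch; it assumes the callr package is available, and the 300-second budget is arbitrary:

```r
# Sketch: run one Prophet fit in a separate R process so that a stalled
# Stan optimization can be killed without taking down the main process.
# Assumes the callr package is installed; 300 s is an arbitrary budget.
fit_with_timeout <- function(model, df, timeout_sec = 300) {
  tryCatch(
    callr::r(
      function(model, df) prophet::fit.prophet(model, df),
      args = list(model = model, df = df),
      timeout = timeout_sec
    ),
    error = function(e) NULL  # a timed-out or crashed fit returns NULL
  )
}
```

A NULL result can then be logged and skipped, so one pathological series no longer blocks the remaining futures.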
If you are able to post a dataset that produces this issue with the Newton optimizer, that would be very helpful for me in trying to find the right fix.
In the meantime, in the next version I will put in a rule that defaults to Newton if there are fewer than X datapoints, where X will probably be 100. Any counterexamples to that rule would be helpful.
Generally, you could try to reduce the complexity of the model, which will make the optimization more robust. Disabling seasonalities that are unlikely to be present and limiting the number of changepoints to quite a bit less than the number of datapoints are the two most immediate things to do.
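As a sketch of that advice, one could cap the number of changepoints relative to the series length. The cap_changepoints helper and the one-changepoint-per-four-points ratio below are illustrative assumptions, not Prophet recommendations:

```r
# Illustrative rule: allow roughly one changepoint per 4 datapoints,
# capped at Prophet's default of 25. The 4:1 ratio is an assumption.
cap_changepoints <- function(n_points, per = 4L) {
  max(0L, min(25L, n_points %/% per - 1L))
}

# For the 10-point reprex this allows a single changepoint; seasonality
# that weekly data cannot resolve would also be disabled, e.g.:
# m <- prophet(n.changepoints = cap_changepoints(nrow(df_train)),
#              weekly.seasonality = FALSE,
#              daily.seasonality = FALSE)
```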
Pushed to CRAN
@humphreyapplebee thanks for reporting this, I'm able to replicate it. Well, actually for me the L-BFGS doesn't freeze, but it does exit with an error since it starts getting log probabilities of NaN, and then it automatically retries with Newton, which also runs into NaNs and throws the B[1] error that you report. I opened a new issue for this (#1032) since it is in Python, and it may be confusing to people in the future to have it here in an issue that is otherwise all about R.
Hi @bletham
We are facing the fit.prophet stalling issue even with a larger dataset. We have training data of 820 points (27 months), and it also gets stuck sometimes (this is the first time it got stuck in production, where the model has been running smoothly for the past 6 months). It gets stuck at the same iteration every time. We generate the holiday configuration using a genetic algorithm for the cost-optimization function, which internally calls fit.prophet to build the model and compute the error.
We have seen similar issues during the testing and optimization phases of our experiments, but we thought it could be some other issue. Now it seems to point to this discussion.
We are not using any regression and the seasonality.mode is additive.
Could you please suggest some resolution?
@Vibs-DataScience is there any chance you can upload data that produces the error? Generally the issue seems to be overparameterization of the model. With few data points, the changepoints easily become overparameterized. With 820 points that's not very likely the case but I wonder if something weird might be happening with the holidays during the search. Do you have any instances of completely redundant holidays? For instance, having "Christmas Eve" as a holiday, and separately "Christmas" with a lower window of -1 would place two separate holidays that always occur on Dec 24 ("Christmas Eve" and "Christmas -1"). Is something like that happening in the case where it stalls?
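The kind of redundancy described here can be checked mechanically by expanding each holiday over its window and looking for dates covered more than once. This is a base-R sketch with a made-up holiday configuration matching the Christmas example above:

```r
# Two holiday rows that collide on Dec 24: "christmas_eve" itself, and
# "christmas" with lower_window = -1 (made-up example configuration).
holidays <- data.frame(
  holiday      = c("christmas_eve", "christmas"),
  ds           = as.Date(c("2018-12-24", "2018-12-25")),
  lower_window = c(0, -1),
  upper_window = c(0, 0)
)

# Expand each holiday over its window to find colliding dates.
expanded <- do.call(rbind, lapply(seq_len(nrow(holidays)), function(i) {
  offsets <- holidays$lower_window[i]:holidays$upper_window[i]
  data.frame(holiday = holidays$holiday[i], ds = holidays$ds[i] + offsets)
}))

# Dates covered by more than one holiday effect are redundant regressors:
duplicated_dates <- unique(expanded$ds[duplicated(expanded$ds)])
```

Running a check like this over each candidate holiday configuration before fitting would flag the unidentifiable cases without ever invoking Stan.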
Also, have you tried using Newton optimizer?