Moving the prediction code into Stan would certainly give a huge speedup!
My predict() call takes about 5 s, and these 3 sub-functions take about:
predict_trend() # less than 1 s
predict_seasonal_components() # about 2 s
predict_uncertainty() # about 3 s
Since predict_seasonal_components() takes about 40% of the time, executing these in parallel would give roughly a 40% speedup over executing them in order, whether or not the prediction code is moved into Stan~
With Python 3's concurrent.futures module, writing the parallel code seems very easy -- just a few lines?
_Originally posted by @leafonsword in https://github.com/facebook/prophet/issues/479#issuecomment-380656086_
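To illustrate the idea in the quoted comment, here is a minimal sketch with concurrent.futures. The three functions are hypothetical stand-ins (in Prophet these are methods on the model and are not necessarily independent of each other), so this shows the pattern rather than a drop-in patch:

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-ins for Prophet's three prediction steps; real
# code would need them to be independent and picklable.
def predict_trend(df):
    return 'trend'        # ~1 s of work in the real model

def predict_seasonal_components(df):
    return 'seasonal'     # ~2 s

def predict_uncertainty(df):
    return 'uncertainty'  # ~3 s

def predict(df):
    # Run the three steps concurrently: total time becomes roughly
    # max(1, 2, 3) = 3 s instead of ~5 s sequentially.
    with ProcessPoolExecutor(max_workers=3) as ex:
        futures = [ex.submit(f, df) for f in
                   (predict_trend,
                    predict_seasonal_components,
                    predict_uncertainty)]
        return [f.result() for f in futures]

if __name__ == '__main__':
    print(predict(None))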
It is the loop right here that could potentially be parallelized:
https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py#L1249
There are many resources on how to use multiprocessing to parallelize a loop, this seems like a decent place to start: https://stackoverflow.com/questions/20190668/multiprocessing-a-for-loop
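The general shape of the pattern from that answer looks like this, with work() as a placeholder for whatever one iteration of the loop computes:

```python
from multiprocessing import Pool

def work(i):
    # Placeholder for one iteration of the loop body.
    return i * i

if __name__ == '__main__':
    with Pool() as pool:
        # Parallel equivalent of: results = [work(i) for i in range(1000)]
        results = pool.map(work, range(1000))
```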
Thanks for the valuable suggestions, but Prophet is still taking a lot of time. I'm forecasting second-wise data in a loop, and even after parallelizing the Prophet calls it is slower than expected. Can you please suggest how to make Prophet forecast faster, if possible?
Forecast time will depend directly on two things: the number of points for which you are making forecasts (that is, the number of rows in the future dataframe), and the parameter uncertainty_samples.
For the first, make sure that you are only making forecasts at points that you actually need forecasts for. For instance, by default the make_future_dataframe function will include all of the points in the history. If you don't need predictions at all of the points in the history, then you can leave them out and there will be fewer predictions that need to be made.
The parameter uncertainty_samples determines the number of Monte Carlo simulations done to estimate future trend uncertainty. It is by default 1000, to estimate an 80% interval. If you are willing to have a bit more variance in your estimates of the uncertainty, you could reduce that to 100 when you instantiate the Prophet model, like m = Prophet(uncertainty_samples=100). This would make prediction much faster (~5x?). It will not affect the main forecast estimate yhat, but will make yhat_lower and yhat_upper a bit noisier.
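Putting both suggestions together, a minimal sketch (df here is a placeholder for your training dataframe):

```python
from fbprophet import Prophet

# df: your training dataframe with 'ds' and 'y' columns (assumed to exist).
m = Prophet(uncertainty_samples=100)  # default is 1000
m.fit(df)

# Skip in-sample points if you only need the future horizon:
future = m.make_future_dataframe(periods=365, include_history=False)
forecast = m.predict(future)
```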
I have a big time-series data set. For me, the predict(future) part is the most time-consuming and looks frozen. Is there any way that I can parallelise the process? Any help or suggestion would be much appreciated.
How many rows are in future?
If you don't need uncertainty estimation, then you can now disable that with uncertainty_samples=0 which makes things much faster. The latest version (v0.6) also includes #1311 which speeds up predict. Also be sure that future includes only the dates you actually care about (i.e. you may not need forecasts for every point in the history).
Otherwise, you could get the yhat forecast using uncertainty_samples=0 and then compute uncertainty intervals by making repeated calls to
https://github.com/facebook/prophet/blob/46e56119835f851714d22b285d2e4081853b9fb1/python/fbprophet/forecaster.py#L1357-L1369
to get samples of yhat (the quantiles of which provide uncertainty intervals), and these calls could be parallelized.
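Since the linked method is internal, here is a rough sketch of the same idea using the public predictive_samples method instead: draw several batches of yhat samples in parallel and take quantiles of the pooled samples. It assumes m is a Prophet model fitted with uncertainty_samples=100, future is the prediction dataframe, and fork-based multiprocessing so the workers inherit those globals:

```python
import numpy as np
from multiprocessing import Pool

# Assumed to exist: m (fitted with uncertainty_samples=100) and future.
def draw_batch(seed):
    np.random.seed(seed)  # forked workers would otherwise share RNG state
    return m.predictive_samples(future)['yhat']  # (n_rows, 100) array

if __name__ == '__main__':
    with Pool(4) as pool:
        batches = pool.map(draw_batch, range(10))  # 10 * 100 = 1000 samples
    samples = np.concatenate(batches, axis=1)
    # 80% interval, matching the default interval_width=0.8:
    yhat_lower = np.percentile(samples, 10, axis=1)
    yhat_upper = np.percentile(samples, 90, axis=1)
```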
Thank you for your response, Ben. My future has 360679 rows.
As for the two suggested solutions:
a. Could you please tell me more about how to know "which dates are more important"? Basically, I am forecasting climate change based on 41 years of data (e.g. temperature). I am not sure I follow what you mean about which dates I care about.
b. I know how to parallelise the process across separate weather features (e.g. temp, pressure, precipitation, etc.). Could you please quickly explain how to parallelise the process of sampling yhat?
Many thanks
Yeah that's a relatively large number of points to be forecasting. Under the hood it will construct a 360679x1000 matrix to fill with samples.
I suspect you may be able to reduce that, which would be the best way of making things faster. If you are following the workflow in the docs and doing something like
future = m.make_future_dataframe(365)
then what that will produce is a dataframe future that has every datetime in the history, plus a single point per day moving into the future for the next 365 days.
Let's focus first on the history: based on the number of data points, I'm guessing this is hourly data for 41 years. So future will have a point for every hour of the last 41 years. Do you really need hourly predictions for the entire history? Maybe just making in-sample predictions for the last 5 or 10 years would give you enough of an idea for how the in-sample fit is doing. So you would just construct the future dataframe manually, by directly specifying the datetimes at which you want predictions, like
future = pd.DataFrame({'ds': pd.date_range(start='2015-01-01', end='2021-01-01', freq='H')})
If you really do want to see the hourly predictions for the past 41 years, then you can definitely do that but you would probably want to break it up into shorter periods. For instance, you might construct a separate future dataframe with dates for each year, make predictions on that, and then stitch together all of the forecasts after doing it separately per year. That way each prediction would be on only ~8800 points, which is much more manageable than the 360679. You could parallelize the predicting across each year if necessary.
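A sketch of that stitching approach (the year range is illustrative, and a fitted model m is assumed):

```python
import pandas as pd

# Predict one year at a time (~8760 hourly points each) and stitch the
# results together, instead of one 360679-row predict call. Each
# m.predict(chunk) call is independent, so these could also be farmed
# out to a multiprocessing Pool if needed.
forecasts = []
for year in range(1980, 2021):
    chunk = pd.DataFrame({'ds': pd.date_range(start=f'{year}-01-01',
                                               end=f'{year}-12-31 23:00',
                                               freq='H')})
    forecasts.append(m.predict(chunk))
forecast = pd.concat(forecasts, ignore_index=True)
```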
Does that make sense / work better to break it up to smaller chunks?
This is also related to #50.
Thank you again, Ben. I'll give those suggestions a try and let you know. I also found this article useful: Forecasting multiple time-series using Prophet in parallel