Currently, cross_validation() relies on base::as.difftime() to compute time slices. The limitation of the base function is that its largest supported unit is weeks.
To allow for monthly and annual datasets, it would be ideal to replace it with mondate::as.difftime(), which allows the time units 'months' and 'years'.
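For reference, here is a minimal sketch of the limitation in base R (nothing in it is Prophet-specific). With numeric input, base::as.difftime() accepts units only up to "weeks", so asking for "months" raises an error, which the sketch captures rather than letting it stop the script.

```r
# Units up to weeks work with base::as.difftime().
one_week <- as.difftime(1, units = "weeks")
print(one_week)

# "months" is not an accepted unit, so the call fails.
# Capture the error message instead of aborting.
month_error <- tryCatch(
  as.difftime(1, units = "months"),
  error = function(e) conditionMessage(e)
)
print(month_error)
```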
That seems reasonable to me, and would better match the Python functionality that does have those options.
Glad to hear @bletham :)
Look forward to updates.
I came here to post this exact issue, @leungi! I am glad someone else has picked it up :)
@bletham, I was trying to evaluate one-step-ahead monthly forecasts on monthly data by playing with the period and horizon options.
Does the code below make any sense at all as a work-around? I am basically trying to redefine one unit of time as 30.41667 days since we cannot have unit = "month" at the moment.
## Using the example and dataset from the prophet guide
library(prophet)
df <- read.csv("C:/Github/prophet/examples/example_retail_sales.csv")
m <- prophet(df, seasonality.mode = 'multiplicative')
## Use all data for the fit and make a 10-years-ahead forecast
future <- make_future_dataframe(m, periods = 120, freq = 'month')
fcst <- predict(m, future)
plot(m, fcst)
# Try: Out-of-sample forecast - train on 20 years initially (expanding after that),
# forecast 1 month ahead
# Does this here make sense?
tscv.myfit <- cross_validation(m, horizon = 365/12, units = "days", period = 365/12,
                               initial = 20*12*(365/12))
@APramov what you proposed works great; decimal numbers are not an issue. In the code, these are converted to time difference objects like
as.difftime(horizon, units=units)
so here it would be
as.difftime(365/12, units = "days")
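As a quick sanity check, the sketch below shows that a fractional number of days is a valid difftime and can be added to a date-time. The cutoff date is made up for illustration and is not taken from the example dataset.

```r
# A fractional number of days is a valid difftime.
horizon <- as.difftime(365 / 12, units = "days")
print(horizon)

# Adding it to a date-time shifts by about 30 days and 10 hours.
# The cutoff below is an arbitrary illustrative value.
cutoff <- as.POSIXct("2015-01-01", tz = "UTC")
shifted <- cutoff + horizon
print(shifted)
```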
@APramov; kudos for the workaround :+1:
I suppose the drawback is that each month has a different number of days?
@bletham Hi Ben, just wondering what's the status of this enhancement for R?
+1
I used the code mentioned here: https://github.com/facebook/prophet/issues/949#issuecomment-487994184
It works, but the horizon values returned from performance_metrics are timedelta values, which makes them difficult to interpret.
I was incorrect above when I stated that the Python version supports monthly/annual options. It uses pd.Timedelta, which is actually even stricter than R's difftime and doesn't support units above days.
The reason monthly and annual units are not supported by Timedelta/difftime is that they are not fixed units; the length of a month depends on the month. So the logic for subtracting "one month" from a date is going to be quite a bit more complicated than subtracting a fixed amount of time (like a day). So I'm inclined to leave this as-is.
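To make the "not a fixed unit" point concrete, here is a small base-R sketch that measures the actual lengths of the first few months of 2019 and compares them with the 365/12-day approximation used in the workaround above. The year is chosen arbitrarily.

```r
# Month-start dates for January through May 2019.
month_starts <- seq(as.Date("2019-01-01"), by = "month", length.out = 5)

# Actual month lengths in days: January, February, March, April.
days_in_month <- as.numeric(diff(month_starts), units = "days")
print(days_in_month)  # 31 28 31 30

# Drift (in days) between true month boundaries and fixed 365/12-day steps.
drift <- cumsum(days_in_month) - seq_along(days_in_month) * 365 / 12
print(round(drift, 2))
```

The drift stays within a couple of days here, but fixed-length steps never land exactly on month boundaries.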
I think the better option for monthly cross validation would be to allow the user to manually specify the locations of the cutoffs, which could then be e.g. all month-end or month-start dates as desired. I'll open a new issue for that.
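A rough sketch of what such user-specified cutoffs could look like, using only base R. The helper name month_start_cutoffs and the date range are hypothetical, and the sketch only builds the vector of dates; it does not assume any particular cross_validation() argument for passing them in.

```r
# Hypothetical helper: month-start dates between two dates (inclusive).
month_start_cutoffs <- function(first, last) {
  seq(as.Date(first), as.Date(last), by = "month")
}

# Illustrative range only; real cutoffs would depend on the training history.
cutoffs <- month_start_cutoffs("2012-01-01", "2012-06-01")
print(cutoffs)
```

Because seq() steps by calendar month, every cutoff falls on the first of a month regardless of how many days each month has.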