Hi there,
I ran into an issue when using the seasonal_decompose function from the statsmodels package in Python. I am using Python 2.7.13. The data looks like this:
date
2016-01-01    20.086905
2016-02-01    20.071920
2016-03-01    20.149253
2016-04-01    20.045424
2016-05-01    20.049403
2016-06-01    20.066260
2016-07-01    20.003315
2016-08-01    20.022434
2016-09-01    20.063003
2016-10-01    19.989281
2016-11-01    20.005214
2016-12-01    20.209121
2017-01-01    20.027342
2017-02-01    19.941969
2017-03-01    20.050094
2017-04-01    19.956648
2017-05-01    19.969304
2017-06-01    19.977306
2017-07-01    19.943466
Name: Values, dtype: float64
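This is the code I used:
import matplotlib.pyplot as plt
import statsmodels.api as sm
# data is the monthly pandas Series shown above
res = sm.tsa.seasonal_decompose(data)
resplot = res.plot()
plt.show()
# the components are attributes of the result object returned above
trend = res.trend
seasonal = res.seasonal
residual = res.resid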
When I then access the individual components of the result, they contain a lot of NaN values (output below).
I don't understand what happened. I found some blog posts that talk about setting the freq and filt parameters, but I don't really understand how to set them, and I am not sure whether they are the cause.
Thank you very much.
Trend component:
date
2016-01-01 NaN
2016-02-01 NaN
2016-03-01 NaN
2016-04-01 NaN
2016-05-01 NaN
2016-06-01 NaN
2016-07-01 20.060979
2016-08-01 20.053083
2016-09-01 20.043537
2016-10-01 20.035706
2016-11-01 20.028669
2016-12-01 20.021626
2017-01-01 20.015425
2017-02-01 NaN
2017-03-01 NaN
2017-04-01 NaN
2017-05-01 NaN
2017-06-01 NaN
2017-07-01 NaN
Name: Values, dtype: float64
Seasonal component:
date
2016-01-01 NaN
2016-02-01 NaN
2016-03-01 NaN
2016-04-01 NaN
2016-05-01 NaN
2016-06-01 NaN
2016-07-01 NaN
2016-08-01 NaN
2016-09-01 NaN
2016-10-01 NaN
2016-11-01 NaN
2016-12-01 NaN
2017-01-01 NaN
2017-02-01 NaN
2017-03-01 NaN
2017-04-01 NaN
2017-05-01 NaN
2017-06-01 NaN
2017-07-01 NaN
Name: Values, dtype: float64
Residual component:
date
2016-01-01 NaN
2016-02-01 NaN
2016-03-01 NaN
2016-04-01 NaN
2016-05-01 NaN
2016-06-01 NaN
2016-07-01 NaN
2016-08-01 NaN
2016-09-01 NaN
2016-10-01 NaN
2016-11-01 NaN
2016-12-01 NaN
2017-01-01 NaN
2017-02-01 NaN
2017-03-01 NaN
2017-04-01 NaN
2017-05-01 NaN
2017-06-01 NaN
2017-07-01 NaN
Name: Values, dtype: float64
Original Data
date
2016-01-01 20.086905
2016-02-01 20.071920
2016-03-01 20.149253
2016-04-01 20.045424
2016-05-01 20.049403
2016-06-01 20.066260
2016-07-01 20.003315
2016-08-01 20.022434
2016-09-01 20.063003
2016-10-01 19.989281
2016-11-01 20.005214
2016-12-01 20.209121
2017-01-01 20.027342
2017-02-01 19.941969
2017-03-01 20.050094
2017-04-01 19.956648
2017-05-01 19.969304
2017-06-01 19.977306
2017-07-01 19.943466
Name: Values, dtype: float64
I think this is all a consequence of your time series being too short. You need at least two and maybe more full cycles to estimate a seasonal pattern.
AFAIR, if the data is monthly, then a 12-month annual seasonality is assumed by default. The freq keyword can be used to override the default seasonal cycle length.
Half a cycle at each end is lost and set to NaN because the filter currently has no special handling of endpoints. However, having all NaNs in the seasonal component most likely comes from the shortness of the time series.
I don't know and never checked what the minimum length for seasonal_decompose is, but unless there are several cycles, there is no way to distinguish seasonal fluctuations from other components including noise.
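To make the "half a cycle at each end" point concrete, here is a minimal plain-Python sketch of a centered moving average for an even period (often called a 2x12 average for monthly data). It assumes this is the kind of filter applied by default; the helper name centered_trend is made up for illustration and is not part of statsmodels. Applied to the 19 values from the question, it leaves only 7 trend values, and the first one matches the 20.060979 reported for 2016-07-01.

```python
# Monthly values copied from the question (2016-01 through 2017-07).
values = [
    20.086905, 20.071920, 20.149253, 20.045424, 20.049403, 20.066260,
    20.003315, 20.022434, 20.063003, 19.989281, 20.005214, 20.209121,
    20.027342, 19.941969, 20.050094, 19.956648, 19.969304, 19.977306,
    19.943466,
]


def centered_trend(x, period=12):
    """Hypothetical helper: centered moving average for an even period.

    Positions where the full window does not fit are left as None,
    which is where the NaNs at both ends come from.
    """
    half = period // 2
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]      # period + 1 points around t
        inner = sum(window[1:-1])              # interior points, full weight
        ends = 0.5 * (window[0] + window[-1])  # the two end points, half weight
        trend[t] = (inner + ends) / period
    return trend


trend = centered_trend(values)
n_defined = sum(v is not None for v in trend)
print(n_defined)            # number of positions with a trend value
print(round(trend[6], 6))   # trend for 2016-07-01
```

With only 19 observations and a 12-month window, 12 of the 19 positions have no trend value, so very little detrended data is left from which to estimate a seasonal pattern.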
Hi, is there any way to avoid losing half a cycle at the right end?
I need this to check whether the rightmost point (the current measurement) is anomalous relative to the history.
Thanks.
Zaven.
@znavoyan This has been changed in the current master. Trend extrapolation now allows optionally filling in the NaNs at the start and end that were previously lost. See #4007 and #4197.
It still requires the minimum length to estimate the seasonal component, as pointed out in this issue.
seasonal_decompose now requires two complete cycles, which is the same as R.