Statsmodels: AIC/BIC inconsistent between AutoReg and ar_select_order

Created on 3 Nov 2020 · 4 Comments · Source: statsmodels/statsmodels

AIC and BIC do not appear to be calculated the same way in AutoReg and ar_select_order. Here's an example.

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

np.random.seed(99999)

coefs = np.array([0.5, -0.25])
y = ArmaProcess(np.r_[1, -coefs]).generate_sample(250)

Then we do model selection.

modsel = ar_select_order(y, maxlag=5, old_names=False)

modsel.aic has this:

{(1, 2): 0.038243172406288724,
(1, 2, 3): 0.04552812846926287,
(1, 2, 3, 4): 0.0536414426749059,
(1, 2, 3, 4, 5): 0.061296136921122346,
(1,): 0.0735756692249579,
0: 0.26784552200885503}

I believe this means that the AIC for the model with 2 lags is 0.03824...

Estimating the model with 2 lags, I get:

res = AutoReg(y, lags=2, old_names=False).fit()
print(res.summary())

                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                      y   No. Observations:                  250
Model:                     AutoReg(2)   Log Likelihood                -353.692
Method:               Conditional MLE   S.D. of innovations              1.007
Date:                Tue, 03 Nov 2020   AIC                              0.047
Time:                        16:56:07   BIC                              0.103
Sample:                             2   HQIC                             0.070
                                  250                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0269      0.064      0.420      0.674      -0.099       0.152
y.L1           0.5051      0.062      8.149      0.000       0.384       0.627
y.L2          -0.2020      0.062     -3.259      0.001      -0.323      -0.080
                                    Roots

The AIC for this model doesn't match what we got above. BIC is also different. I tried all the trend options and seem to get closest with trend="n", but it's still not quite the same.

Shouldn't these be the same? Or is the model selection using different options?

comp-tsa question

stoffprof

All 4 comments

I don't know the code here, so this is a generic answer.

We want to use the same data for every candidate model when doing the lag search. So I guess that in ar_select_order the sample is truncated to allow for 5 lags, even for models with fewer lags.

You could check res = AutoReg(y[3:], lags=2, ...
or something like this.

josef-pkt on 4 Nov 2020

Another possible difference in some models is that we want to use a fast approximate method in the specification search and the best method for the final model.

josef-pkt on 4 Nov 2020

What @josef-pkt said: Selection requires the LHS variable to be the same, and so the maximum lag length affects the data used to select the model. When you use AutoReg it defaults to the maximum available data. If you want them to be the same you can set hold_back in AutoReg.