Statsmodels: Issue with lilliefors after upgrading to 0.9.0 from 0.8.0 (regression?)

Created on 20 Oct 2018 · 4Comments · Source: statsmodels/statsmodels

I had some code running with v0.8.0, but it won't work anymore with version 0.9.0. The issue can be reproduced simply with the following data and code:

from statsmodels.stats.diagnostic import lilliefors
lilliefors([18.50117949, 18.24272814, 18.50674601])

With 0.8.0 I just got a pandas warning when importing

/usr/lib64/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.                     
  from pandas.core import datetools

But now, I get an error when actually running lilliefors on those data:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/testone/lib64/python3.6/site-packages/statsmodels/stats/_lilliefors.py", line 344, in kstest_fit
    pval = lilliefors_table.prob(d_ks, nobs)
  File "/home/testone/lib64/python3.6/site-packages/statsmodels/stats/tabledist.py", line 120, in prob
    critv = self._critvals(n)
  File "/home/testone/lib64/python3.6/site-packages/statsmodels/stats/tabledist.py", line 99, in _critvals
    return np.array([p(n) for p in self.polyn])
  File "/home/testone/lib64/python3.6/site-packages/statsmodels/stats/tabledist.py", line 99, in <listcomp>
    return np.array([p(n) for p in self.polyn])
  File "/usr/lib64/python3.6/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/lib64/python3.6/site-packages/scipy/interpolate/interpolate.py", line 634, in _evaluate
    below_bounds, above_bounds = self._check_bounds(x_new)
  File "/usr/lib64/python3.6/site-packages/scipy/interpolate/interpolate.py", line 663, in _check_bounds
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.

I checked the doc, but I did not see any specific instruction on how data should be: is this a regression or am I actually providing wrong data? In the latter case, can you refer me to the relative doc?

Thanks

comp-stats prio-high type-bug

Source

akiross

All 4 comments

My guess is that this is a bug/regression. AFAIR the pvalue handling has been refactored in 0.9.
(It could be a floating point issue at the boundary.)
I will try to investigate later today.

josef-pkt on 20 Oct 2018

(sorry for the delay, slow in switching computers again)
It is a regression if it worked before 0.9.
AFAICS, using the debugger, the minimum sample size for the p-value computation with lilliefors_table.prob(d_ks, nobs) is 4.
The relevant scipy.interpolate.interpolate.interp1d has x values from 4 to 900 (inclusive)

There is also a problem with nobs > 900 but not in all cases

>>> lilliefors(np.random.rand(1500))
(0.064788061921975615, 3.5366480034031793e-16)

>>> lilliefors(np.random.randn(1500))
...
    raise ValueError("A value in x_new is above the interpolation "
ValueError: A value in x_new is above the interpolation range.

>>> np.random.seed(1)
>>> xx = np.random.randn(1500)
>>> lilliefors(xx[:900])
(0.014717319572921217, 0.20000000000000001)
>>> lilliefors(xx[:901])
...
raise ValueError("A value in x_new is above the interpolation "
ValueError: A value in x_new is above the interpolation range.

also lilliefors(xx[:901], pvalmethod='approx') and lilliefors(xx[:901], pvalmethod='table') produce the same exception

definitely a bug introduced during refactoring in 0.9 (assuming it worked before which I haven't checked)

josef-pkt on 25 Oct 2018

👍1

PR that made the change is #3936
I guess the relevant lines are https://github.com/statsmodels/statsmodels/pull/3936/files#diff-6ad3bd17ebe1ff645184564be437e2e1R341

i.e. we switch to table even if pvalmethod='approx' if approx pval > 0.1

another issue is that tabledist should handle "out of support" points
maybe relevant #4355

general:

some hypothesis test require a minimum number of observations, This is usually not checked and enforce and possibly raises an exception at some "arbitrary" code.
if an approximation or table method has only limited support, e.g. for pvalues between 0.01 and 0.1, the we return the boundary with a warning, if outside (recent example in scipy was that extrapolation beyond the supported range causes non-sensical pvalues)
in this case, table has extended values for p-value > 0.1 compared to the 'approx' method. We still want to switch when possible.

maybe switch default to pvalmethod=None which is essentially an 'auto' option to choose the best that we have for a given case.
then 'table' and 'approx' don't have to switch methods.

Note: table and approx are just different ways of computing the same distribution values. It is not like a substantive method like choosing between asymptotic and bootstrap pvalues.

josef-pkt on 25 Oct 2018

👍1

the tables (norm and expon) go only up to pvalue=0.2,
approx formula says intended for p-value<=0.1 but bounds are not checked in pval_lf

get_lilliefors_table has an extension to larger samples, but it is not used in norm. I guess the reason was that the larger sample extension is missing one value, i.e. shape mismatch.
I think we could take just the last value of the smaller sample table, i.e for nobs=900

from f(n)

>>> np.array([0.736, 0.768, 0.805, 0.886, 1.031]) / np.sqrt(900) * 1000
array([ 24.53333333,  25.6       ,  26.83333333,  29.53333333,  34.36666667])

which is close to the smaller nobs table, for nobs=900
25, 26, 28, 30, 35, 42

Instead of extending the table, we could also just use 1-D interpolation on f(n) for n is sample nobs.
This would avoid the need for any upper limit.

I'm not sure what to do for nobs=3.
any p-values are likely large, and the only information we might have is pvalue > 0.1.

I don't see any checking for nobs=2.
Based on recent scipy discussion, I guess the pvalue should be 1. (we estimate mean and var and then there is no degrees of freedom left in normal case.)
https://github.com/scipy/scipy/issues/7730#issuecomment-433062911

josef-pkt on 25 Oct 2018

Was this page helpful?

0 / 5 - 0 ratings