Statsmodels: Goodness of fit test for Weibull with unknown parameters

Created on 27 Aug 2019 · 2 Comments · Source: statsmodels/statsmodels

Is your feature request related to a problem? Please describe

I would like to be able to test whether data come from a Weibull distribution with unknown parameters.

Describe the solution you'd like

According to the Wikipedia article on the Anderson-Darling test (https://en.m.wikipedia.org/wiki/Anderson%E2%80%93Darling_test):

“Any other family of distributions can be tested but the test for each family is implemented by using a different modification of the basic test statistic and this is referred to critical values specific to that family of distributions. The modifications of the statistic and tables of critical values are given by Stephens (1986)[2] for the exponential, extreme-value, Weibull, gamma, logistic, Cauchy, and von Mises distributions. Tests for the (two-parameter) log-normal distribution can be implemented by transforming the data using a logarithm and using the above test for normality. Details for the required modifications to the test statistic and for the critical values for the normal distribution and the exponential distribution have been published by Pearson & Hartley (1972, Table 54). Details for these distributions, with the addition of the Gumbel distribution, are also given by Shorak & Wellner (1986, p239). Details for the logistic distribution are given by Stephens (1979). A test for the (two parameter) Weibull distribution can be obtained by making use of the fact that the logarithm of a Weibull variate has a Gumbel distribution.”
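The last sentence of the quoted passage can be sketched with SciPy alone. This is only an illustration under stated assumptions, not a proposed statsmodels API: the sample `x` is synthetic, and the code relies on `scipy.stats.anderson` accepting `dist="gumbel_l"` (the Gumbel distribution for minima), which is the distribution the logarithm of a Weibull variate follows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic example data (an assumption for illustration):
# two-parameter Weibull with shape 1.5 and scale 2.0.
x = 2.0 * rng.weibull(1.5, size=200)

# If X ~ Weibull(shape k, scale lam), then log(X) follows a Gumbel (minimum)
# distribution with location log(lam) and scale 1/k, a location-scale family.
log_x = np.log(x)

# Anderson-Darling test of log(X) against the left-skewed Gumbel family,
# with the Gumbel parameters estimated from the data by SciPy.
result = stats.anderson(log_x, dist="gumbel_l")

print("A^2 statistic:", result.statistic)
for crit, sig in zip(result.critical_values, result.significance_level):
    verdict = "reject" if result.statistic > crit else "do not reject"
    print(f"  {sig:>4}% level: critical value {crit:.3f} -> {verdict}")
```

The result carries a statistic and a table of critical values rather than a p-value, so the comparison is made level by level.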

There is already a test for the normal distribution with unknown parameters (https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.normal_ad.html#statsmodels.stats.diagnostic.normal_ad) and a separate KS-based one for normal and exponential random variables (https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.lilliefors.html#statsmodels.stats.diagnostic.lilliefors).
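For reference, a minimal usage sketch of the two existing functions linked above. The samples are synthetic and chosen only for illustration; both functions return a test statistic and a p-value.

```python
import numpy as np
from statsmodels.stats.diagnostic import normal_ad, lilliefors

rng = np.random.default_rng(0)

# Synthetic samples (assumptions for illustration only).
x_norm = rng.normal(loc=3.0, scale=2.0, size=200)
x_exp = rng.exponential(scale=2.0, size=200)

# Anderson-Darling test for normality with estimated mean and variance.
ad_stat, ad_pvalue = normal_ad(x_norm)

# Lilliefors (KS-based) tests with estimated parameters.
ks_stat, ks_pvalue = lilliefors(x_norm, dist="norm")
exp_stat, exp_pvalue = lilliefors(x_exp, dist="exp")

print(f"normal_ad:         stat={ad_stat:.3f}, p={ad_pvalue:.3f}")
print(f"lilliefors (norm): stat={ks_stat:.3f}, p={ks_pvalue:.3f}")
print(f"lilliefors (exp):  stat={exp_stat:.3f}, p={exp_pvalue:.3f}")
```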

It would be great to add Weibull to the list.

Describe alternatives you have considered

There is an R package that has a KS-based test: https://cran.r-project.org/web/packages/KScorrect/index.html. An Anderson-Darling-based one might be preferable, however.

comp-stats type-enh

Source

lesshaste

Most helpful comment

Dear all,

I see an option for this Issue.

One could evaluate the p-value of the Lilliefors statistical distance (D) for other theoretical probability families (e.g. Weibull, gamma, etc.) through a simulation/bootstrap approach.

I have developed that algorithm in a personal module, which I am attaching below for future reference. Feel free to use it. If possible, please check the module and, if it is correct, I ask that the statsmodels.stats.diagnostic.lilliefors test be updated to a version like the attached one.

Sincerely,

Philipe Riskalla Leal
Remote Sensing - Instituto Nacional de Pesquisas Espaciais Brazil

Lielliefors_Goodness_of_fit_test_for_general_family_distribution.zip

PhilipeRLeal on 29 Jan 2020

All 2 comments

I haven't had time to look this up yet. From what I remember, Weibull has a shape parameter, and critical values and p-values for AD and KS tests can be tabulated only for location-scale families. Maybe the two-parameter Weibull can be tabulated.

AFAICS, the R package uses simulated critical values, which would depend on the specific parameters.

Stephens, in his old articles, mentions a few distributions that are not location-scale families, but I don't remember those details. I never looked carefully at those cases. I didn't find anything in the sandbox related to Weibull. Implementations of several GOF tests based on Stephens' articles are still in sandbox.distributions, but I only looked at the main loc-scale families.

It would be possible to add simulated p-values for GOF tests with general distributions. That would require both estimation methods for the parameters and simulation of the parameterized distribution; both are available in scipy.stats for many distributions in the non-regression setting.
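A minimal sketch of what such a simulated p-value could look like for the two-parameter Weibull, using only `scipy.stats` for estimation and simulation. The function name `weibull_ks_bootstrap` and its arguments are hypothetical and are not part of statsmodels or SciPy; the statistic used here is the KS distance, and the key point is that the parameters are re-estimated on every simulated sample.

```python
import numpy as np
from scipy import stats


def weibull_ks_bootstrap(x, n_boot=200, seed=0):
    """Hypothetical helper: KS distance to a fitted two-parameter Weibull,
    with a p-value simulated by parametric bootstrap."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    def fitted_distance(sample):
        # Maximum likelihood fit of shape and scale, location fixed at 0.
        c, loc, scale = stats.weibull_min.fit(sample, floc=0)
        d = stats.kstest(sample, "weibull_min", args=(c, loc, scale)).statistic
        return d, c, scale

    d_obs, c_hat, scale_hat = fitted_distance(x)

    # Simulate from the fitted distribution and refit each time, so the
    # reference distribution of D accounts for parameter estimation.
    d_sim = np.empty(n_boot)
    for i in range(n_boot):
        sim = stats.weibull_min.rvs(
            c_hat, scale=scale_hat, size=x.size, random_state=rng
        )
        d_sim[i] = fitted_distance(sim)[0]

    # Add-one correction keeps the simulated p-value strictly positive.
    p_value = (1 + np.sum(d_sim >= d_obs)) / (n_boot + 1)
    return d_obs, p_value


# Usage on synthetic data (an assumption for illustration).
sample = stats.weibull_min.rvs(1.5, scale=2.0, size=100, random_state=1)
d_obs, p_value = weibull_ks_bootstrap(sample, n_boot=100)
print(f"D = {d_obs:.4f}, simulated p-value = {p_value:.3f}")
```

The same loop would work with an Anderson-Darling statistic in place of the KS distance; only the `fitted_distance` helper would change.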

josef-pkt on 4 Sep 2019