When the total number of samples is large, stats.kruskal() emits an overflow warning and returns nonsensical results. It's easy to fix by changing 'totaln = np.sum(n)' to 'totaln = np.sum(n, dtype='uint64')' in the stats.kruskal() definition. I'll submit a fix shortly.
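A minimal sketch of why the cases below split the way they do, assuming the platform's default NumPy integer is 32 bits wide (an assumption, suggested by the `long_scalars` warning and the Windows-style path in the traceback). The helper name `overflows_int32` is hypothetical and not part of SciPy. The product `totaln * (totaln + 1)` exceeds the signed 32-bit maximum once `totaln` reaches 50000, but not at 40000 or 45000.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # largest signed 32-bit value


def overflows_int32(sizes):
    """Return True if totaln * (totaln + 1) exceeds the signed 32-bit range.

    Hypothetical helper for illustration only; uses Python ints, which
    never overflow, so the check is exact on every platform.
    """
    totaln = sum(sizes)
    return totaln * (totaln + 1) > INT32_MAX


# Group sizes taken from the four examples in this report.
cases = [
    [25000, 25000],
    [20000, 20000],
    [20000, 15000, 15000],
    [15000, 15000, 15000],
]
for sizes in cases:
    print(sizes, "overflows" if overflows_int32(sizes) else "fits")

# Forcing a 32-bit dtype reproduces the wraparound regardless of platform.
n32 = np.array([25000, 25000], dtype=np.int32)
totaln32 = n32.sum(dtype=np.int32, keepdims=True)  # 1-element int32 array
wrapped = (totaln32 * (totaln32 + 1))[0]           # wraps silently in arrays

# Requesting a 64-bit accumulator, as in the proposed change, keeps the
# total wide enough; converting to Python int makes the product exact.
totaln64 = np.sum(n32, dtype='uint64')
exact = int(totaln64) * (int(totaln64) + 1)

print("32-bit product:", wrapped)
print("64-bit product:", exact)
```

The array form is used for the 32-bit demonstration because integer arrays wrap without a warning, whereas NumPy scalars raise the RuntimeWarning quoted in this report.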
This produces an overflow warning and prints the wrong pvalue:
import numpy as np
from scipy import stats
[statistic, pvalue] = stats.kruskal(np.random.randn(25000),np.random.randn(25000))
print(pvalue)
This produces no warning:
import numpy as np
from scipy import stats
[statistic, pvalue] = stats.kruskal(np.random.randn(20000),np.random.randn(20000))
print(pvalue)
This produces an overflow warning and prints the wrong pvalue:
import numpy as np
from scipy import stats
[statistic, pvalue] = stats.kruskal(np.random.randn(20000),np.random.randn(15000),np.random.randn(15000))
print(pvalue)
This produces no warning:
import numpy as np
from scipy import stats
[statistic, pvalue] = stats.kruskal(np.random.randn(15000),np.random.randn(15000),np.random.randn(15000))
print(pvalue)
stats\stats.py:5056: RuntimeWarning: overflow encountered in long_scalars
h = 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)
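For reference, a rough sketch of the same H formula computed with a Python-int total, so that `totaln * (totaln + 1)` cannot overflow. This is not SciPy's implementation: the function name `kruskal_h_no_ties` is hypothetical, it assumes all pooled values are distinct (no tie correction), and it returns only the statistic, not the p-value.

```python
import numpy as np


def kruskal_h_no_ties(*samples):
    """Kruskal-Wallis H statistic without tie correction.

    Illustrative sketch only; assumes every pooled value is distinct.
    """
    sizes = [len(s) for s in samples]
    totaln = int(sum(sizes))  # Python int: arbitrary precision, no overflow

    pooled = np.concatenate([np.asarray(s, dtype=float) for s in samples])

    # Assign ranks 1..N by sorted position (valid only when there are no ties).
    ranks = np.empty(totaln, dtype=float)
    ranks[np.argsort(pooled)] = np.arange(1, totaln + 1)

    # ssbn: sum over groups of (rank sum squared) / (group size).
    ssbn = 0.0
    start = 0
    for size in sizes:
        rank_sum = ranks[start:start + size].sum()
        ssbn += rank_sum ** 2 / size
        start += size

    # Same expression as the line flagged in the warning.
    return 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)


rng = np.random.default_rng(0)
h = kruskal_h_no_ties(rng.standard_normal(25000), rng.standard_normal(25000))
print(h)
```

With two groups of 25000 samples the total is 50000, the size at which the original code overflowed, yet the statistic here stays finite because the denominator is computed with Python integers.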
<<Output from 'import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)'>>
0.19.0 1.12.1 sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
Is there a fix for this issue? It is two years old now...
Yes @avihaleva, there was a fix in #7763 but for some reason it has not been merged. You can fix this issue yourself by editing the stats.py file and replacing totaln = np.sum(n) with totaln = np.sum(n, dtype='uint64') on line 5931.
This issue still exists. I just reinstalled the latest version of scipy and the unpatched line is still there (now on line 5878):
totaln = np.sum(n)
@petereliason 1.4.0 is not released yet