In Gaussian mixtures, when n_init is set to any value greater than 1, lower_bound_ is not the maximum lower bound across all initializations, but only the lower bound of the last initialization.
The bug can be fixed by adding the following line just before return self in BaseMixture.fit():
self.lower_bound_ = max_lower_bound
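To see why the assignment is needed, here is a simplified sketch of the fit loop. The names (fit_sketch, lower_bounds_per_init) are illustrative stand-ins, not the actual scikit-learn source; in BaseMixture.fit(), max_lower_bound and best_params are tracked the same way, but the per-init lower bound overwrites self.lower_bound_ on every pass:

```python
# Simplified sketch of the BaseMixture.fit() loop (illustrative names,
# not the exact scikit-learn code).
def fit_sketch(lower_bounds_per_init):
    """Each element stands in for the converged lower bound of one init."""
    max_lower_bound = float("-inf")
    best_init = None
    lower_bound_ = None
    for init, lb in enumerate(lower_bounds_per_init):
        lower_bound_ = lb  # overwritten on every initialization
        if lb > max_lower_bound:
            max_lower_bound = lb
            best_init = init  # the model keeps the params of the best init
    # Without the next line, lower_bound_ keeps the *last* init's value
    # even though the returned parameters come from the *best* init.
    lower_bound_ = max_lower_bound  # the proposed fix
    return best_init, lower_bound_

best, lb = fit_sketch([-5.0, -1.0, -3.0])
print(best, lb)  # best init is index 1; reported lower bound is -1.0
```

Dropping the flagged line reproduces the bug: the function would return (1, -3.0), i.e. the best parameters paired with the last initialization's lower bound.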
The test that should have caught this bug is test_init() in mixture/tests/test_gaussian_mixture.py, but it only performs a single run, so it had roughly a 50% chance of missing the issue (the last initialization may happen to be the best one). It should be updated to try many random states.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 10)
for random_state in range(100):
    gm1 = GaussianMixture(n_components=2, n_init=1,
                          random_state=random_state).fit(X)
    gm2 = GaussianMixture(n_components=2, n_init=10,
                          random_state=random_state).fit(X)
    assert gm2.lower_bound_ > gm1.lower_bound_, random_state
Expected results: no error.

Actual results:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AssertionError: 4
>>> import platform; print(platform.platform())
Darwin-17.4.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.14.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.0.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.1
Thanks for the report and the analysis!