In Gaussian mixtures, when n_init is set to any value greater than 1, lower_bound_ is not the maximum lower bound across all initializations, but only the lower bound of the last initialization.
The bug can be fixed by adding the following line just before return self in BaseMixture.fit():
self.lower_bound_ = max_lower_bound
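To see why the assignment is needed, here is a simplified sketch of the fit loop. The names (fit_sketch, lower_bounds_per_init) are illustrative stand-ins, not the actual scikit-learn source; in BaseMixture.fit(), max_lower_bound and best_params are tracked the same way, but the per-init lower bound overwrites self.lower_bound_ on every pass:

```python
# Simplified sketch of the BaseMixture.fit() loop (illustrative names,
# not the exact scikit-learn code).
def fit_sketch(lower_bounds_per_init):
    """Each element stands in for the converged lower bound of one init."""
    max_lower_bound = float("-inf")
    best_init = None
    lower_bound_ = None
    for init, lb in enumerate(lower_bounds_per_init):
        lower_bound_ = lb  # overwritten on every initialization
        if lb > max_lower_bound:
            max_lower_bound = lb
            best_init = init  # the model keeps the params of the best init
    # Without the next line, lower_bound_ keeps the *last* init's value
    # even though the returned parameters come from the *best* init.
    lower_bound_ = max_lower_bound  # the proposed fix
    return best_init, lower_bound_

best, lb = fit_sketch([-5.0, -1.0, -3.0])
print(best, lb)  # best init is index 1; reported lower bound is -1.0
```

Dropping the flagged line reproduces the bug: the function would return (1, -3.0), i.e. the best parameters paired with the last initialization's lower bound.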
The test that should have caught this bug is test_init() in mixture/tests/test_gaussian_mixture.py, but it only performs a single run, so it had roughly a 50% chance of missing the issue (the last initialization may happen to be the best one). It should be updated to try many random states.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 10)
for random_state in range(100):
    gm1 = GaussianMixture(n_components=2, n_init=1,
                          random_state=random_state).fit(X)
    gm2 = GaussianMixture(n_components=2, n_init=10,
                          random_state=random_state).fit(X)
    assert gm2.lower_bound_ > gm1.lower_bound_, random_state
Expected results: no error.

Actual results:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AssertionError: 4
>>> import platform; print(platform.platform())
Darwin-17.4.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.14.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.0.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.1
Thanks for the report and the analysis!