Scikit-learn: Improve class design for AgglomerativeClustering and FeatureAgglomeration (was pooling_func in AgglomerativeClustering doesn't work)

Created on 28 Sep 2017  Â·  3Comments  Â·  Source: scikit-learn/scikit-learn

Description


pooling_func in AgglomerativeClustering doesn't work.

Steps/Code to Reproduce

from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering


centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

model = AgglomerativeClustering(linkage='complete',
                                connectivity=None,
                                affinity = 'cosine',
                                pooling_func = "test_error",
                                n_clusters=3)
model.fit(X)

Expected Results


Raise error because the pooling_func is not callable. It's a string.

Actual Results


No warning, no error

Versions


Linux-4.4.0-64-generic-x86_64-with-debian-stretch-sid
Python 3.5.3 | packaged by conda-forge | (default, Feb 9 2017, 14:37:12)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

Easy help wanted

All 3 comments

It seems to me like this is a class design issue: FeatureAgglomeration derives from AgglomerativeClustering but pooling_func is only used in AgglomerativeClustering.transform.

Feel free to open a PR. I haven't looked in detail but a possible solution would be to have FeatureAgglomeration and AgglomerativeClustering derive from a common base class and have pooling_func only in FeatureAgglomeration.

Having said that there may be reasons why it was done like this although I can't think of any.

(And FeatureAgglomeration should probably be a wrapper or a mixin...)

On 28 Sep 2017 9:24 pm, "Loïc Estève" notifications@github.com wrote:

It seems to me like this is a class design issue: FeatureAgglomeration
derives from AgglomerativeClustering but pooling_func is only used in
AgglomerativeClustering.transform.

Feel free to open a PR. I haven't looked in detail but a possible solution
would be to have FeatureAgglomeration and AgglomerativeClustering derive
from a common base class and have pooling_func only in FeatureAgglomeration.

Having said that there may be reasons why it was done like this although I
can't think of any.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/scikit-learn/scikit-learn/issues/9846#issuecomment-332807654,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz6z9lkUF7u0jA2-sWEe_XIZvUk1r_ks5sm4HdgaJpZM4PnBzV
.

I want to take up this issue if not currently being worked upon.

Was this page helpful?
0 / 5 - 0 ratings