Hdbscan: 'wminkowski' metric not working - weight array not passing through properly to the distance function

Created on 30 May 2017 · 10 Comments · Source: scikit-learn-contrib/hdbscan

Hi - I'm trying to use the weighted minkowski metric:

import hdbscan

# Weighted Minkowski with p=2 and one weight per feature (10 features).
clusterer = hdbscan.HDBSCAN(algorithm='best', alpha=1.0, approx_min_span_tree=True,
                            gen_min_span_tree=False, leaf_size=40,
                            metric='wminkowski', min_cluster_size=20, min_samples=5,
                            core_dist_n_jobs=1, p=2,
                            w=[2, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.1, 0.1])

hdb_clusters = clusterer.fit(data)

Here is the error I get - it looks like the p and w arguments are not passing through all the way:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-174-0c6dd011773f> in <module>()
----> 1 hdb_clusters = clusterer.fit(data)

/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
    862          self._condensed_tree,
    863          self._single_linkage_tree,
--> 864          self._min_spanning_tree) = hdbscan(X, **kwargs)
    865 
    866         if self.prediction_data:

/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
    604                                                approx_min_span_tree,
    605                                                gen_min_span_tree,
--> 606                                                core_dist_n_jobs, **kwargs)
    607 
    608     return _tree_to_labels(X,

/usr/local/anaconda/lib/python3.5/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
    281 
    282     def __call__(self, *args, **kwargs):
--> 283         return self.func(*args, **kwargs)
    284 
    285     def call_and_shelve(self, *args, **kwargs):

/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in _hdbscan_boruvka_balltree(X, min_samples, alpha, metric, p, leaf_size, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, **kwargs)
    313         X = X.astype(np.float64)
    314 
--> 315     tree = BallTree(X, metric=metric, leaf_size=leaf_size, **kwargs)
    316     alg = BallTreeBoruvkaAlgorithm(tree, min_samples, metric=metric,
    317                                    leaf_size=leaf_size // 3,

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:9223)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4824)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.WMinkowskiDistance.__init__ (sklearn/neighbors/dist_metrics.c:7811)()

TypeError: __init__() takes exactly 2 positional arguments (0 given)
bug help wanted

All 10 comments

That is a little disconcerting. There had previously been some issues with arguments to metric not getting passed through, but those got corrected. I'll have to look into exactly what is going wrong here.

I have just come across this also. The problem is with using weighted minkowski distance with the BallTree algorithm for nearest neighbour searching.

tree = BallTree(X, leaf_size=300, metric='wminkowski', w=w)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-bc9902aad526> in <module>()
      1 X = np.array(df10.values)
      2 p=10
----> 3 tree = BallTree(X, leaf_size=300, metric='wminkowski', w=w)#, #metric_params={'w': w, 'p' : p})

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:9223)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4824)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.WMinkowskiDistance.__init__ (sklearn/neighbors/dist_metrics.c:7811)()

TypeError: __init__() takes exactly 2 positional arguments (0 given)
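
One possible reading of the two tracebacks above, offered as an interpretation rather than a confirmed diagnosis: the metric constructor wants both p and w, and p never reaches it. In the hdbscan frame, p is a named parameter of _hdbscan_boruvka_balltree, so it is not part of **kwargs when BallTree is called; in the direct BallTree call, p is simply not passed. The sketch below reproduces that pattern in plain Python. Every function in it (metric_init, build_tree, run_swallowing_p, run_forwarding_p) is a hypothetical stand-in, not the real sklearn or hdbscan code, and the real Cython error message is worded differently.

```python
# Hypothetical stand-ins only; none of these are real sklearn or hdbscan functions.

def metric_init(p, w):
    """Plays the role of a metric constructor that requires both p and w."""
    return {"p": p, "w": w}


def build_tree(X, metric, leaf_size, **kwargs):
    """Plays the role of the tree constructor: extra keywords go to the metric."""
    return metric_init(**kwargs)


def run_swallowing_p(X, metric, p=2, leaf_size=40, **kwargs):
    # p is bound to the named parameter, so it is no longer inside kwargs
    # and is silently dropped before the metric is built.
    return build_tree(X, metric=metric, leaf_size=leaf_size, **kwargs)


def run_forwarding_p(X, metric, p=2, leaf_size=40, **kwargs):
    # Forward p explicitly alongside the remaining keywords.
    return build_tree(X, metric=metric, leaf_size=leaf_size, p=p, **kwargs)


data = [[0.0, 1.0], [1.0, 0.0]]  # made-up data, two features

try:
    run_swallowing_p(data, "wminkowski", p=2, w=[2, 1])
    swallowed_error = None
except TypeError as exc:
    # The stand-in constructor complains that p is missing.
    swallowed_error = exc

forwarded = run_forwarding_p(data, "wminkowski", p=2, w=[2, 1])
```

If this reading is right, passing p explicitly in the direct BallTree call would be the first thing to try; whether that is sufficient depends on the installed sklearn version.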

Thanks for the catch -- this looks like an upstream issue in sklearn then. I'll have to push it up to them.

I get a similar error when using NearestNeighbors:

nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="wminkowski", p=1, w=weights).fit(train_set)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-19acd55f47d9> in <module>()
      8 
      9 # find the k neighbors
---> 10 nbrs = NearestNeighbors(n_neighbors=k+1, metric="wminkowski", p=1, w=weights).fit(train_set)
     11 knn_distances, knn_idx = nbrs.kneighbors(train_set)

~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/unsupervised.py in __init__(self, n_neighbors, radius, algorithm, leaf_size, metric, p, metric_params, n_jobs, **kwargs)
    121                           algorithm=algorithm,
    122                           leaf_size=leaf_size, metric=metric, p=p,
--> 123                           metric_params=metric_params, n_jobs=n_jobs, **kwargs)

TypeError: _init_params() got an unexpected keyword argument 'w'

I believe sklearn has refactored all of this and now has an extra argument that is a dictionary of metric_params. This is definitely a better way to do this, so I will eventually have to do the required refactoring to match. A PR would be happily accepted if anyone else would like to do the work -- it's fairly straightforward.

So here is what I did instead:

params = {'w': [1,1,1,1,1]}
nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="minkowski", p=1, metric_params = params).fit(train_set)

Yet the error persists:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-75873cd0d4a1> in <module>()
      8 
      9 # find the k neighbors
---> 10 nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="minkowski", p=1, metric_params = params).fit(train_set)
     11 knn_distances, knn_idx = nbrs.kneighbors(train_set)

~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
    801             or [n_samples, n_samples] if metric='precomputed'.
    802         """
--> 803         return self._fit(X)

~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in _fit(self, X)
    242             self._tree = BallTree(X, self.leaf_size,
    243                                   metric=self.effective_metric_,
--> 244                                   **self.effective_metric_params_)
    245         elif self._fit_method == 'kd_tree':
    246             self._tree = KDTree(X, self.leaf_size,

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric()

TypeError: __init__() got an unexpected keyword argument 'w'

Looking at the documentation, however, it seems that wminkowski might no longer be supported:

Valid values for metric are:

  • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]
  • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

Any news on this?

There's not too much I can do about unsupported metrics upstream unfortunately -- at least not without ending up with more maintenance work long term than I am comfortable taking on right now. Sorry. :-(

Fair enough, cheers!
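
For anyone landing here later, a possible workaround that avoids the wminkowski metric entirely is to rescale the features and use plain minkowski. This assumes the weighted distance is defined as (sum_i |w_i * (x_i - y_i)|^p)^(1/p); under that definition, multiplying each feature by its weight and taking the unweighted Minkowski distance gives the same number. The sketch below only checks that identity in plain Python. The points x and y are made up, the helper names are hypothetical, and it has not been run against hdbscan itself.

```python
import math

# Assumed definition: d(x, y) = (sum_i |w_i * (x_i - y_i)|**p) ** (1/p)


def minkowski(x, y, p):
    """Plain (unweighted) Minkowski distance between two points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def weighted_minkowski(x, y, p, w):
    """Weighted Minkowski distance under the assumed definition above."""
    return sum(abs(wi * (a - b)) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


def rescale(x, w):
    """Multiply each feature by its weight."""
    return [wi * a for a, wi in zip(x, w)]


w = [2, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.1, 0.1]  # weights from the original report
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]  # made-up point
y = [0.5, 1.5, 2.0, 6.0, 4.0, 9.0, 7.5, 3.0, 1.0, 12.0]  # made-up point
p = 2

direct = weighted_minkowski(x, y, p, w)
via_rescaling = minkowski(rescale(x, w), rescale(y, w), p)

# The two routes agree up to floating-point rounding.
assert math.isclose(direct, via_rescaling, rel_tol=1e-9)
```

In practice that would mean multiplying the columns of the data by w once, then clustering with metric='minkowski' and the same p. If the library you are comparing against defines the weighted distance as (sum_i w_i * |x_i - y_i|^p)^(1/p) instead, the scaling factor would be w_i^(1/p) rather than w_i.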
