Hi - I'm trying to use the weighted minkowski metric:
clusterer = hdbscan.HDBSCAN(algorithm='best', alpha=1.0, approx_min_span_tree=True,
                            gen_min_span_tree=False, leaf_size=40,
                            metric='wminkowski', min_cluster_size=20, min_samples=5,
                            core_dist_n_jobs=1, p=2, w=[2, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.1, 0.1])
hdb_clusters = clusterer.fit(data)
Here is the error I get - it looks like the p and w arguments are not passing through all the way:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-174-0c6dd011773f> in <module>()
----> 1 hdb_clusters = clusterer.fit(data)
/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
862 self._condensed_tree,
863 self._single_linkage_tree,
--> 864 self._min_spanning_tree) = hdbscan(X, **kwargs)
865
866 if self.prediction_data:
/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
604 approx_min_span_tree,
605 gen_min_span_tree,
--> 606 core_dist_n_jobs, **kwargs)
607
608 return _tree_to_labels(X,
/usr/local/anaconda/lib/python3.5/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
281
282 def __call__(self, *args, **kwargs):
--> 283 return self.func(*args, **kwargs)
284
285 def call_and_shelve(self, *args, **kwargs):
/usr/local/anaconda/lib/python3.5/site-packages/hdbscan/hdbscan_.py in _hdbscan_boruvka_balltree(X, min_samples, alpha, metric, p, leaf_size, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, **kwargs)
313 X = X.astype(np.float64)
314
--> 315 tree = BallTree(X, metric=metric, leaf_size=leaf_size, **kwargs)
316 alg = BallTreeBoruvkaAlgorithm(tree, min_samples, metric=metric,
317 leaf_size=leaf_size // 3,
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:9223)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4824)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.WMinkowskiDistance.__init__ (sklearn/neighbors/dist_metrics.c:7811)()
TypeError: __init__() takes exactly 2 positional arguments (0 given)
That is a little disconcerting. There had previously been some issues with arguments to metric not getting passed through, but those got corrected. I'll have to look into exactly what is going wrong here.
I have just come across this also. The problem is with using weighted minkowski distance with the BallTree algorithm for nearest neighbour searching.
tree = BallTree(X, leaf_size=300, metric='wminkowski', w=w)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-bc9902aad526> in <module>()
1 X = np.array(df10.values)
2 p=10
----> 3 tree = BallTree(X, leaf_size=300, metric='wminkowski', w=w)#, #metric_params={'w': w, 'p' : p})
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:9223)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4824)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.WMinkowskiDistance.__init__ (sklearn/neighbors/dist_metrics.c:7811)()
TypeError: __init__() takes exactly 2 positional arguments (0 given)
Thanks for the catch -- this looks like an upstream issue in sklearn then. I'll have to push it up to them.
I get a similar error when using NearestNeighbors:
nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="wminkowski", p=1, w=weights).fit(train_set)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-19acd55f47d9> in <module>()
8
9 # find the k neighbors
---> 10 nbrs = NearestNeighbors(n_neighbors=k+1, metric="wminkowski", p=1, w=weights).fit(train_set)
11 knn_distances, knn_idx = nbrs.kneighbors(train_set)
~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/unsupervised.py in __init__(self, n_neighbors, radius, algorithm, leaf_size, metric, p, metric_params, n_jobs, **kwargs)
121 algorithm=algorithm,
122 leaf_size=leaf_size, metric=metric, p=p,
--> 123 metric_params=metric_params, n_jobs=n_jobs, **kwargs)
TypeError: _init_params() got an unexpected keyword argument 'w'
I believe sklearn has refactored all of this and now has an extra argument that is a dictionary of metric_params. This is definitely a better way to do this, so I will eventually have to do the required refactoring to match. A PR would be happily accepted if anyone else would like to do the work -- it's fairly straightforward.
So here is what I did instead:
params = {'w': [1,1,1,1,1]}
nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="minkowski", p=1, metric_params = params).fit(train_set)
Yet the error persists:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-75873cd0d4a1> in <module>()
8
9 # find the k neighbors
---> 10 nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree', metric="minkowski", p=1, metric_params = params).fit(train_set)
11 knn_distances, knn_idx = nbrs.kneighbors(train_set)
~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
801 or [n_samples, n_samples] if metric='precomputed'.
802 """
--> 803 return self._fit(X)
~/opt/anaconda/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in _fit(self, X)
242 self._tree = BallTree(X, self.leaf_size,
243 metric=self.effective_metric_,
--> 244 **self.effective_metric_params_)
245 elif self._fit_method == 'kd_tree':
246 self._tree = KDTree(X, self.leaf_size,
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric()
TypeError: __init__() got an unexpected keyword argument 'w'
Looking at the documentation, however, it seems that wminkowski might no longer be supported:
Valid values for metric are:
Any news on this?
There's not too much I can do about unsupported metrics upstream, unfortunately -- at least not without ending up with more maintenance work long term than I am comfortable taking on right now. Sorry. :-(
Fair enough, cheers!
Good news, the answer is here https://stackoverflow.com/questions/48288047/scikit-learn-nearest-neighbor-search-with-weighted-distance-metric
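For anyone who cannot pass w through to the tree at all, one possible workaround (my own sketch, not necessarily what the linked answer proposes) is to pre-scale the features. This assumes the 'wminkowski' form of the distance, (sum_i (w_i * |x_i - y_i|)^p)^(1/p), with non-negative weights. Under that assumption, multiplying each column by its weight and then using the plain minkowski metric gives the same distances. The helper names below are hypothetical and the check is pure Python so it runs without sklearn or hdbscan:

```python
import math

def wminkowski(x, y, w, p):
    # Assumed definition: (sum_i (w_i * |x_i - y_i|) ** p) ** (1 / p)
    return sum((wi * abs(xi - yi)) ** p for xi, yi, wi in zip(x, y, w)) ** (1.0 / p)

def minkowski(x, y, p):
    # Plain (unweighted) Minkowski distance
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def scale(row, w):
    # Multiply each feature by its weight (hypothetical helper)
    return [wi * xi for xi, wi in zip(row, w)]

# Small made-up example values, not data from this thread
w = [2.0, 1.0, 0.5]
a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, 1.0]
p = 2

d_weighted = wminkowski(a, b, w, p)
d_scaled = minkowski(scale(a, w), scale(b, w), p)

# The two distances agree, so scaling the data first is equivalent
print(math.isclose(d_weighted, d_scaled))
```

If that holds for your case, the same idea applied to the original example would be to multiply the columns of data by w and then fit with metric='minkowski' and p=2, so that no w argument is needed. I have not tested that against hdbscan itself, so treat it as a suggestion rather than a confirmed fix.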