Producing a simple dataframe via

import numpy as np
import pandas as pd

x = np.linspace(0, 100, 200)
y = np.arange(0, 200)
xy, _ = np.meshgrid(x, y)
noise = 0.3 * np.random.random((200, 200))
series = np.sin(xy + 5 * noise) + noise
series[0, :] += 10 * np.random.random(200)
data = pd.DataFrame(series)
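Worth noting: HDBSCAN follows the sklearn convention that each DataFrame row is one sample, so the frame above is 200 points in 200-dimensional space, not 200×200 scalar observations. A tiny sketch of that convention (hypothetical 2-D data, not the data above):

```python
import numpy as np

# A (n_samples, n_features) array: 5 samples in 2 dimensions.
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [0.0, 0.1],
              [5.0, 5.0],
              [5.1, 5.0]])

# Clustering libraries following the sklearn convention (hdbscan included)
# see 5 points here, forming two tight groups.
print(X.shape)  # (5, 2)
```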
I try to run HDBSCAN clustering with the default arguments
clusterer = hdbscan.HDBSCAN().fit(data)
And get the following error
TypeError Traceback (most recent call last)
<ipython-input-58-31e012db38f8> in <module>()
1 from sklearn.cluster import DBSCAN
----> 2 clusterer = hdbscan.HDBSCAN().fit(data)
~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
814 self._condensed_tree,
815 self._single_linkage_tree,
--> 816 self._min_spanning_tree) = hdbscan(X, **kwargs)
817
818 if self.prediction_data:
~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
534 _hdbscan_prims_kdtree)(X, min_samples, alpha,
535 metric, p, leaf_size,
--> 536 gen_min_span_tree, **kwargs)
537 else:
538 (single_linkage_tree, result_min_span_tree) = memory.cache(
~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
360
361 def __call__(self, *args, **kwargs):
--> 362 return self.func(*args, **kwargs)
363
364 def call_and_shelve(self, *args, **kwargs):
~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in _hdbscan_prims_kdtree(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
168
169 # TO DO: Deal with p for minkowski appropriately
--> 170 dist_metric = DistanceMetric.get_metric(metric, **kwargs)
171
172 # Get distance to kth nearest neighbour
TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'
I tried explicitly specifying other metrics (e.g. with the metric='manhattan' argument); it did not help.
I suspect this is an argument-order issue somewhere in the code, possibly due to recent additions. This is a little disconcerting. Let me see if I can track it down later today.
Sorry, I ran out of time today. I'll have to try and get to this a little later. My apologies for the delay.
Also getting "TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'", even when just using the simple example from the documentation.
Error occurs with RobustSingleLinkage as well.
To avoid the get_metric method receiving the string 'euclidean' or 'manhattan' etc. instead of the expected object, I used a precomputed distance matrix. Now I'm getting:
NameError Traceback (most recent call last)
----> 1 clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=None, metric='precomputed').fit(D)
C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in fit(self, X, y)
814 self._condensed_tree,
815 self._single_linkage_tree,
--> 816 self._min_spanning_tree) = hdbscan(X, **kwargs)
817
818 if self.prediction_data:
C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
526 _hdbscan_generic)(X, min_samples,
527 alpha, metric, p, leaf_size,
--> 528 gen_min_span_tree, **kwargs)
529 elif metric in KDTree.valid_metrics:
530 # TO DO: Need heuristic to decide when to go to boruvka;
C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\memory.py in __call__(self, *args, **kwargs)
281 return _load_output(self._output_dir, _get_func_fullname(self.func),
282 timestamp=self.timestamp,
--> 283 metadata=self.metadata, mmap_mode=self.mmap_mode,
284 verbose=self.verbose)
285
C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in _hdbscan_generic(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
85 min_samples, alpha)
86
---> 87 min_spanning_tree = mst_linkage_core(mutual_reachability_)
88
89 # mst_linkage_core does not generate a full minimal spanning tree
hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan_hdbscan_linkage.c:2894)()
hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan_hdbscan_linkage.c:2281)()
NameError: name 'np' is not defined
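For context, a precomputed matrix like the D passed above can be built without touching the metric-string path at all; a minimal sketch using plain numpy broadcasting (toy data, not the original dataframe):

```python
import numpy as np

# Toy stand-in for the data; any (n_samples, n_features) array works.
X = np.random.random((50, 3))

# Pairwise euclidean distances via broadcasting -- no DistanceMetric involved.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# D has the shape HDBSCAN(metric='precomputed') expects:
# square, symmetric, with a zero diagonal.
print(D.shape)  # (50, 50)
```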
Sorry, I'm having trouble reproducing this. Can you tell me a little more about your setup?
I also checked: I get the same error as farfan92 when using a precomputed distance matrix.
Ubuntu 17.10, Anaconda 5.0.1, Python 3.6. The versions of the packages installed in the venv I used are:

# packages in environment at /home/vladimir/anaconda3/envs/py36:
#
Updating packages (numpy and sklearn specifically) seems to have resolved the NameError. It must have been a compatibility issue introduced after installing some other packages.
I'm glad at least one of you got this resolved. Hopefully refreshing/updating packages will work twice? I am honestly a little bit at a loss here.
Getting the exact same error message here (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str') despite updating the packages.
Got the same error message on an Ubuntu virtual machine with Python 2.7 and a Windows PC with Python 3.6.4, both running the latest version of Anaconda, with hdbscan installed through conda-forge. I may try installing it another way tomorrow.
Alright, I actually had some time, so I tested that. On the same machine, pip install hdbscan worked immediately (after I removed the conda-forge version). Hope this helps you narrow it down and/or fix it for others.
I also had this error, but it was only present in the conda-forge installed version of hdbscan, not in the pip install version. I removed the conda-forge version, ran pip install hdbscan in my conda environment, and hdbscan works fine.
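One way to check which installed copy is actually being picked up (a hedged sketch; numpy is used as a stand-in, since a broken hdbscan build may not import at all):

```python
import importlib.util

# Locate the module on the current environment's path; the printed path shows
# which site-packages directory (and hence which install) is active.
spec = importlib.util.find_spec("numpy")  # substitute "hdbscan" in practice
print(spec.origin)
```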
@linwoodc3 That's a little weird; the conda-forge version gets synced with the pip version regularly. Perhaps a conda upgrade umap-learn would have done the job? Regardless, you have a working version now, and that's what counts. Thanks for the report; I'll keep an eye out for something amiss like this somewhere along the line.
I just got this same error (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'). I installed hdbscan just yesterday via pip. I did notice that when I tried to import it, it gave me an error about 'numpy.core.multiarray failed to import' but no reason why. So I imported numpy.core.multiarray manually, and then I was able to import hdbscan. Don't know whether that is a related problem. But attempting to fit some data that I had just fit with sklearn.cluster.DBSCAN failed with the above error when I tried to do it with hdbscan. I have python 2.7.13 and numpy 1.11.2. 'pip check' doesn't find any broken dependencies. What else can I try? I would really like to use hdbscan, as I have data whose clusters are certain to have variable density. Does hdbscan require python 3.x perhaps, along with all of the dependent versions of numpy, Cython, etc.?
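The "numpy.core.multiarray failed to import" message usually indicates that a compiled extension was built against a different numpy ABI than the one installed, which would also explain a broken DistanceMetric descriptor; numpy 1.11.2 is quite old, so upgrading numpy and reinstalling hdbscan from source (pip install --no-cache-dir hdbscan) is worth trying. A quick sanity check of the pieces involved (a sketch, not a definitive diagnosis):

```python
import numpy

# If this import fails, numpy's own compiled core is broken; if it succeeds
# but hdbscan still complains about numpy.core.multiarray, hdbscan was most
# likely compiled against a different numpy version than the one installed.
import numpy.core.multiarray

print(numpy.__version__)
```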