hdbscan: import hdbscan issue

Created on 29 Nov 2018 · 27 comments · Source: scikit-learn-contrib/hdbscan

Hi,
I have followed all the steps for installing hdbscan, but I'm still getting this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-29-3f5a460d7435> in <module>
----> 1 import hdbscan

/opt/conda/lib/python3.6/site-packages/hdbscan/__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors
      5 

/opt/conda/lib/python3.6/site-packages/hdbscan/hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20 
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan/_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

AttributeError: type object 'hdbscan._hdbscan_linkage.UnionFind' has no attribute '__reduce_cython__'

Does anyone know how to fix it?
Best,

Most helpful comment

To make hdbscan work on my system, I updated scipy and numpy, closed the notebook, restarted it, and then it started working.

All 27 comments

I have never seen anything quite like that before. Are you installing via conda and conda-forge?

Hi lmcinnes,
I fixed it by installing the package manually; thanks a lot for the quick answer.
Greg

Hi,
my team started seeing the same error today when using HDBSCAN:

_______ ERROR collecting test/clustering/test_hdbscan_point_clusterer.py _______
test/clustering/test_hdbscan_point_clusterer.py:7: in <module>
    from aggregation.clustering.hdbscan_point_clusterer import HDBSCANClusterer
aggregation/clustering/hdbscan_point_clusterer.py:1: in <module>
    import hdbscan
/usr/local/lib/python3.7/site-packages/hdbscan/__init__.py:1: in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
/usr/local/lib/python3.7/site-packages/hdbscan/hdbscan_.py:21: in <module>
    from ._hdbscan_linkage import (single_linkage,
hdbscan/_hdbscan_linkage.pyx:161: in init hdbscan._hdbscan_linkage
    ???
E   AttributeError: type object 'hdbscan._hdbscan_linkage.UnionFind' has no attribute '__reduce_cython__'

Right before this one we also started seeing - not sure yet if they are related:

__________________ ERROR collecting test/test_aggregation.py ___________________
test/test_aggregation.py:5: in <module>
    from aggregation.clustering import (dbscan_point_clusterer,
aggregation/clustering/hdbscan_point_clusterer.py:1: in <module>
    import hdbscan
/usr/local/lib/python3.7/site-packages/hdbscan/__init__.py:1: in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
/usr/local/lib/python3.7/site-packages/hdbscan/hdbscan_.py:21: in <module>
    from ._hdbscan_linkage import (single_linkage,
__init__.pxd:918: in init hdbscan._hdbscan_linkage
    ???
E   ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

We use Docker (base image python:3.7.1-slim) and hdbscan==0.8.18. The weird thing is that it had worked without problems for a few months and apparently started failing spontaneously, as we haven't applied any changes to the environment recently.

Upgrade from numpy==1.15.3 to numpy==1.16.0 fixed the problem for us.
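The "numpy.ufunc size changed" error means the hdbscan wheel was compiled against a newer numpy ABI than the one installed at import time, which is why upgrading numpy fixes it. A minimal sketch of that version check follows; the "built against" value is an assumption for illustration, while the installed version is the one reported above:

```python
# Illustrates the mismatch behind the numpy.ufunc error: the compiled wheel
# expects at least the numpy version it was built against.
def version_tuple(v):
    return tuple(int(part) for part in v.split("."))

built_against = "1.16.0"  # assumed numpy used to compile the wheel
installed = "1.15.3"      # numpy present at import time (as reported above)

if version_tuple(installed) < version_tuple(built_against):
    print("likely ABI mismatch: upgrade numpy to >=", built_against)
```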

We have used the same set of dependencies for months now, so why it suddenly started to fail remains a mystery.

I have tried all the methods you suggested (manual installation, upgrading numpy), but it still doesn't work.

import hdbscan
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-47-3f5a460d7435> in <module>
----> 1 import hdbscan

~/anaconda3/lib/python3.7/site-packages/hdbscan/__init__.py in <module>
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index
      4 from .prediction import approximate_predict, membership_vector, all_points_membership_vectors
      5 

~/anaconda3/lib/python3.7/site-packages/hdbscan/hdbscan_.py in <module>
     19 from scipy.sparse import csgraph
     20 
---> 21 from ._hdbscan_linkage import (single_linkage,
     22                                mst_linkage_core,
     23                                mst_linkage_core_vector,

hdbscan/_hdbscan_linkage.pyx in init hdbscan._hdbscan_linkage()

AttributeError: type object 'hdbscan._hdbscan_linkage.UnionFind' has no attribute '__reduce_cython__'
nosetests -s hdbscan
============================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'hdbscan._hdbscan_linkage')
----------------------------------------------------------------------------
Traceback (most recent call last):
...
...
ModuleNotFoundError: No module named 'hdbscan._hdbscan_linkage'

It seems that the solution to this problem is updating the version of numpy.

pip install --upgrade --user numpy

for pip users

I tried upgrading numpy too, and it still gives me the same error. Windows user, installed through pip. I can import hdbscan in cmd but not in a Jupyter notebook.

If you can install through conda, that may help with a lot of issues on Windows.

Our team is getting an error as well. Our error seems like it may be related to issue #258.

Python 3.5.4 |Anaconda custom (64-bit)| (default, Sep 19 2017, 08:15:17) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import hdbscan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\lib\site-packages\hdbscan\__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "D:\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "__init__.pxd", line 918, in init hdbscan._hdbscan_linkage
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

To make hdbscan work on my system, I updated scipy and numpy, closed the notebook, restarted it, and then it started working.
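The restart matters because a notebook kernel started before the upgrade still holds the old binaries in memory. A standard-library sketch (Python 3.8+) for checking which versions the current kernel actually sees:

```python
# Print the numpy/scipy versions visible to this interpreter. If these don't
# match what pip/conda just installed, restart the kernel.
from importlib import metadata

for pkg in ("numpy", "scipy"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```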

@LogicPlum That looks like there are some incompatibilities with the numpy version you have installed. It is possible that the Windows pip wheels are not working on your platform. If you can install the conda version, that would definitely be better; otherwise your next best alternative is to try installing directly from source to ensure it gets built against the numpy on your system.

Hi all,

I tried all of the above as well (upgrading scipy, numpy, cython), restarting everything, but still get the same error. I had no problems installing it, but upon importing I get the following error:

>>> import hdbscan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Hovanes/anaconda3/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/Users/Hovanes/anaconda3/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 21, in <module>
    from ._hdbscan_linkage import (single_linkage,
  File "__init__.pxd", line 918, in init hdbscan._hdbscan_linkage
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

@hovikgas Any chance you can install from source (e.g. directly from a clone of the repository here)? If so, does that work okay, or are there still issues?

@lmcinnes I tried doing that, and ran into multiple issues. First, it kept giving a warning about using the deprecated NumPy API:

warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]

Then it kept giving errors about incompatible pointers and warnings about code never being executed (here is just a snippet of that):

hdbscan/dist_metrics.c:7052:13: warning: code will never be executed [-Wunreachable-code]
  __pyx_r = 0;
            ^
hdbscan/dist_metrics.c:7084:74: warning: incompatible pointer types passing 'struct
      __pyx_obj_7hdbscan_12dist_metrics_DistanceMetric *' to parameter of type 'struct
      __pyx_obj_7hdbscan_12dist_metrics_SEuclideanDistance *' [-Wincompatible-pointer-types]
  ...((struct __pyx_obj_7hdbscan_12dist_metrics_DistanceMetric *)__pyx_v_self), __pyx_v_x1, __pyx_v_x2, __...
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hdbscan/dist_metrics.c:6887:168: note: passing argument to parameter '__pyx_v_self' here
  ...__pyx_obj_7hdbscan_12dist_metrics_SEuclideanDistance *__pyx_v_self, __pyx_t_7hdbscan_12dist_metrics_D...
                                                           ^
hdbscan/dist_metrics.c:7845:73: warning: incompatible pointer types passing 'struct
      __pyx_obj_7hdbscan_12dist_metrics_DistanceMetric *' to parameter of type 'struct
      __pyx_obj_7hdbscan_12dist_metrics_MinkowskiDistance *' [-Wincompatible-pointer-types]
  ...((struct __pyx_obj_7hdbscan_12dist_metrics_DistanceMetric *)__pyx_v_self), __pyx_v_x1, __pyx_v_x2, __...
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hdbscan/dist_metrics.c:7764:166: note: passing argument to parameter '__pyx_v_self' here
  ...__pyx_obj_7hdbscan_12dist_metrics_MinkowskiDistance *__pyx_v_self, __pyx_t_7hdbscan_12dist_metrics_DT...
                                                          ^
hdbscan/dist_metrics.c:8493:13: warning: code will never be executed [-Wunreachable-code]
  __pyx_r = 0;

Ultimately it ended up generating 10 warnings, but continued to check the dependencies for scipy, Cython, scikit-learn, and numpy, finishing the installation for hdbscan==0.8.20.

Now, when I try to import hdbscan, I get the following error:

ModuleNotFoundError: No module named 'hdbscan._hdbscan_linkage'

Okay, something is very broken in all of this, but I am honestly not sure what. Let me look into it a little bit and then try to get back to you.

Okay, the warnings are all fine, and I believe your final problem is potentially due to the fact that Python searches the current directory before looking in site-packages. If you finished the install and then ran python from that directory, it will find the hdbscan source directory sitting there and fail. If you change to another directory before running python, hopefully it will load the module successfully? That was certainly the case for me.
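A quick way to check which copy Python will pick up, without triggering the failing import (a standard-library sketch; "hdbscan" is just the package name being probed):

```python
# Show where Python would resolve the package from. If the origin points into
# your source checkout rather than site-packages, change directory before
# starting Python.
import importlib.util
import sys

print("first search path entry:", repr(sys.path[0]))

spec = importlib.util.find_spec("hdbscan")
print("hdbscan resolves to:", spec.origin if spec else "not found")
```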

Ah, yes, then we just go back to my original error regarding the numpy.ufunc size changing:

ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

Hmm. Unfortunately that is something I can't seem to reproduce. I'm not really much of an expert on numpy and the binary packaging thereof, so we are wandering well outside my expertise at this point. Sorry.

I had this issue several times in a Jupyter environment right after installation.
Restarting the notebook (the one using hdbscan) solved the issue.

I tried restarting several times, to no avail. However, I recently had issues with theano and PyMC3 as well, and, following the advice of various StackOverflow users, I was finally able to fix those issues (which were apparently macOS Mojave-related). Fortunately, that seems to have fixed my HDBSCAN issues as well! Regardless, thanks for your help @lmcinnes!

I have a similar issue.
[screenshot]

When I try to do pip install hdbscan in the terminal, I get this error:
[screenshot]

@shangwen777 try installing it through conda-forge:

conda install -c conda-forge hdbscan

Hey, has anyone faced an issue because of a mismatch in the library name? The import hdbscan statement is trying to load a function from _sklearn.externals.joblib.parallel_, whereas the actual library name is Parallel, with a capital P. Not sure if this is the reason. TIA!
[screenshot]

@hovikgas

After installing with

conda install -c conda-forge hdbscan

I get the error ModuleNotFoundError: No module named 'hdbscan._hdbscan_linkage' which seems to be an on-and-off error based on previous issues. I've tried updating numpy and scipy. I'm convinced it has nothing to do with the directory I'm in. I've also checked that the .so dependency files are not empty.
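One more check worth adding to the ones above: probe the package and its compiled submodule separately, without running the full import. The module names are the ones from the tracebacks in this thread; a broken Cython build typically surfaces here as an exception rather than a clean "missing":

```python
# Probe top-level package and compiled submodule separately. Note that
# find_spec on a dotted name imports the parent package, so a broken
# compiled extension can raise here (e.g. the AttributeError seen above).
import importlib.util

for name in ("hdbscan", "hdbscan._hdbscan_linkage"):
    try:
        spec = importlib.util.find_spec(name)
        print(name, "->", spec.origin if spec else "missing")
    except Exception as exc:
        print(name, "->", type(exc).__name__, exc)
```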

I would greatly appreciate any advice! Thank you!

@miretchin Yeah, check the version compatibility. Your numpy and scipy might be TOO new, so you might have to downgrade them to the latest compatible version.

Hi,

I had the same error and it was fixed when I updated my pip:

pip install --upgrade pip

Then tried to install it again:

pip install hdbscan

and restarted the notebook.

Hi guys,

I tried many of the steps suggested here and elsewhere, but to no avail.
However, I realized that hdbscan works with some specific versions of packages and fails otherwise. The following is the list of versions that finally made it import for me:

hdbscan=0.8.19=py37h3010b51_0
matplotlib=3.2.2=0
numpy=1.15.4=py37h7e9f1db_0
pandas=0.23.4=py37h04863e7_0
scikit-learn=0.20.1=py37hd81dba3_0
scipy=1.1.0=py37h7c811a0_2
tensorflow=1.13.1=mkl_py37h54b294f_0

I have also attached my conda environment file here.
hdbscan.txt
