Hdbscan: TypeError: delayed() got an unexpected keyword argument 'check_pickle'

Created on 15 Dec 2020 · 5 comments · Source: scikit-learn-contrib/hdbscan

Joblib recently released version 1.0.0, which breaks hdbscan's parallel processing on larger datasets.

With hdbscan==0.8.26, joblib==1.0.0, python3.6, here's an example that fails:

import hdbscan
import sklearn.datasets as data

blobs, _ = data.make_blobs(n_samples=20000, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)
clusterer = hdbscan.HDBSCAN(min_cluster_size=25, core_dist_n_jobs=2, algorithm='best').fit(blobs)

Here's the traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 919, in fit
    self._min_spanning_tree) = hdbscan(X, **kwargs)
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 615, in hdbscan
    core_dist_n_jobs, **kwargs)
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/memory.py", line 352, in __call__
    return self.func(*args, **kwargs)
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 278, in _hdbscan_boruvka_kdtree
    n_jobs=core_dist_n_jobs, **kwargs)
  File "hdbscan/_hdbscan_boruvka.pyx", line 375, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.__init__
  File "hdbscan/_hdbscan_boruvka.pyx", line 411, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/parallel.py", line 831, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "hdbscan/_hdbscan_boruvka.pyx", line 412, in genexpr
TypeError: delayed() got an unexpected keyword argument 'check_pickle'

The above test succeeds when joblib 0.17.0 is installed.

All 5 comments

joblib just pushed its first major release, 1.0.0, two days ago. This is very likely the cause. I'll try to look into this later today.

Yep. For reference, here's the joblib commit where check_pickle was removed: https://github.com/joblib/joblib/commit/8c2cbd9f4b3f64037ff46ee1cb4b0680d77dfa02. It was included in the 1.0.0 release.
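
To illustrate why the call fails, here is a minimal sketch. It assumes, as the traceback suggests, that the hdbscan code passed check_pickle as a keyword to joblib's delayed. The functions old_delayed and new_delayed are hypothetical stand-ins that only mimic the signatures before and after that commit; they are not joblib's implementations. The helper compat_delayed is likewise just an illustration of one way to tolerate both signatures, not necessarily what the hdbscan fix does.

```python
import inspect


def old_delayed(function, check_pickle=None):
    """Stand-in mimicking the pre-1.0 signature (hypothetical, simplified)."""
    def wrapper(*args, **kwargs):
        return function, args, kwargs
    return wrapper


def new_delayed(function):
    """Stand-in mimicking the 1.0.0 signature, which has no check_pickle."""
    def wrapper(*args, **kwargs):
        return function, args, kwargs
    return wrapper


def compat_delayed(delayed, function, **options):
    """Call `delayed`, dropping any keyword options it does not accept."""
    accepted = inspect.signature(delayed).parameters
    supported = {k: v for k, v in options.items() if k in accepted}
    return delayed(function, **supported)


def square(x):
    return x * x


# Passing check_pickle directly raises the same kind of TypeError
# as in the traceback.
try:
    new_delayed(square, check_pickle=False)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The compat helper works with either signature.
task_old = compat_delayed(old_delayed, square, check_pickle=False)(3)
task_new = compat_delayed(new_delayed, square, check_pickle=False)(3)
```

With the real library, the same helper could be called with joblib.delayed as its first argument, although simply not passing check_pickle is the more direct option.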

You spelled out joblib pretty clearly in your first post and I feel like I didn't read any of that the first time, haha. This should be fixed in PR #438. Tested locally with your example stub on Python 3.8.5 and it's back to normal.

For now, downgrading joblib to 0.17 fixes the issue:
conda install joblib==0.17.0
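
To decide whether the downgrade applies to a given environment, a small check along these lines may help. This is only a sketch: parse_version and joblib_lacks_check_pickle are hypothetical helper names, and the only fact relied on is that check_pickle was removed in joblib 1.0.0.

```python
from itertools import takewhile


def parse_version(text):
    """Turn '0.17.0' into (0, 17, 0), keeping only leading digits per part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def joblib_lacks_check_pickle(joblib_version):
    """True when the given joblib version is 1.0 or later."""
    return parse_version(joblib_version) >= (1, 0)


# The two versions mentioned in this issue.
affected = joblib_lacks_check_pickle("1.0.0")
unaffected = joblib_lacks_check_pickle("0.17.0")
```

In a live environment, the version string can be taken from joblib.__version__.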

Has this been fixed?
