Joblib recently released version 1.0.0, and it breaks hdbscan's parallel processing with larger datasets.
With hdbscan==0.8.26, joblib==1.0.0, python3.6, here's an example that fails:
import hdbscan
import sklearn.datasets as data
blobs, _ = data.make_blobs(n_samples=20000, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)
clusterer = hdbscan.HDBSCAN(min_cluster_size=25, core_dist_n_jobs=2, algorithm='best').fit(blobs)
Here's the traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 919, in fit
self._min_spanning_tree) = hdbscan(X, **kwargs)
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 615, in hdbscan
core_dist_n_jobs, **kwargs)
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/memory.py", line 352, in __call__
return self.func(*args, **kwargs)
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/hdbscan/hdbscan_.py", line 278, in _hdbscan_boruvka_kdtree
n_jobs=core_dist_n_jobs, **kwargs)
File "hdbscan/_hdbscan_boruvka.pyx", line 375, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.__init__
File "hdbscan/_hdbscan_boruvka.pyx", line 411, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
if self.dispatch_one_batch(iterator):
File "/home/kevin/anaconda3/envs/enview36/lib/python3.6/site-packages/joblib/parallel.py", line 831, in dispatch_one_batch
islice = list(itertools.islice(iterator, big_batch_size))
File "hdbscan/_hdbscan_boruvka.pyx", line 412, in genexpr
TypeError: delayed() got an unexpected keyword argument 'check_pickle'
The above test succeeds when joblib 0.17.0 is installed.
joblib pushed its first major release, 1.0.0, two days ago. This is very likely the cause. I'll try to look into this later today.
Yep. For reference, here's the joblib commit where check_pickle was removed, which was included in the 1.0.0 release: https://github.com/joblib/joblib/commit/8c2cbd9f4b3f64037ff46ee1cb4b0680d77dfa02
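To illustrate why removing a keyword argument produces this exact TypeError, and one way calling code could tolerate both API shapes, here is a minimal sketch. The functions `delayed_old` and `delayed_new` are hypothetical stand-ins written for this example, not joblib's real code, and `call_with_supported_kwargs` is an illustrative helper, not necessarily what any fix in hdbscan does. The simplest fix is to stop passing the argument at all.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping any keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # If func accepts **kwargs itself, every keyword is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **supported)


# Hypothetical stand-ins for the two API shapes (assumption: NOT joblib's code).
def delayed_old(function, check_pickle=None):
    return ("old", function.__name__)


def delayed_new(function):
    return ("new", function.__name__)


def work(x):
    return x * 2


# Passing the removed keyword directly fails, as in the traceback above.
try:
    delayed_new(work, check_pickle=False)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# The wrapper filters the keyword out when the target no longer accepts it.
old_result = call_with_supported_kwargs(delayed_old, work, check_pickle=False)
new_result = call_with_supported_kwargs(delayed_new, work, check_pickle=False)
```

Signature inspection keeps a single call site working against both variants, at the cost of silently ignoring arguments, so it is best treated as a stopgap.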
You spelled out joblib pretty clearly in your first post and I feel like I didn't read any of that the first time, haha. This should be fixed in PR #438. Tested locally with your example stub on Python 3.8.5 and it's back to normal.
For now, downgrading joblib to 0.17 fixes the issue:
conda install joblib==0.17.0
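If you want to check whether an environment needs the downgrade, a small sketch like the following may help. The helper names `joblib_major` and `needs_downgrade` are made up for this example, and the rule it encodes (any 1.x or later joblib is affected when paired with an hdbscan that still passes check_pickle, such as 0.8.26 above) is an assumption drawn from this thread.

```python
def joblib_major(version):
    """Return the leading major number of a version string, or 0 if unparseable."""
    head = version.split(".")[0]
    return int(head) if head.isdigit() else 0


def needs_downgrade(joblib_version):
    """True if this joblib version no longer accepts check_pickle (1.0.0 and later)."""
    return joblib_major(joblib_version) >= 1


# Versions mentioned in this thread.
status = {v: needs_downgrade(v) for v in ("0.17.0", "1.0.0")}
```

In a real session you could pass `joblib.__version__` to `needs_downgrade` before fitting.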
Has this been fixed?