probs = hdbscan.prediction.membership_vector(clusterer, X_train)
ValueError: Buffer dtype mismatch, expected 'float64_t' but got 'long'
I tried this on latest master with the following metrics: manhattan, hamming, euclidean.
Ah, it looks like this happens if my training X data is of type np.int64. This was hard to debug. I recommend adding type assertions somewhere.
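A minimal sketch of the workaround plus the kind of type check suggested above, assuming NumPy input. The helper name `as_float64_features` is hypothetical (it is not part of hdbscan). It rejects non-numeric or non-2-D data with a readable message and casts everything else, including np.int64, to a C-contiguous float64 array before it reaches the clusterer.

```python
import numpy as np


def as_float64_features(X):
    """Hypothetical helper: validate feature data and cast it to float64.

    As the 'expected float64_t but got long' error suggests, the compiled
    routines expect float64 buffers, so integer arrays are cast up front.
    """
    arr = np.asarray(X)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array (n_samples, n_features), got shape {arr.shape}")
    # Accept bool, signed/unsigned int and float data; reject strings, objects, complex.
    if arr.dtype.kind not in "biuf":
        raise TypeError(f"expected numeric feature data, got dtype {arr.dtype}")
    # Returns a C-contiguous float64 array, copying only if needed.
    return np.ascontiguousarray(arr, dtype=np.float64)


X_int = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)
X_float = as_float64_features(X_int)
print(X_float.dtype)  # float64
```

With a helper like this, the call from the report would become something like hdbscan.prediction.membership_vector(clusterer, as_float64_features(X_train)), with the same cast applied to the data passed to fit.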
Actually, the membership_vector function tends to return lots of NaNs and values like 2.77568154e-157, which I don't know how to interpret. I see that the tests for it are commented out, so I'll let this go :-) Maybe the best solution is to deprecate this function?
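To gauge how widespread the NaN problem is on a given dataset, a small diagnostic along these lines may help. It is a sketch, not a fix: it assumes the result is a 2-D array with one row per point and one column per cluster, and the function name `summarize_membership` and the `tiny` threshold are hypothetical choices. It only counts rows containing NaN and entries so small (like 2.77568154e-157) that they are effectively zero; it does not explain why they occur.

```python
import numpy as np


def summarize_membership(probs, tiny=1e-12):
    """Hypothetical diagnostic for a membership-vector array.

    Assumes shape (n_points, n_clusters). Reports which rows contain NaN
    and how many entries fall below `tiny` (effectively zero).
    """
    probs = np.asarray(probs, dtype=np.float64)
    nan_rows = np.isnan(probs).any(axis=1)
    # Comparisons with NaN evaluate to False, so NaN entries are not counted as tiny.
    tiny_entries = int(np.count_nonzero(np.abs(probs) < tiny))
    return {
        "n_rows": probs.shape[0],
        "rows_with_nan": int(nan_rows.sum()),
        "nan_row_indices": np.flatnonzero(nan_rows).tolist(),
        "tiny_entries": tiny_entries,
    }


example = np.array([
    [0.9, 0.1],
    [np.nan, np.nan],
    [1.0, 2.77568154e-157],
])
print(summarize_membership(example))
```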
+1 to this. Thanks for figuring out where the float64_t thing came from.
Things work, but it was never very well tested for diverse datasets and there are certainly some issues. I'll have to see if I can find some time to sort through this and get it fixed.
I mean, I think it's also partly numpy's fault. I had an error message saying: Buffer dtype mismatch, expected 'float64_t' but got 'float64'. That was a bit too cryptic to figure out.
I believe that's just a typo somewhere on my part. I should be able to fix that, but as for the rest... that may take a little more time.
Happy to help if you can define some concrete TODOs.
Thank you! I'll see if I can work up a short list of things and put it here. Any and all help is greatly appreciated!
Is this a good place to ask about why this isn't in sklearn itself?
I have talked to the sklearn maintainers, and the short answer is that the algorithm (HDBSCAN*) is considered too new (i.e. it has too few citations in the literature) for inclusion at this time. As the algorithm gains popularity and citations, inclusion in sklearn proper will hopefully happen.
Likewise keen to help out with this, as I'd like to see this adopted much more widely.
That makes sense; an algorithm being in sklearn implies that the technique has credibility.