When running k-means in Python, I would like to start from pre-computed centroids instead of random initialization. How can I do that?
OS: ubuntu18.04
Running on: GPU
Interface: Python
+1
I believe that if you set centroids on the Clustering object to your desired starting centroids before running k-means, the clustering will start from that centroid set.
https://github.com/facebookresearch/faiss/blob/master/Clustering.h#L73
How can I do so for IndexIVF.train()?
That might be difficult for now; adding a Clustering object to IndexIVF::Level1Quantizer would help.
All IndexIVF classes have a cp member which is the ClusteringParameters used for the IVF clustering.
https://github.com/facebookresearch/faiss/blob/master/IndexIVF.h#L45
Right. This should be documented better.
See here:
https://github.com/facebookresearch/faiss/wiki/Python-C---code-snippets#how-can-i-force-the-k-means-initialization
In which version of Faiss is the 'init_centroids' parameter available? I am using faiss-gpu 1.6.3, and when I run the notebook I get:
got an unexpected keyword argument 'init_centroids'
Ah right, it may be available only in the latest version of Faiss.
As a quick workaround, you can just patch your faiss.py with the code here:
https://github.com/facebookresearch/faiss/blob/750d43f/faiss/python/__init__.py#L693
Thanks.