Faiss: For k-means clustering in Python, how can I initialize centroids with a precomputed array instead of random initialization?

Created on 16 Jul 2020 · 10 Comments · Source: facebookresearch/faiss

Summary

When running k-means in Python, I would like to set precomputed centroids before training instead of using random initialization. How can I do that?

Platform

OS: ubuntu18.04

Running on: GPU

Interface: Python

documentation

All 10 comments

+1

I believe that if you set the centroids field of the Clustering object to your desired starting centroids before running k-means, training will start from that centroid set.

https://github.com/facebookresearch/faiss/blob/master/Clustering.h#L73

How can I do so for IndexIVF.train()?

That might be difficult for now; adding a Clustering object to IndexIVF::Level1Quantizer would help.

All IndexIVF classes have a cp member which is the ClusteringParameters used for the IVF clustering.

https://github.com/facebookresearch/faiss/blob/master/IndexIVF.h#L45

Right. This should be documented better.

See here:
https://github.com/facebookresearch/faiss/wiki/Python-C---code-snippets#how-can-i-force-the-k-means-initialization

In which version of Faiss is the 'init_centroids' parameter available? I am using faiss-gpu 1.6.3, and when I run the notebook I get:

got an unexpected keyword argument 'init_centroids'

Ah right, it may be available only in the latest version of Faiss.
As a quick workaround, you can just patch your faiss.py with the code here:
https://github.com/facebookresearch/faiss/blob/750d43f/faiss/python/__init__.py#L693

Thanks.
