How can I give a threshold value to a cosine similarity search instead of a number of nearest neighbors (k)?
First, you need an index that uses the _inner product_ as its metric, for example:
index = faiss.IndexFlatIP(d)
index.add(xb)
Then, you should probably _normalize all_ embeddings first (the inner product between two normalized embeddings is their cosine similarity). This can be done as follows:
# https://github.com/facebookresearch/faiss/blob/master/python/faiss.py#L673
# NOTE : it happens "in place"
faiss.normalize_L2(x=xb)
faiss.normalize_L2(x=xq)
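To illustrate why normalization matters here, the following is a minimal NumPy-only sketch (no faiss involved) that checks that the inner product of L2-normalized vectors matches the cosine similarity of the original vectors. The helper `l2_normalize` and the random test data are assumptions made for illustration; unlike `faiss.normalize_L2`, the helper returns a copy rather than working in place.

```python
import numpy as np

def l2_normalize(x):
    """Return a row-wise L2-normalized copy of x (hypothetical helper, not in place)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero-length vectors

# Random illustrative data: 5 database vectors and 2 queries of dimension 8
rng = np.random.default_rng(0)
xb = rng.standard_normal((5, 8)).astype("float32")
xq = rng.standard_normal((2, 8)).astype("float32")

# Inner product between normalized vectors
ip = l2_normalize(xq) @ l2_normalize(xb).T

# Cosine similarity computed directly from the un-normalized vectors
cos = (xq @ xb.T) / (
    np.linalg.norm(xq, axis=1)[:, None] * np.linalg.norm(xb, axis=1)[None, :]
)

# The two agree up to floating-point error
assert np.allclose(ip, cos, atol=1e-5)
```

Because the values are cosine similarities, they lie in [-1, 1], which is what makes a threshold such as 0.95 meaningful.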
Finally, you can use range_search instead of the search method.
It is described here: https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#range-search
threshold = 0.95
lims, D, I = index.range_search(x=xq, thresh=threshold)
NOTE: The results are not sorted by _cosine similarity_.
Also, I guess range_search may be more _memory efficient_ than search, but I'm not sure.
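Since the results come back unsorted, here is a minimal sketch of how the three returned arrays might be split per query and sorted by similarity. It assumes the layout described on the wiki page linked above, where the matches for query i are stored in `D[lims[i]:lims[i+1]]` and `I[lims[i]:lims[i+1]]`. The function name `sorted_range_results` and the sample arrays are made up for illustration and are not faiss output.

```python
import numpy as np

def sorted_range_results(lims, D, I):
    """Split flat range_search-style output per query, sorted by descending similarity.

    Hypothetical helper: returns a list with one (ids, similarities) pair per query.
    """
    results = []
    for q in range(len(lims) - 1):
        start, end = int(lims[q]), int(lims[q + 1])
        sims, ids = D[start:end], I[start:end]
        order = np.argsort(-sims)  # highest similarity first
        results.append((ids[order], sims[order]))
    return results

# Made-up arrays in the assumed layout, for 3 queries (the second has no match)
lims = np.array([0, 2, 2, 5], dtype=np.uint64)
D = np.array([0.96, 0.99, 0.95, 0.98, 0.97], dtype=np.float32)
I = np.array([7, 3, 1, 4, 9], dtype=np.int64)

for q, (ids, sims) in enumerate(sorted_range_results(lims, D, I)):
    print(q, ids.tolist(), sims.tolist())
```

A query with no vector above the threshold simply yields two empty arrays, so callers need no special case for it.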
no activity, closing.
Thank you very much for your answer. I would, however, like to add a small clarification about something I personally had a problem with.
Before adding your vectors to the IndexFlatIP, you must normalize them in place with faiss.normalize_L2(x=xb). Otherwise, range_search will run on the un-normalized vectors and return wrong results.