Faiss: Threshold value to cosine similarity search

Created on 30 Jun 2020 · 4 Comments · Source: facebookresearch/faiss

Running on:

  • [ ] CPU

Interface:

  • [ ] Python

How can I give a similarity threshold to a cosine similarity search instead of a k for k-nearest-neighbor search?

question

Most helpful comment

First, you need an index that uses _inner product_ as its metric, for example:

import faiss
# d: embedding dimension; xb: float32 array of shape (n, d), L2-normalized beforehand
index = faiss.IndexFlatIP(d)
index.add(xb)

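Then, you must _normalize all_ embeddings, since the inner product of two L2-normalized vectors equals their cosine similarity. Normalize xb before calling index.add: the index stores its own copy of the vectors, so normalizing xb afterwards has no effect on it. It can be done using the following: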

# https://github.com/facebookresearch/faiss/blob/master/python/faiss.py#L673
# NOTE: normalization happens in place; xb and xq must be C-contiguous float32 NumPy arrays
faiss.normalize_L2(x=xb)  # do this before index.add(xb)
faiss.normalize_L2(x=xq)

Finally, use the range_search method instead of search.

It is described here: https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#range-search

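threshold = 0.95  # keep only pairs with cosine similarity above this value
lims, D, I = index.range_search(x=xq, thresh=threshold)
# matches of query i: ids I[lims[i]:lims[i+1]], similarities D[lims[i]:lims[i+1]]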

NOTE: the results are not sorted by _cosine similarity_.

Also, I guess range_search may be more _memory efficient_ than search, but I'm not sure.

All 4 comments


no activity, closing.

Thank you very much for your answer. I would, however, like to add a small clarification about a point I personally had trouble with.

Before adding your vectors to the IndexFlatIP, you must normalize them in place with faiss.normalize_L2(x=xb). Otherwise range_search will run on the un-normalized vectors and return wrong results.
