Hi, I only see two metric choices for searching: METRIC_INNER_PRODUCT and METRIC_L2. How can I search with cosine similarity?
Hi
Please L2-normalize the vectors before adding and searching, then search with METRIC_INNER_PRODUCT.
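For anyone wondering why this works: once two vectors have unit L2 norm, their inner product is exactly their cosine similarity. Here is a minimal pure-Python sketch of that identity. It does not use faiss, the helper names are made up for illustration, and it assumes non-zero vectors.

```python
import math

def inner_product(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    # Scale v to unit L2 norm (assumes v is not the zero vector).
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # Textbook definition: dot(a, b) / (|a| * |b|).
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

a = [0.1, 0.2, 0.3]
b = [0.4, 0.5, 0.6]

ip_of_normalized = inner_product(l2_normalize(a), l2_normalize(b))
cos = cosine_similarity(a, b)
print(ip_of_normalized, cos)  # the two values agree up to rounding
```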
@mdouze, I have the same question. How do I search with METRIC_INNER_PRODUCT? Can you give an example with code? Thanks.
@yhpku something like this:
from faiss import normalize_L2
# ...
index.train(normalize_L2(training_vectors))
index.add(normalize_L2(index_vectors))
index.search(normalize_L2(search_vectors), 5)
The metric inner product flag is set when the index is built.
@billkle1n
It seems that faiss.normalize_L2() doesn't have a return value. It normalizes the matrix in place. So instead of
index.train(normalize_L2(training_vectors)),
it should be
normalize_L2(training_vectors)
index.train(training_vectors)
I have a question: when I try normalize_L2(dest_array_one), I get this error:
File "
File "/root/anaconda3/envs/faiss/lib/python2.7/site-packages/faiss/__init__.py", line 523, in normalize_L2
fvec_renorm_L2(x.shape[1], x.shape[0], swig_ptr(x))
TypeError: in method 'fvec_renorm_L2', argument 3 of type 'float *'
@13293824182 make sure your array is of type float32
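NumPy arrays default to float64, which is the usual cause of this TypeError. A minimal sketch of the conversion, where as_faiss_input is a hypothetical helper name and not part of faiss:

```python
import numpy as np

def as_faiss_input(vectors):
    # Hypothetical helper: returns a C-contiguous float32 array,
    # the layout the faiss Python wrappers expect.
    return np.ascontiguousarray(vectors, dtype=np.float32)

raw = np.array([[0.1, 0.2, 0.3]])  # dtype is float64 by default
fixed = as_faiss_input(raw)
print(raw.dtype, fixed.dtype)
```

Note that the conversion makes a copy, so normalize_L2 has to be called on the converted array rather than on the original.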
Just adding an example in case a beginner like me comes here looking for how to compute cosine similarity from scratch:
import faiss
import numpy as np
from scipy import spatial

dataSetI = [.1, .2, .3]
dataSetII = [.4, .5, .6]
# faiss expects 2-D float32 arrays
x = np.array([dataSetI]).astype(np.float32)
q = np.array([dataSetII]).astype(np.float32)
index = faiss.index_factory(3, "Flat", faiss.METRIC_INNER_PRODUCT)
faiss.normalize_L2(x)  # normalizes in place, returns None
index.add(x)
faiss.normalize_L2(q)
# k=1 because the index holds a single vector; keep `index` intact by using new names
similarity, ids = index.search(q, 1)
print('Similarity by FAISS:{}'.format(similarity))
# Cross-check against SciPy (cosine similarity = 1 - cosine distance)
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
print('Similarity by SciPy:{}'.format(result))