Faiss: IndexIVFFlat on 2M embeddings from FaceNet is giving poor results

Created on 23 Oct 2019  路  2Comments  路  Source: facebookresearch/faiss

Summary

I am using embeddings computed from the popular FaceNet model. I have calculate about 2.5M embeddings in d=512 and am looking at performance of the IndexIVFFlat compared to the simple Flat index. Even with large k I see flat results in the recall

Running on:

  • [ ] GPU

Interface:

  • [ ] Python

Reproduction instructions

xb = np.ascontiguousarray(X[::2][:2*1000*1000])
xq = np.ascontiguousarray(X[1::2][:10*1000])
d = xq.shape[1]

# compute gt
flat_index = faiss.index_factory(d, "Flat")
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, flat_index, None)
flat_index.train(xb)
flat_index.add(xb)
D, gt = flat_index.search(xq, k)

# try an approximate method
index = faiss.index_factory(d, "IVF<n_centroids>,Flat")
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, index, None)
index.train(xb)
index.add(xb)

def evaluate(index, xq, gt, k):
    nq = xq.shape[0]
    t0 = time.time()
    D, I = index.search(xq, k)  # noqa: E741
    t1 = time.time()
    recalls = {}
    i = 1
    while i <= k:
        recalls[i] = (I[:, :i] == gt[:, :1]).sum() / float(nq)
        i *= 10

    return (t1 - t0) * 1000.0 / nq, recalls

evaluate(flat_index, xq, gt, 1000)
>>
(2.1849388122558593, 
 {1: 0.99850000000000005, 
  10: 1.0, 
  100: 1.0, 
  1000: 1.0})

evaluate(index, xq, gt, 1000)

>>
(0.038869810104370114,
 {1: 0.35210000000000002,
  10: 0.35289999999999999,
  100: 0.35289999999999999,
  1000: 0.35299999999999998})

Notice how the recall is not increasing as k increases.

I have tried many ,, between 4096 to 20000 and I do not see any improvement.

Questions:

  1. Is it possible that the data distribution is not conducive to this method?

  2. Am I possibly splitting my query and training set incorrectly?

help wanted

Most helpful comment

You are only looking in a single IVF list, as nprobe is by default 1.

Increase nprobe rather than k.

All 2 comments

You are only looking in a single IVF list, as nprobe is by default 1.

Increase nprobe rather than k.

of course, merci beaucoup!

I did want to ask about the typical strategy to split your datasets. In some examples I have noticed that you build an xb, xt, xq dataset: one for training, one for adding and the last for query (equivalent to a test set). I am not sure what is the typical split for this field. Do you usually train on xt, add [xt, xb] (or does xb already contain xt?) to the index, and search with xt? It is hard to tell how you have constructed your memmap files. What proportion of the whole dataset is xq, xt and xb typically?

thanks for such a killer project!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

zjjott picture zjjott  路  3Comments

hipitt picture hipitt  路  3Comments

cherryPotter picture cherryPotter  路  3Comments

minjiaz picture minjiaz  路  3Comments

lukedeo picture lukedeo  路  3Comments