Hi there,
I have been running IndexIVFPQ, and I have observed that when nprobe (the number of inverted lists to probe) is equal to nlist (the number of inverted lists created by coarse quantizer), the recall is sometimes not 1.
My understanding is that when nprobe equals to nlist, FAISS would probe every single inverted list and do an exhaustive search in each list. Since this search process would compare the query with every data point in the index, which essentially becomes an exhaustive search, I expect the recall should be 1.
How come recall is not 1 when nprobe equals to nList ?
I've observed this effect on SIFT10K, SIFT1M datasets, as well as my own dataset. One quick way to reproduce the result is to use SIFT10K data with the parameter dim = 128, nList = 100, m = 16, nbits = 8, nprobe = 100. In my experiment, R@1 is 0.72 and R@10 is 0.97.
With IndexIVFPQ, this is totally expected: this index makes has two sources approximations:
If you want exact search, use IndexFlat or IndexGPUFlat. If you are targeting a high-accuracy regime (but not 100%), you can use IndexIVFFlat or IMI (or GPU counterparts).
If you want to keep compressed-domain index,
Hi Herv茅,
Thanks for your reply! That makes sense. The distance evaluation in PQ happens between a query and product quantization descriptor of a data point, not in between a query and the exact data point. It can happen that though point x has a closer distance to a query q than point y, point x's pq code is farther away to query q than point y's pq code. Intuitively, it seems there is no guarantee of reaching 100% recall when distance is calculated using approximated distances.
Yes, exactly.
Most helpful comment
With IndexIVFPQ, this is totally expected: this index makes has two sources approximations:
If you want exact search, use IndexFlat or IndexGPUFlat. If you are targeting a high-accuracy regime (but not 100%), you can use IndexIVFFlat or IMI (or GPU counterparts).
If you want to keep compressed-domain index,
https://github.com/facebookresearch/faiss/blob/master/benchs/bench_scalar_quantizer.py
https://github.com/facebookresearch/faiss/blob/master/IndexScalarQuantizer.h