Faiss: Question: GpuMultipleClonerOptions options meaning

Created on 4 Aug 2018 · 5 comments · Source: facebookresearch/faiss

Hi,
Could you please clarify the meaning of the following members?

GpuMultipleClonerOptions.usePrecomputed
GpuMultipleClonerOptions.shard
GpuMultipleClonerOptions.reserveVecs

The docs are not very helpful here.

asanakoy

All 5 comments

See https://github.com/facebookresearch/faiss/blob/master/gpu/GpuClonerOptions.h

mdouze on 16 Aug 2018

Thank you, but it is still not clear, since the comments there are very brief.

asanakoy on 20 Aug 2018

usePrecomputed appears to be specific to the IVFPQ index. The tables and how they are calculated are better documented here:
https://github.com/facebookresearch/faiss/blob/master/IndexIVFPQ.cpp#L320
Essentially, some terms of the distance calculation do not involve the query vector. Because their values do not depend on the query, they can be computed once and stored.
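To make the idea concrete, here is a minimal pure-Python sketch, not Faiss code. It assumes the database vector is approximated as a coarse centroid y_C plus a residual codeword y_R, and uses the expansion ||x - y_C - y_R||^2 = ||x - y_C||^2 + (||y_R||^2 + 2 y_C.y_R) - 2 x.y_R. The middle term has no x in it, so it can be stored ahead of time. All names below are hypothetical, and a real product quantizer keeps one such table entry per sub-quantizer and codeword rather than a single value.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

random.seed(0)
dim = 4
coarse_centroid = [random.random() for _ in range(dim)]    # y_C (hypothetical data)
residual_codeword = [random.random() for _ in range(dim)]  # y_R (hypothetical data)
query = [random.random() for _ in range(dim)]              # x

# Query-independent term: computed once, before any query arrives.
precomputed_term = (dot(residual_codeword, residual_codeword)
                    + 2 * dot(coarse_centroid, residual_codeword))

def distance_with_table(x):
    """Combine the stored term with the two terms that depend on the query."""
    coarse_term = sq_l2(x, coarse_centroid)       # distance to the coarse centroid
    query_term = -2 * dot(x, residual_codeword)   # recomputed for every query
    return coarse_term + precomputed_term + query_term

# Reference value: distance to the fully reconstructed vector y_C + y_R.
reconstructed = [c + r for c, r in zip(coarse_centroid, residual_codeword)]
direct = sq_l2(query, reconstructed)
from_table = distance_with_table(query)
```

The two values agree up to floating-point rounding, which shows that the stored term can be reused across queries. What is traded away is the memory needed to hold it.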

shard relates to IVF indexes. I describe how IVF works in a tutorial I've written here. IVF clusters the dataset to divide it into chunks. At query time, you compare your query to the cluster centroids and search only the clusters closest to the query. Since clustering divides the dataset into chunks, these chunks can be spread across the memories of multiple GPUs. This is "sharding" the dataset. This feature only works for IVF indexes.
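As a rough illustration of sharding in general, the toy sketch below splits a small dataset across two pretend devices, searches each part on its own, and merges the partial results. It is pure Python with hypothetical names and data. It does not reproduce how Faiss actually assigns vectors to GPUs, and it uses brute-force search inside each shard instead of an IVF index.

```python
import heapq

def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def search_shard(shard, query, k):
    """Return the k nearest (distance, id) pairs found inside one shard."""
    return heapq.nsmallest(k, ((sq_l2(vec, query), vid) for vid, vec in shard))

def search_sharded(shards, query, k):
    """Search every shard independently, then merge the partial top-k lists."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, query, k))
    return heapq.nsmallest(k, partial)

# Hypothetical dataset: ten 2-D vectors with integer ids.
vectors = [(i, [float(i), float(i % 3)]) for i in range(10)]

# Each pretend device holds only its own slice of the data.
num_devices = 2
shards = [vectors[d::num_devices] for d in range(num_devices)]

query = [4.2, 1.0]
sharded_result = search_sharded(shards, query, k=3)
brute_force_result = search_shard(vectors, query, k=3)
```

Because every shard returns its own top k, the merged list matches a search over the whole dataset, while each device stores only a fraction of the vectors.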

chrisjmccormick on 22 Aug 2018

The comments may be a bit short. Feel free to suggest more detailed descriptions.

mdouze on 27 Aug 2018

No activity, closing.

mdouze on 28 Aug 2018
