Faiss: Question: GpuMultipleClonerOptions options meaning

Created on 4 Aug 2018 · 5 Comments · Source: facebookresearch/faiss

Hi,
Could you please clarify the meaning of the following members?

  1. GpuMultipleClonerOptions.usePrecomputed
  2. GpuMultipleClonerOptions.shard
  3. GpuMultipleClonerOptions.reserveVecs

The docs do not seem very helpful.


Most helpful comment

For usePrecomputed, this appears to be specific to the IVFPQ index. The tables and how they are calculated are better documented here:
https://github.com/facebookresearch/faiss/blob/master/IndexIVFPQ.cpp#L320
Essentially, some terms of the distance calculation do not involve the query vector. Because their values don't depend on the query, they can be computed once and stored.
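To make the idea concrete, here is a small pure-Python sketch of the decomposition as I understand it from the comments in the linked file. It is not the Faiss implementation; the names x (query), y_c (coarse centroid) and y_r (PQ-decoded residual) are illustrative assumptions. The squared distance ||x - y_c - y_r||^2 splits into a term from coarse quantization, a term that does not involve the query (the part that can be stored), and a term that involves the query and the residual.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sqnorm(a):
    return dot(a, a)

random.seed(0)
dim = 8
x = [random.gauss(0, 1) for _ in range(dim)]    # query vector (illustrative)
y_c = [random.gauss(0, 1) for _ in range(dim)]  # coarse (IVF) centroid
y_r = [random.gauss(0, 1) for _ in range(dim)]  # residual decoded from the PQ code

# Query-independent term: depends only on the centroid and the residual,
# so it can be computed once and stored in a table.
precomputed = sqnorm(y_r) + 2 * dot(y_c, y_r)

# Query-dependent terms, evaluated at search time.
term_coarse = sqnorm(sub(x, y_c))  # distance from the query to the centroid
term_query = -2 * dot(x, y_r)      # inner product of the query with the residual

direct = sqnorm(sub(x, add(y_c, y_r)))
decomposed = term_coarse + precomputed + term_query
```

Both routes give the same distance up to floating-point error, so storing the query-independent term trades memory for less work per query.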

For shard, this is related to the IVF. I describe how IVF works in a tutorial I've written here. It clusters the dataset to divide it into chunks. Then, at query time, you compare your query to the cluster centroids and search only the clusters closest to the query. Since clustering divides the dataset into chunks, these chunks can be spread across the memories of multiple GPUs. This is "sharding" the dataset. This feature only works for IVF indexes.
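A toy model of the description above, in plain Python, may help. It is only a sketch of the idea as stated in this comment: the data, the fixed centroids and the round-robin placement onto two pretend devices are all made up for illustration, and the way Faiss actually partitions vectors among GPUs may differ.

```python
def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(centroids, v, k):
    """Return the indices of the k centroids closest to v."""
    order = sorted(range(len(centroids)), key=lambda c: sqdist(centroids[c], v))
    return order[:k]

# Hypothetical 2-D data; a real index would learn the centroids by k-means.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
vectors = [(0.5, 0.2), (9.0, 1.0), (1.0, 9.5), (9.5, 9.0), (0.1, 0.9), (8.8, 0.3)]

# Inverted lists: cluster id -> ids of the vectors assigned to that cluster.
inverted_lists = {c: [] for c in range(len(centroids))}
for vec_id, v in enumerate(vectors):
    inverted_lists[nearest(centroids, v, 1)[0]].append(vec_id)

# "Shard" the lists round-robin over two pretend devices.
n_devices = 2
shards = [
    {c: ids for c, ids in inverted_lists.items() if c % n_devices == d}
    for d in range(n_devices)
]

def search(query, nprobe=1):
    """Scan only the nprobe closest clusters, wherever their lists live."""
    probe = nearest(centroids, query, nprobe)
    candidates = []
    for shard in shards:  # each device contributes the probed lists it holds
        for c in probe:
            candidates.extend(shard.get(c, []))
    return min(candidates, key=lambda i: sqdist(vectors[i], query))

best = search((9.1, 0.5), nprobe=1)
```

With nprobe=1 only one of the four lists is scanned, and only the device holding that list has any work to do for this query.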

All 5 comments

Thank you, but it is still not clear, since the comments are very brief.

The comments may be a bit short. Feel free to suggest more detailed descriptions.

No activity, closing.
