Hi,
there are 100 million 128-d vectors in my database, trained and added,
and I'm going to add around half a million vectors every 4 hours.
Can I search the index while adding new vectors at the same time?
Thanks for your help!
Hi
No, you can't. See
https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls
Your options are, from simplest to most complex:
1. Lock the index for search during add.
2. Perform the add in a separate temp index (that is trained in the same way as the main index), then merge the main index with the temp index (see https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#splitting-and-merging-indexes). The index will still be unavailable for search during the merge, but the downtime will be shorter.
3. While the main index keeps serving searches, copy it to an offline index, add the new vectors to that copy during the 4 hours, and swap the two indexes every 4 hours. No downtime, but the index is stored twice.
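Options 1 and 3 can be sketched in a few lines. This is only a minimal sketch under assumptions: `LockedIndex` and `SwappableIndex` are hypothetical wrapper names, not part of Faiss, and the wrapped object is assumed to be anything exposing `add(vectors)` and `search(queries, k)`, as Faiss indexes do.

```python
import threading


class LockedIndex:
    """Option 1: one mutex serializes add and search on a single index."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def add(self, vectors):
        # Searches wait here until the add has finished.
        with self._lock:
            self._index.add(vectors)

    def search(self, queries, k):
        with self._lock:
            return self._index.search(queries, k)


class SwappableIndex:
    """Option 3: serve searches from a live index, update a copy offline."""

    def __init__(self, live):
        self._live = live
        self._lock = threading.Lock()

    def search(self, queries, k):
        # Grab the current reference under the lock, then search outside
        # it so that concurrent searches do not block each other.
        with self._lock:
            index = self._live
        return index.search(queries, k)

    def swap(self, updated):
        # Replace the live index and hand back the old one, so the caller
        # can drop it once in-flight searches have finished.
        with self._lock:
            old, self._live = self._live, updated
        return old
```

With option 1 the simple mutex also serializes searches against each other; a readers-writer lock would avoid that at the cost of more code. With option 3 the offline copy is built and filled outside the wrapper, and only the `swap` call touches the serving path.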
Thank you for the valuable options, @mdouze!
As far as I understand, faiss assigns vector IDs in the order the vectors were added.
For example, if vector_a was the 1000th vector added, it gets ID No. 1000 in the faiss index.
So for the second option, "perform the add in a separate temp index", should I worry about the ID numbers?
For example, if the main index has 1000 vectors, the 1st vector of the temp index should get ID No. 1001 in the final index.
Thanks!
No activity, closing.
@KangRinpoche
Just pass the IDs you need to add_with_ids.
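To make the ID bookkeeping concrete, here is a minimal sketch. It assumes the main index was filled with plain `add`, in which case Faiss numbers vectors sequentially starting from 0 (so the 1000th vector has ID 999, and the next free ID equals the number of vectors already stored). The helper name `ids_for_next_batch` is hypothetical, not part of Faiss.

```python
def ids_for_next_batch(ntotal, batch_size):
    """Return the IDs that continue a 0-based sequential numbering.

    ntotal: number of vectors already in the main index.
    batch_size: number of new vectors going into the temp index.
    """
    return list(range(ntotal, ntotal + batch_size))


# Main index holds 1000 vectors (IDs 0..999); the new batch has 3 vectors.
print(ids_for_next_batch(1000, 3))  # [1000, 1001, 1002]
```

These IDs, converted to an int64 NumPy array, are what would be passed to `add_with_ids` on the temp index before merging. As far as I know, `merge_from` also takes an `add_id` offset that is added to the incoming IDs, which should achieve the same result when the temp index was filled with plain `add`; check the documentation for your Faiss version.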