Hi there!
I'm wondering if there is any progress on this TODO on serializing indices to strings for usage in pickled objects. Is there an advised way of storing indices out of memory for later use? What about from the GPU?
Thanks!
Hi,
The GPU part is easy to answer: we will not support serializing GPU indexes, because it is easy to copy them to the CPU first.
String serialization will become an important topic somewhere around the end of the year. I will keep this issue updated.
@mdouze
Hi,
Any update regarding serializing the index object? Being able to pickle it is essential for me: otherwise the index has to be rebuilt every time, and the build step takes too long for my application.
Thanks
+1 for this.
Not being able to pickle faiss indexes means that they can't be used in parallel workers, for example using the multiprocessing module. I have a use case where I need to do thousands of searches repeatedly as part of an optimization routine. Not being able to serialize a faiss index, and thus use multiprocessing, means it will take a week or two to run in a single thread instead of a few hours on a 32-64 processor cloud instance. Every time I need to run it again, another week... Anything that can be done to allow the indices to be serialized would be greatly appreciated! :)
Hi
Update: we will support serialization in the next release, probably this week.
However, using multiprocessing to parallelize on top of Faiss is suboptimal for 2 reasons:
See also
https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls#performance-of-search
Thanks for the super-fast reply and suggestions! :)
I'm glad to hear faiss will be serializable soon. When I thought more about what I'm trying to do, I also realized that pickling an 8GB index and expecting that to be faster than just using it didn't exactly make sense. I was stumped until I read your message.
I'll look into using multiprocessing.dummy or the Faiss threads if I can figure them out. Thank you! :)
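For anyone landing here later, a minimal sketch of the multiprocessing.dummy route, which uses threads instead of processes so nothing has to be pickled. The `search_batch` function and the toy `DATABASE` are hypothetical stand-ins for a call such as `index.search(batch, k)` on a shared index; whether threads actually speed up a real Faiss search depends on the points in the wiki page linked above.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing.Pool API

# Toy 1-D "database": a hypothetical stand-in for a real index held in memory.
DATABASE = [float(i) for i in range(100)]


def search_batch(queries):
    """Stand-in for index.search: return the nearest database id for each query."""
    return [
        min(range(len(DATABASE)), key=lambda i: abs(DATABASE[i] - q))
        for q in queries
    ]


def parallel_search(batches, n_threads=4):
    # All threads share the same in-memory object, so no serialization is needed.
    with Pool(n_threads) as pool:
        return pool.map(search_batch, batches)  # results keep the order of `batches`


batches = [[0.2, 10.9], [42.4], [99.0, 57.6]]
results = parallel_search(batches)
```

With a real index, `search_batch` would simply call the search method of the shared index object.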
@mdouze is there any update on this? Do you have an estimate of the release date?
I see a tagged release!
@david4096 unfortunately it is not related :(
Index serialization was implemented in the last release, see
https://github.com/facebookresearch/faiss/blob/master/tests/test_index_composite.py#L427