I am using Faiss to search indexes that will not fit into memory, following instructions from demo_ondisk_ivf.py.
The above works as expected. Moreover, if I make a directory /tmp/test/ and move /tmp/populated.index to /tmp/test/populated.index, everything still works. However, if I move /tmp/merged_index.ivfdata to another directory, rename it, or change any of its parent directories after saving the index, faiss.read_index() fails with:
RuntimeError: Error in void faiss::OnDiskInvertedLists::do_mmap() at OnDiskInvertedLists.cpp:243: Error: 'f' failed: could not open /tmp/merged_index.ivfdata in mode r+: No such file or directory
This may seem like a benign issue on a static filesystem, but it makes it very difficult to move on-disk indexes between drives or machines, or to restore them from a backup. (I am currently standing up a service using > 1TB of on-disk indexes, and I would hate for it to go down because of _who-knows-what_ and some hardcoded paths.)
Would it be possible to:
A) Have a flag that tells Faiss to check for a mirrored name in the same directory? (i.e. faiss.read_index('/path/to/my.index', faiss.SAME_DIR) searches in /path/to/ for my.ivfdata)
or
B) Have a recovery module in Python that will load my.ivfdata (even after a path change) and allow me to update the hard-coded OnDiskInvertedLists.filename and then save a new index?
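To make option A concrete, here is a minimal sketch of the path lookup it describes. This is not Faiss code: the helper name mirrored_ivfdata_path is hypothetical, and it assumes the data file shares the index file's directory and stem, with only the extension changed.

```python
import os


def mirrored_ivfdata_path(index_path, ext=".ivfdata"):
    """Hypothetical helper: the path option A would search for.

    Keeps the directory and stem of index_path and swaps the extension.
    """
    stem, _ = os.path.splitext(index_path)
    return stem + ext


print(mirrored_ivfdata_path("/path/to/my.index"))  # /path/to/my.ivfdata
```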
Note: Solutions to this issue are mentioned in #713 (closed), but they are noted as low-priority. I am opening this issue as an appeal to bump it to a higher priority, because it would be very useful for everyone running a service that uses OnDiskInvertedLists. (I would try to fix it myself, but I only have experience with Python.)
OS: Mac/Ubuntu/CentOS
Faiss version: 1.4.0
Running on:
Interface:
1) Run demo_ondisk_ivf.py up to stage 6
2) Sanity check: $ mkdir /tmp/test/ && mv /tmp/populated.index /tmp/test/.
3) Edit index path to 'test/populated.index' in line 103 of demo_ondisk_ivf.py and stage 6 will succeed
4) Mess it up: $ mv /tmp/merged_index.ivfdata /tmp/test/.
5) Fails: Run stage 6 of demo_ondisk_ivf.py
6) Restore: $ mv /tmp/test/merged_index.ivfdata /tmp/.
7) Succeeds: Run stage 6 of demo_ondisk_ivf.py
Ok, it becomes higher priority. I will try to implement something when I have some time.
If you are moving files you must do stage 5 (merge inv files) before stage 6 (search).
I noticed you added the faiss.IO_FLAG_ONDISK_SAME_DIR flag as of version 1.5.1.
I tested it, and it works well for my purposes.
Thank you for taking the time. Closing.
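For anyone landing here later, a minimal sketch of how I use the flag. The wrapper load_relocated_index and its ivfdata_basename parameter are my own names, not part of Faiss. It assumes the .ivfdata file keeps its original base name and sits in the same directory as the index file, which is how I understand the flag to resolve the path. The faiss import is deferred so that the existence check runs without it.

```python
import os


def load_relocated_index(index_path, ivfdata_basename="merged_index.ivfdata"):
    """Hypothetical wrapper: load an on-disk IVF index after both files were moved.

    Assumes the inverted-list data file sits next to index_path and still has
    the base name recorded when the index was written.
    """
    index_dir = os.path.dirname(os.path.abspath(index_path))
    data_path = os.path.join(index_dir, ivfdata_basename)
    if not os.path.isfile(data_path):
        # Fail early with a clearer message than the mmap error.
        raise FileNotFoundError("expected inverted-list data at " + data_path)

    import faiss  # deferred: only needed for the actual load

    # The flag asks Faiss to look for the data file in the index file's
    # directory instead of the absolute path stored inside the index.
    return faiss.read_index(index_path, faiss.IO_FLAG_ONDISK_SAME_DIR)
```

With the layout from the reproduction steps, this would be called as load_relocated_index('/tmp/test/populated.index') once both populated.index and merged_index.ivfdata are in /tmp/test/.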