Faiss: OnDiskInvertedLists are not portable

Created on 19 Mar 2019 · 3 Comments · Source: facebookresearch/faiss

Summary

I am using Faiss to search indexes that do not fit in memory, following the instructions in demo_ondisk_ivf.py.

This works as expected. Moreover, if I create a directory /tmp/test/ and move /tmp/populated.index to /tmp/test/populated.index, everything still works. However, if after saving the index I move /tmp/merged_index.ivfdata to any other directory, rename it, or rename any of its parent directories, faiss.read_index() fails with:

RuntimeError: Error in void faiss::OnDiskInvertedLists::do_mmap() at OnDiskInvertedLists.cpp:243: Error: 'f' failed: could not open /tmp/merged_index.ivfdata in mode r+: No such file or directory
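Because the missing path only appears inside the exception text, a service can at least report which data file it failed to find. The sketch below extracts that path from the message. `ivfdata_path_from_error` is a hypothetical helper, and the regular expression assumes the message wording shown above (Faiss 1.4.0), which may change between versions.

```python
import re
from typing import Optional

# Assumes the wording of the Faiss 1.4.0 message:
# "... could not open <path> in mode r+: No such file or directory"
_OPEN_FAILED = re.compile(r"could not open (.+?) in mode ")


def ivfdata_path_from_error(message: str) -> Optional[str]:
    """Hypothetical helper: return the .ivfdata path Faiss tried to open, or None."""
    match = _OPEN_FAILED.search(message)
    return match.group(1) if match else None
```

For example, wrap faiss.read_index() in `try: ... except RuntimeError as exc:` and pass `str(exc)` to the helper before logging or alerting.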

This may seem like a benign issue on a static filesystem, but it makes moving on-disk indexes between drives or machines, or restoring them from a backup, very difficult. (I am currently standing up a service that uses more than 1 TB of on-disk indexes, and I would hate for it to go down because of who-knows-what and some hard-coded paths.)

Would it be possible to:
A) Have a flag that tells Faiss to look for the data file in the same directory as the index file? (i.e. faiss.read_index('/path/to/my.index', faiss.SAME_DIR) searches in /path/to/ for my.ivfdata)
or
B) Have a recovery module in Python that will load my.ivfdata (even after a path change), let me update the hard-coded OnDiskInvertedLists.filename, and then save a new index?
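Option A amounts to discarding the directory part of the recorded .ivfdata path and substituting the directory of the .index file. A minimal sketch of that lookup in plain Python follows. `same_dir_ivfdata_path` is a hypothetical name, and reusing the recorded file name (rather than deriving it from the index name, as in the my.ivfdata example) is an assumption about how such a flag would behave.

```python
import posixpath


def same_dir_ivfdata_path(index_path: str, stored_ivfdata_path: str) -> str:
    """Hypothetical helper: keep the recorded file name, swap in the index's directory."""
    # Directory holding the .index file; fall back to the current directory.
    index_dir = posixpath.dirname(index_path) or "."
    # Only the file name of the originally recorded .ivfdata path is reused.
    ivfdata_name = posixpath.basename(stored_ivfdata_path)
    return posixpath.join(index_dir, ivfdata_name)
```

Applied to step 4 of the reproduction instructions, the lookup would resolve to /tmp/test/merged_index.ivfdata. Under this sketch, renaming the .ivfdata file would still break loading, because its original name is reused.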

Note: Solutions to this issue are mentioned in #713 (closed), but they are noted as low-priority. I am opening this issue as an appeal to bump it to a higher priority because it would be very useful for everyone running a service using an OnDiskInvertedList. (I would try to fix it myself, but I only have experience with Python.)

Platform

OS: Mac/Ubuntu/CentOS

Faiss version: 1.4.0

Running on:

  • [x] CPU
  • [ ] GPU

Interface:

  • [ ] C++
  • [x] Python 3.6

Reproduction instructions

1) Run demo_ondisk_ivf.py up to stage 6
2) Sanity check: $ mkdir /tmp/test/ && mv /tmp/populated.index /tmp/test/.
3) Edit the index path to 'test/populated.index' on line 103 of demo_ondisk_ivf.py; stage 6 succeeds
4) Mess it up: $ mv /tmp/merged_index.ivfdata /tmp/test/.
5) Fails: Run stage 6 of demo_ondisk_ivf.py
6) Restore: $ mv /tmp/test/merged_index.ivfdata /tmp/.
7) Succeeds: Run stage 6 of demo_ondisk_ivf.py

Labels: duplicate, enhancement

All 3 comments

Ok, it becomes higher priority. I will try to implement something when I have some time.

If you are moving files, you must run stage 5 (merging the inverted-list files) before stage 6 (search).

I noticed you added the faiss.IO_FLAG_ONDISK_SAME_DIR flag as of version 1.5.1.

I tested it, and it works well for my purposes.

Thank you for taking the time. Closing.
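With the faiss.IO_FLAG_ONDISK_SAME_DIR flag, a safe way to relocate an on-disk index is to keep the .index and .ivfdata files together in one directory and leave the .ivfdata file name unchanged. The sketch below performs that move. `relocate_ondisk_index` is a hypothetical helper, and the assumption that the flag reuses the originally recorded .ivfdata file name should be checked against the Faiss version in use.

```python
import os
import shutil


def relocate_ondisk_index(index_path: str, ivfdata_path: str, dest_dir: str) -> str:
    """Hypothetical helper: move an index and its .ivfdata file into dest_dir.

    The .ivfdata file keeps its original name, assuming the same-directory
    lookup reuses the file name recorded when the index was written.
    Returns the new path of the index file.
    """
    os.makedirs(dest_dir, exist_ok=True)
    new_ivfdata = os.path.join(dest_dir, os.path.basename(ivfdata_path))
    new_index = os.path.join(dest_dir, os.path.basename(index_path))
    # Never overwrite files that already exist at the destination.
    for target in (new_ivfdata, new_index):
        if os.path.exists(target):
            raise FileExistsError(target)
    shutil.move(ivfdata_path, new_ivfdata)
    shutil.move(index_path, new_index)
    return new_index
```

After the move, the index would be loaded with faiss.read_index(new_index_path, faiss.IO_FLAG_ONDISK_SAME_DIR). Existing files at the destination are never overwritten, so moving onto another index raises FileExistsError instead.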
