Faiss: Running on linux with bigset

Created on 1 Aug 2018 · 5 Comments · Source: facebookresearch/faiss

OS: CentOS7

Faiss version: 1.2.1

Running on:

[x] CPU

Interface:

[x] Python

Hello, I would like to know what performance to expect: how many 128-D vectors can be indexed and searched with HNSW on a machine with 128 GB of RAM? 1 billion, 10 billion, or more? Also, can I set the number of threads used for search?


huangqinquan

Most helpful comment

Although I have not explored HNSW yet, what I can say from my experiments with other indexing schemes is that it should be possible to index and search even 10B vectors or more with 128 GB of RAM.

_But there is a catch_: you need to modify the bench_hnsw.py benchmark script to load the data memory-mapped (see bench_gpu_1b.py) and enable a large swap space on the system. Page faults will slow the whole process down once you go beyond 1B vectors (or maybe 512M in the case of HNSW), but it should still run.
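For reference, memory-mapped loading of this kind can be sketched with NumPy. This is a minimal sketch, assuming the vectors are stored in the .fvecs layout (each record is an int32 dimension followed by that many float32 values); the function names and the batch size are illustrative and not taken from the benchmark scripts.

```python
import numpy as np


def mmap_fvecs(fname):
    """Map an .fvecs file as an (n, d) float32 view without reading it into RAM."""
    x = np.memmap(fname, dtype="int32", mode="r")
    d = int(x[0])  # every record starts with its dimension
    # Reinterpret the same bytes as float32 and drop the leading dimension column.
    return x.view("float32").reshape(-1, d + 1)[:, 1:]


def iter_chunks(x, batch_size):
    """Yield C-contiguous float32 batches; only the current batch is copied into RAM."""
    for i0 in range(0, x.shape[0], batch_size):
        yield np.ascontiguousarray(x[i0:i0 + batch_size], dtype="float32")
```

Each batch could then be passed to index.add(), so that pages are pulled in from disk only as they are touched. The batch size is a tuning choice: larger batches mean fewer calls but a larger temporary copy.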

Regarding multiple threads: add the following line just before the search. I tried it with HNSW and it worked:
faiss.omp_set_num_threads(8)

msharmavikram on 1 Aug 2018

👍2

All 5 comments

@msharmavikram
Thanks for your answer. However, the search takes longer when I increase the number of threads, and I don't know why. Here is my search code. When I call faiss.omp_set_num_threads(1) instead of 8, the time is the shortest.

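import time

import faiss

def evaluate(index, xq, nthreads=1):
    # Number of OpenMP threads Faiss may use (1 = timing with a single core)
    faiss.omp_set_num_threads(nthreads)

    loopnum = 10000
    nq = xq.shape[0]  # number of query vectors in the batch

    t0 = time.time()
    for i in range(loopnum):
        D, I = index.search(xq, 20)
    t1 = time.time()

    # Average over all loops and all queries
    ms_per_query = (t1 - t0) * 1000.0 / loopnum / nq
    print("\t %7.3f ms per query" % ms_per_query)

    # Return the same per-query figure that is printed
    return ms_per_query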
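To compare thread counts under otherwise identical conditions, a small timing sweep can help. The following is only a sketch: sweep_threads, search_fn and the default thread counts are hypothetical names and values chosen for illustration. The thread setter is passed in as an argument, so in practice it would be faiss.omp_set_num_threads from this thread.

```python
import time


def sweep_threads(search_fn, set_num_threads, nq,
                  thread_counts=(1, 2, 4, 8), repeats=5):
    """Return {thread count: best observed ms per query}.

    search_fn       -- zero-argument callable that runs one batch search
    set_num_threads -- callable taking the thread count
    nq              -- number of queries processed by one search_fn() call
    """
    results = {}
    for n in thread_counts:
        set_num_threads(n)
        search_fn()  # warm-up run, not timed
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            search_fn()
            best = min(best, time.perf_counter() - t0)
        results[n] = best * 1000.0 / nq
    return results
```

A call might look like sweep_threads(lambda: index.search(xq, 20), faiss.omp_set_num_threads, nq=xq.shape[0]). Taking the best of several repeats reduces the influence of other processes on the machine. If xq holds only a handful of queries, there may be little work to spread across threads, which could be one reason extra threads do not help.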

huangqinquan on 2 Aug 2018

This is what I tried:
1 thread:
faiss.omp_set_num_threads(1)

$ python bench_hnsw.py hnsw_sq
load data
load GT
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 6
Adding 1 elements at level 6
Adding 0 elements at level 5
Adding 14 elements at level 4
Adding 229 elements at level 3
Adding 3668 elements at level 2
Adding 58368 elements at level 1
Adding 937720 elements at level 0
Done in 11107.380 ms
search
efSearch 16        0.264 ms per query, R@1 0.7239
efSearch 32        0.350 ms per query, R@1 0.8532
efSearch 64        0.546 ms per query, R@1 0.9247
efSearch 128       0.982 ms per query, R@1 0.9566
efSearch 256       1.920 ms per query, R@1 0.9718

4 threads:
faiss.omp_set_num_threads(4)

$ python bench_hnsw.py hnsw_sq
load data
load GT
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 6
Adding 1 elements at level 6
Adding 0 elements at level 5
Adding 14 elements at level 4
Adding 229 elements at level 3
Adding 3668 elements at level 2
Adding 58368 elements at level 1
Adding 937720 elements at level 0
Done in 10981.105 ms
search
efSearch 16        0.067 ms per query, R@1 0.7267
efSearch 32        0.108 ms per query, R@1 0.8540
efSearch 64        0.205 ms per query, R@1 0.9245
efSearch 128       0.345 ms per query, R@1 0.9563
efSearch 256       0.656 ms per query, R@1 0.9712

With 4 threads the speedup is about 2.9x at efSearch 256 (it ranges from roughly 2.7x to 3.9x across the efSearch values above), which seems reasonable given Amdahl's law.
To measure performance reliably, make sure that no other tasks are running in parallel and that the system is otherwise idle. Background load may be the reason you are seeing degraded performance.
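The speedup figures can be recomputed from the two logs above. This sketch uses only the timings quoted in this comment; the function names are illustrative, and the "implied parallel fraction" simply inverts Amdahl's law, so it inherits whatever noise is in the measurements.

```python
# ms per query from the two bench_hnsw.py runs above, keyed by efSearch
t_1thread = {16: 0.264, 32: 0.350, 64: 0.546, 128: 0.982, 256: 1.920}
t_4threads = {16: 0.067, 32: 0.108, 64: 0.205, 128: 0.345, 256: 0.656}


def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


def amdahl_parallel_fraction(s, n):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


for ef in sorted(t_1thread):
    s = speedup(t_1thread[ef], t_4threads[ef])
    p = amdahl_parallel_fraction(s, 4)
    print("efSearch %3d: speedup %.2fx, efficiency %.0f%%, implied parallel fraction %.2f"
          % (ef, s, 100.0 * s / 4, p))
```

A speedup below 4x on 4 threads is expected whenever part of the work stays serial or the threads compete for memory bandwidth.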

msharmavikram on 2 Aug 2018

👍1

@huangqinquan, it is hard to index more than a few tens of millions of vectors with HNSW, because add time is slow and 1B elements are hard to fit in 128G of RAM. If you are interested, here is a paper we wrote to address this case: http://openaccess.thecvf.com/content_cvpr_2018/papers/Douze_Link_and_Code_CVPR_2018_paper.pdf
The preferred way of indexing this many vectors is to use an IndexIVF* variant; see
https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
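A back-of-the-envelope estimate illustrates why 1B vectors do not fit. This sketch assumes a rough rule of thumb of d * 4 + M * 2 * 4 bytes per vector for HNSW over uncompressed float32 vectors (the vector itself plus the base-level links), and it assumes M = 32; neither figure is stated in this thread, and real usage will be somewhat higher.

```python
GIB = 1024 ** 3


def hnsw_flat_bytes_per_vector(d, M):
    # d float32 components plus about 2 * M int32 neighbor links (assumed rule of thumb)
    return d * 4 + M * 2 * 4


d, M = 128, 32                       # M = 32 is an assumed setting
per_vector = hnsw_flat_bytes_per_vector(d, M)
max_vectors = (128 * GIB) // per_vector
gib_for_1b = 10 ** 9 * per_vector / GIB

print(per_vector)                    # 768
print(max_vectors)                   # 178956970
print(round(gib_for_1b, 2))          # 715.26
```

Under these assumptions, 128 GiB holds at most about 179 million vectors even with nothing else in memory, while 1B vectors would need roughly 715 GiB. This is consistent with the advice to use a compressed IndexIVF* variant at that scale.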