Faiss: GpuIndexFlat fails

Created on 6 Jun 2017 · 12 Comments · Source: facebookresearch/faiss

Hello, I've tried to search 10 million vectors with GpuIndexFlat, but it fails, even when I query only one vector in index.search.

~/faiss/gpu$ ./test/demo_ivfpq_indexing_gpu
[4.324 s] Building a dataset of 10000000 vectors to index
[36.067 s] Adding the vectors to the index
[38.161 s] done
[38.195 s] done
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 1280000000 B, highwater 1280000000 B)
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 1280000000 B, highwater 2560000000 B)
Faiss assertion err == cudaSuccess failed in void faiss::gpu::StackDeviceMemory::Stack::returnAlloc(char*, size_t, cudaStream_t) at utils/StackDeviceMemory.cpp:136
Aborted (core dumped)

question

xasw1

All 12 comments

Do all the vectors passed to index.add have to be loaded onto the GPU when I perform a GPU search?

xasw1 on 6 Jun 2017

What is the dimensionality of your vectors?

We also might have a fix for this locally that we haven't pushed to github yet.

wickedfoo on 6 Jun 2017

Are you using GpuIndexIVFPQ or GpuIndexFlat?

If GpuIndexIVFPQ, in the meantime, you can try disabling precomputed codes (setPrecomputedCodes(false)) since you appear to be running out of memory.

wickedfoo on 6 Jun 2017

Hi, I'm using GpuIndexFlat(INNER_PRODUCT) with dimension = 256, and my GPU is an NVIDIA Titan X.

xasw1 on 7 Jun 2017

By the way, when I used GpuIndexIVFFlat with nt = 1,000,000, database = 50,000,000, nlist = 28284, I got a different problem:

Training IVF quantizer on 1000000 vectors in 256D
WARNING clustering 1000000 points to 28284 centroids: please provide at least 1103076 training points
Clustering 1000000 points in 256D to 28284 clusters, redo 1 times, 10 iterations
Preprocessing in 0.25 s
Iteration 0 (71.66 s, search 71.52 s): objective=7.60792e+06 imbalance=16.259
Iteration 1 (144.12 s, search 143.90 s): objective=8.12524e+06 imbalance=2.999
Iteration 2 (216.85 s, search 216.56 s): objective=8.13144e+06 imbalance=2.557
Iteration 3 (289.57 s, search 289.22 s): objective=8.13492e+06 imbalance=2.391
Iteration 4 (362.31 s, search 361.88 s): objective=8.13722e+06 imbalance=2.306
Iteration 5 (435.44 s, search 434.94 s): objective=8.13878e+06 imbalance=2.253
Iteration 6 (508.27 s, search 507.70 s): objective=8.14026e+06 imbalance=2.217
Iteration 7 (581.38 s, search 580.74 s): objective=8.14126e+06 imbalance=2.189
Iteration 8 (654.35 s, search 653.64 s): objective=8.14207e+06 imbalance=2.169
Iteration 9 (727.65 s, search 726.85 s): objective=8.14278e+06 imbalance=2.152 nsplit=0
[734.751 s] storing the pre-trained index to /tmp/index_trained.faissindex
[734.883 s] Building a dataset of 50000000 vectors to index
[895.238 s] Adding the vectors to the index
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: an illegal memory access was encountered
Aborted (core dumped)

xasw1 on 7 Jun 2017
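
A side note on the clustering warning in the log above: the figure 1103076 is exactly 39 × 28284, which matches what I understand to be Faiss's default minimum of 39 training points per centroid. Separately, nlist = 28284 happens to equal 4·sqrt(50,000,000) rounded down. The sketch below only reproduces that arithmetic. The constant 39 and the function name are assumptions for illustration, and nothing in this thread ties the warning to the crash that follows it.

```python
import math

# Assumed: Faiss's default minimum number of training points per centroid.
# This value reproduces the number printed in the warning above.
MIN_POINTS_PER_CENTROID = 39


def min_training_points(nlist: int, per_centroid: int = MIN_POINTS_PER_CENTROID) -> int:
    """Smallest training set size that would avoid the clustering warning."""
    return nlist * per_centroid


nlist = 28284
required = min_training_points(nlist)

# Observation only: nlist equals 4 * sqrt(database size), rounded down.
nlist_from_rule = int(4 * math.sqrt(50_000_000))

print(f"nlist={nlist} needs at least {required} training points")
print(f"4*sqrt(50M) rounded down = {nlist_from_rule}")
```

With nt = 1,000,000 the training set is about 100,000 points short of that threshold, which is why the warning is printed.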

For GpuIndexFlat, 10 million vectors * 256 * sizeof(float) = 10.24 GB of memory. Faiss will reserve some fraction (1-2 GB) of GPU memory up front for temporary space, so you will run out of memory in this case (your card has 12 GB I believe).

For the GpuIndexIVFPQ case, we have a fix for add() internally that we'll be updating the repo with sometime soon.

wickedfoo on 7 Jun 2017
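
The arithmetic in the comment above can be checked with a short sketch. It is plain Python rather than Faiss code. The 12 GB card size comes from the comment, while the 1.5 GB temporary reservation is an assumed midpoint of the quoted 1-2 GB range, and the function name is hypothetical.

```python
GB = 10**9  # decimal gigabytes, as used in the comment above


def flat_index_bytes(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw storage for uncompressed float32 vectors (no index overhead)."""
    return n_vectors * dim * bytes_per_component


vectors_gb = flat_index_bytes(10_000_000, 256) / GB  # 10.24 GB
card_gb = 12.0   # Titan X capacity as stated in the thread
temp_gb = 1.5    # assumed midpoint of the 1-2 GB temporary reservation

fits = vectors_gb + temp_gb <= card_gb
print(f"vectors: {vectors_gb:.2f} GB, temp: {temp_gb} GB, card: {card_gb} GB, fits: {fits}")
```

Under these assumptions the vectors plus temporary space need roughly 11.7 GB before counting the CUDA context or any per-query scratch allocations, so the estimate is on the optimistic side.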

Can I store the vectors on the CPU but search on the GPU?

xasw1 on 7 Jun 2017

With unified memory you could, but that defeats the point of using the GPU, as you are limited by the speed of the interconnect between the CPU and GPU (e.g., PCIe or NVLINK). Doing that would run much more slowly than just using the CPU for everything.

wickedfoo on 7 Jun 2017
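
To put rough numbers on the interconnect argument above, the sketch below estimates how long one brute-force pass over the 10.24 GB dataset takes when the data has to be read at a given bandwidth. All three bandwidth figures are ballpark assumptions chosen for illustration (roughly PCIe 3.0 x16, a multi-channel CPU memory system, and a high-end GPU's on-board memory). They are not measurements from this thread.

```python
DATASET_BYTES = 10_000_000 * 256 * 4  # 10.24 GB of float32 vectors

# Assumed ballpark bandwidths in bytes per second (illustrative only).
ASSUMED_BANDWIDTH = {
    "GPU reading over PCIe": 16e9,
    "CPU reading host RAM": 60e9,
    "GPU reading its own memory": 480e9,
}


def scan_seconds(n_bytes: float, bytes_per_second: float) -> float:
    """Lower bound on the time to read every vector once."""
    return n_bytes / bytes_per_second


for label, bandwidth in ASSUMED_BANDWIDTH.items():
    print(f"{label}: {scan_seconds(DATASET_BYTES, bandwidth):.3f} s per full scan")
```

With these assumed figures, a GPU that pulls the vectors across the interconnect on every search is bandwidth-bound well below what the CPU can read from its own RAM, which is the point the comment makes.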

So does that mean that if my database has more than 100 million vectors (256D), I need more than 10 NVIDIA cards to store them on the GPU, and otherwise I can't query on the GPU?

xasw1 on 7 Jun 2017

You can store that many vectors (and more, up to a billion or so) on a single GPU, you just can't use GpuIndexFlat or GpuIndexIVFFlat. You'd have to compress the data using GpuIndexIVFPQ.

wickedfoo on 7 Jun 2017

👍2
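
A back-of-the-envelope comparison shows why compression changes the picture. The sketch below is plain arithmetic, not Faiss code. The 32-byte PQ code size and the 8 bytes per stored vector ID are hypothetical choices for illustration, and real indexes carry additional overhead (coarse centroids, codebooks, temporary memory).

```python
GB = 10**9

n_vectors = 100_000_000
dim = 256


def raw_bytes(n: int, d: int) -> int:
    """Uncompressed float32 storage, as a flat index would hold it."""
    return n * d * 4


def pq_bytes(n: int, code_size: int, id_size: int) -> int:
    """Compressed storage: one PQ code plus one ID per vector."""
    return n * (code_size + id_size)


code_size = 32  # hypothetical: bytes per PQ code
id_size = 8     # hypothetical: bytes per stored vector ID

print(f"raw float32: {raw_bytes(n_vectors, dim) / GB:.1f} GB")
print(f"PQ codes + IDs: {pq_bytes(n_vectors, code_size, id_size) / GB:.1f} GB")
```

Under these assumptions, 100 million 256D vectors shrink from about 102.4 GB uncompressed to about 4 GB, which fits on a single 12 GB card with room to spare. The trade-off is that product quantization is lossy, so search results are approximate.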

New question. I found that 10 million vectors can be added to GpuIndexIVFFlat. I also found that when I add 8.25 million vectors to GpuIndexFlat, nvidia-smi shows only 6000+ MB of memory in use. Strangely, it reports the same GPU memory usage when adding 5 million vectors and 8 million vectors. But when I add 8.5 million vectors, nvidia-smi shows 2397 MB in use, which I guess is the workspace, so I think the add did not succeed. Why can't the remaining 1.5 million vectors be added when there is still 5000+ MB of free video memory?

xasw1 on 8 Jun 2017

nvidia-smi seems to report inaccurate memory usage sometimes.
See @wickedfoo's comment above on memory usage.