Faiss: GpuIndexFlat fails

Created on 6 Jun 2017 · 12 Comments · Source: facebookresearch/faiss

Hello, I've tried to search 10 million vectors with GpuIndexFlat, but it fails, even when I query only one vector in index.search.

~/faiss/gpu$ ./test/demo_ivfpq_indexing_gpu
[4.324 s] Building a dataset of 10000000 vectors to index
[36.067 s] Adding the vectors to the index
[38.161 s] done
[38.195 s] done
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 1280000000 B, highwater 1280000000 B)
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 1280000000 B, highwater 2560000000 B)
Faiss assertion err == cudaSuccess failed in void faiss::gpu::StackDeviceMemory::Stack::returnAlloc(char*, size_t, cudaStream_t) at utils/StackDeviceMemory.cpp:136
Aborted (core dumped)

question

All 12 comments

Do all the vectors passed to index.add have to be loaded onto the GPU when I perform a GPU search?

What is the dimensionality of your vectors?

We also might have a fix for this locally that we haven't pushed to github yet.

Are you using GpuIndexIVFPQ or GpuIndexFlat?

If GpuIndexIVFPQ, in the meantime, you can try disabling precomputed codes (setPrecomputedCodes(false)) since you appear to be running out of memory.

Hi, I'm using GpuIndexFlat (INNER_PRODUCT) with dimension = 256, and my GPU is an NVIDIA Titan X.

By the way, when I used GpuIndexIVFFlat with nt = 1,000,000, database = 50,000,000, nlist = 28284, I got a different problem:

Training IVF quantizer on 1000000 vectors in 256D
WARNING clustering 1000000 points to 28284 centroids: please provide at least 1103076 training points
Clustering 1000000 points in 256D to 28284 clusters, redo 1 times, 10 iterations
Preprocessing in 0.25 s
Iteration 0 (71.66 s, search 71.52 s): objective=7.60792e+06 imbalance=16.259
Iteration 1 (144.12 s, search 143.90 s): objective=8.12524e+06 imbalance=2.999
Iteration 2 (216.85 s, search 216.56 s): objective=8.13144e+06 imbalance=2.557
Iteration 3 (289.57 s, search 289.22 s): objective=8.13492e+06 imbalance=2.391
Iteration 4 (362.31 s, search 361.88 s): objective=8.13722e+06 imbalance=2.306
Iteration 5 (435.44 s, search 434.94 s): objective=8.13878e+06 imbalance=2.253
Iteration 6 (508.27 s, search 507.70 s): objective=8.14026e+06 imbalance=2.217
Iteration 7 (581.38 s, search 580.74 s): objective=8.14126e+06 imbalance=2.189
Iteration 8 (654.35 s, search 653.64 s): objective=8.14207e+06 imbalance=2.169
Iteration 9 (727.65 s, search 726.85 s): objective=8.14278e+06 imbalance=2.152 nsplit=0
[734.751 s] storing the pre-trained index to /tmp/index_trained.faissindex
[734.883 s] Building a dataset of 50000000 vectors to index
[895.238 s] Adding the vectors to the index
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: an illegal memory access was encountered
Aborted (core dumped)
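
The clustering warning in the log above can be checked with a little arithmetic. The sketch below assumes the threshold is the number of centroids times Faiss's default minimum of 39 training points per centroid, which matches the figure printed in the warning. The variable names are illustrative only.

```python
# Values taken from the log above.
nlist = 28284                  # number of centroids (inverted lists)
num_training = 1_000_000       # training vectors supplied (nt)

# Assumption: the default minimum number of training points per centroid.
min_points_per_centroid = 39

required = nlist * min_points_per_centroid
print(f"required training points: {required}")
print(f"shortfall: {max(0, required - num_training)}")
```

The warning is not fatal: training still ran for all 10 iterations in the log, and the crash happened later, while adding vectors.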

For GpuIndexFlat, 10 million vectors * 256 * sizeof(float) = 10.24 GB of memory. Faiss will reserve some fraction (1-2 GB) of GPU memory up front for temporary space, so you will run out of memory in this case (your card has 12 GB I believe).
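
The estimate above can be reproduced in a few lines. This is a rough sketch, not Faiss's own memory accounting. The helper name `flat_index_bytes` is hypothetical, and the nominal 12 GB capacity and the 1 to 2 GB reservation are taken from the figures in this comment.

```python
GB = 10**9  # decimal gigabytes, as in the 10.24 GB figure

def flat_index_bytes(num_vectors, dim, bytes_per_component=4):
    """Raw storage for uncompressed float32 vectors (hypothetical helper)."""
    return num_vectors * dim * bytes_per_component

storage = flat_index_bytes(10_000_000, 256)
print(f"vector storage: {storage / GB:.2f} GB")

card_capacity = 12 * GB  # assumption: nominal 12 GB card
for reserved_gb in (1, 2):  # the 1-2 GB temporary reservation range
    headroom = card_capacity - storage - reserved_gb * GB
    print(f"reserve {reserved_gb} GB -> headroom {headroom / GB:+.2f} GB")
```

Whatever headroom remains must also cover per-query scratch allocations, such as the 1.28 GB cudaMalloc calls reported in the WARN lines of the first log.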

For the GpuIndexIVFPQ case, we have a fix for add() internally that we'll be updating the repo with sometime soon.

Can I store the vectors on the CPU but search on the GPU?

With unified memory you could, but that defeats the point of using the GPU, as you are limited by the speed of the interconnect between the CPU and GPU (e.g., PCIe or NVLINK). Doing that would run much more slowly than just using the CPU for everything.

So does that mean that if my database has more than 100 million vectors (256D), I need more than 10 NVIDIA cards to store them on the GPU, and otherwise I can't query on the GPU?

You can store that many vectors (and more, up to a billion or so) on a single GPU, you just can't use GpuIndexFlat or GpuIndexIVFFlat. You'd have to compress the data using GpuIndexIVFPQ.
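
To see why compression changes the picture, compare the raw float32 size with the size of product-quantized codes. The sketch below is illustrative arithmetic only. The helper names and the choice of 32 one-byte codes per vector are assumptions, not values from this thread. It also ignores the stored vector IDs, the coarse quantizer, and Faiss's temporary memory.

```python
GB = 10**9

def flat_bytes(num_vectors, dim):
    """Uncompressed float32 storage (hypothetical helper)."""
    return num_vectors * dim * 4

def pq_code_bytes(num_vectors, codes_per_vector, bits_per_code=8):
    """Storage for the PQ codes alone (hypothetical helper)."""
    return num_vectors * codes_per_vector * bits_per_code // 8

n, dim = 100_000_000, 256
codes_per_vector = 32  # assumption: 32 sub-quantizers at 8 bits each

print(f"flat float32: {flat_bytes(n, dim) / GB:.1f} GB")
print(f"PQ codes:     {pq_code_bytes(n, codes_per_vector) / GB:.1f} GB")
```

Fewer codes per vector shrink the index further, at the cost of search accuracy.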

New question. I found that 10M vectors can be added to GpuIndexIVFFlat. I also found that when I add 8.25 million vectors to GpuIndexFlat, nvidia-smi shows only 6000+ MB of memory in use. Strangely, it shows the same GPU memory usage whether I add 5 million or 8 million vectors. But when I add 8.5 million vectors, nvidia-smi shows 2397 MB in use, which I guess is the workspace, so I think the add did not succeed. Why couldn't 1.5M vectors be added when there was still 5000+ MB of free video memory?

nvidia-smi seems to report inaccurate memory usage sometimes.
See @wickedfoo's comment above on memory usage.
