I'm trying to extend the bench_gpu_1bn example to support IndexIVFScalarQuantizer. The goal is to leverage the GPU while populating an IVF,SQ8 index with tens of millions of instances, using either the GPU-shards-and-sync method (compute_populated_index) or the assign-on-GPU-and-add-on-CPU method (compute_populated_index2).
With the first method, I hit a failed assertion within GpuIndex::add_with_ids (via IndexShards): FAISS_THROW_IF_NOT_MSG(this->is_trained, "Index not trained"); It seems the trained index forgets that it is trained when it is shard-cloned to multiple GPUs. Naively setting gpu_index.is_trained = True in Python does not change this behavior.
Copying the index to the GPU without shards does work, however.
With the second method, I haven't found an IndexIVFScalarQuantizer analogue for add_core_o (for IndexIVFPQ) or add_core (for IndexIVFFlat) by digging through the C++ source.
Any guidance on the right way to achieve GPU-assisted adds to IndexIVFScalarQuantizer would be appreciated, or pragmatic considerations on why/whether this should not be done.
Thank you!
Here's a small example that exhibits the is_trained failure:
import numpy as np
import faiss

tempmem = -1
ngpu = 2
d = 128
ncent = 50
xt = np.random.random((10000, d)).astype(np.float32)
xb = np.random.random((10000, d)).astype(np.float32)


def minimal_gpu_ivf_battle():
    print('IVFPQ, no shard')
    coarse_quantizer = make_coarse_quantizer()
    ivf_pq = faiss.IndexIVFPQ(coarse_quantizer, d, ncent, 8, 8)
    ivf_pq.train(xt)
    populate_gpu_index(ivf_pq)

    print('IVFPQ, shards')
    coarse_quantizer = make_coarse_quantizer()
    ivf_pq = faiss.IndexIVFPQ(coarse_quantizer, d, ncent, 8, 8)
    ivf_pq.train(xt)
    populate_gpu_shard_index(ivf_pq)

    print('IVFSQ, no shard')
    coarse_quantizer = make_coarse_quantizer()
    ivf_sq = faiss.IndexIVFScalarQuantizer(coarse_quantizer, d, ncent, faiss.ScalarQuantizer.QT_8bit)
    ivf_sq.train(xt)
    populate_gpu_index(ivf_sq)

    print('IVFSQ, shards')
    coarse_quantizer = make_coarse_quantizer()
    ivf_sq = faiss.IndexIVFScalarQuantizer(coarse_quantizer, d, ncent, faiss.ScalarQuantizer.QT_8bit)
    ivf_sq.train(xt)
    populate_gpu_shard_index(ivf_sq)


def make_coarse_quantizer():
    clus = faiss.Clustering(d, ncent)
    index = faiss.IndexFlatL2(d)
    clus.train(xt, index)
    centroids = faiss.vector_float_to_array(clus.centroids).reshape(ncent, d)
    coarse_quantizer = faiss.IndexFlatL2(d)
    coarse_quantizer.add(centroids)
    return coarse_quantizer


def populate_gpu_index(index):
    assert index.is_trained  # <-- yes for PQ, yes for SQ
    co = faiss.GpuMultipleClonerOptions()
    co.indicesOptions = faiss.INDICES_CPU
    co.verbose = True
    co.shard = False
    vres, vdev = make_vres_vdev()
    gpu_index = faiss.index_cpu_to_gpu_multiple(
        vres, vdev, index, co)
    # assert gpu_index.is_trained  # <-- yes for PQ, *no* for SQ
    gpu_index.add(xb)
    index_src = faiss.index_gpu_to_cpu(gpu_index)
    index_src.copy_subset_to(index, 0, 0, xb.shape[0])


def populate_gpu_shard_index(index):
    assert index.is_trained  # <-- yes for PQ, yes for SQ
    co = faiss.GpuMultipleClonerOptions()
    co.indicesOptions = faiss.INDICES_CPU
    co.verbose = True
    co.shard = True
    vres, vdev = make_vres_vdev()
    gpu_index = faiss.index_cpu_to_gpu_multiple(
        vres, vdev, index, co)
    # assert gpu_index.is_trained  # <-- yes for PQ, *no* for SQ
    gpu_index.add_with_ids(xb, ids=np.arange(xb.shape[0]))
    for i in range(ngpu):
        index_src = faiss.index_gpu_to_cpu(gpu_index.at(i))
        index_src.copy_subset_to(index, 0, 0, xb.shape[0])


# lazy-pasted from bench_gpu_1bn
gpu_resources = []
for i in range(ngpu):
    res = faiss.StandardGpuResources()
    gpu_resources.append(res)


def make_vres_vdev(i0=0, i1=-1):
    " return vectors of device ids and resources useful for gpu_multiple"
    vres = faiss.GpuResourcesVector()
    vdev = faiss.IntVector()
    if i1 == -1:
        i1 = ngpu
    for i in range(i0, i1):
        vdev.push_back(i)
        vres.push_back(gpu_resources[i])
    return vres, vdev


if __name__ == '__main__':
    minimal_gpu_ivf_battle()
My initial research points to the faiss::IndexIVFScalarQuantizer constructor as the culprit: it unconditionally initializes its is_trained property to false.
faiss::IndexIVFPQ works because its is_trained is set manually in the sharding code.
I think a fix would be to add a similar manual reset to the ScalarQuantizer branch of that sharding code.
There were two bugs here. Add:
idx2.is_trained = index->is_trained;
idx2.sq = index_ivfsq->sq;
after
https://github.com/facebookresearch/faiss/blob/master/gpu/GpuCloner.cpp#L310
and similarly the is_trained for IVFFlat as well.
This will be fixed in the next update to Faiss.
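For readers following along, the two missing copies can be illustrated with a toy model in plain Python. This is only a sketch of the failure mode, not Faiss code: ToySQIndex, clone_shard_buggy and clone_shard_fixed are hypothetical names, and the "quantizer state" is just a list of per-dimension ranges.

```python
from dataclasses import dataclass, field


@dataclass
class ToySQIndex:
    """Hypothetical stand-in for an IVF scalar-quantizer index (not the Faiss class)."""
    d: int
    is_trained: bool = False                        # a fresh object always starts untrained
    sq_ranges: list = field(default_factory=list)   # toy "trained" quantizer state
    ntotal: int = 0

    def train(self, xs):
        # Record per-dimension (min, max) as the learned quantizer state.
        self.sq_ranges = [(min(col), max(col)) for col in zip(*xs)]
        self.is_trained = True

    def add(self, xs):
        # Mirrors the kind of guard that fired in the report above.
        if not self.is_trained:
            raise RuntimeError("Index not trained")
        self.ntotal += len(xs)


def clone_shard_buggy(index):
    # Builds a new shard from the dimension only: the trained flag and
    # the quantizer state are silently lost.
    return ToySQIndex(index.d)


def clone_shard_fixed(index):
    # Copies both pieces of state, analogous to the two added lines
    # (is_trained and sq) in the reply above.
    shard = ToySQIndex(index.d)
    shard.is_trained = index.is_trained
    shard.sq_ranges = list(index.sq_ranges)
    return shard
```

With this model, calling add on a shard from clone_shard_buggy raises "Index not trained" even though the source index was trained, while a shard from clone_shard_fixed accepts the add.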
Thank you! Glad to know this is a bug and not my own misunderstanding.
For the alternate index-population method illustrated in that benchmark (cluster assignment on the GPU and then using some sort of "core add" function to populate the clusters in a CPU index), is there an equivalent path for an IndexIVFScalarQuantizer? I can post this as a second Issue if y'all prefer.
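To make the question concrete, here is a toy, pure-Python sketch of the two-stage population being asked about: cluster assignment as one step (the part the benchmark runs on the GPU), then an add that takes the precomputed assignments and only encodes and stores. None of these names are Faiss APIs; assign_to_centroids, encode_8bit and add_preassigned are hypothetical, and the uniform 8-bit encoding over a fixed range is an assumption made for illustration.

```python
def assign_to_centroids(centroids, xs):
    """Stage 1: index of the nearest centroid (squared L2) for each vector."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centroids)), key=lambda c: sqdist(centroids[c], x))
            for x in xs]


def encode_8bit(x, lo, hi):
    """Toy uniform 8-bit scalar quantization of one vector over [lo, hi]."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((v - lo) * scale))) for v in x)


def add_preassigned(invlists, xs, ids, assignments, lo=0.0, hi=1.0):
    """Stage 2: append (id, code) to the inverted list chosen in stage 1.

    No distance computation happens here; the assignments are trusted.
    """
    for x, i, c in zip(xs, ids, assignments):
        invlists[c].append((i, encode_8bit(x, lo, hi)))


centroids = [[0.0, 0.0], [1.0, 1.0]]
xs = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.1]]
ids = [10, 11, 12]

assignments = assign_to_centroids(centroids, xs)
invlists = [[] for _ in centroids]
add_preassigned(invlists, xs, ids, assignments)
```

The point of the split is that stage 2 never touches the coarse quantizer, so the expensive nearest-centroid search can run wherever it is fastest.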
Fixed in latest github version.