Faiss: GPU add with IVFScalarQuantizer

Created on 28 Feb 2020 · 4 Comments · Source: facebookresearch/faiss

Summary

I'm trying to extend the bench_gpu_1bn example to include IndexIVFScalarQuantizer. The goal is to use the GPU while populating an IVF,SQ8 index with tens of millions of vectors, by either the GPU-shards-and-sync method (compute_populated_index) or the assign-on-GPU-and-add-on-CPU method (compute_populated_index2).

With the first method, I'm hitting a failed assertion within GpuIndex::add_with_ids (via IndexShards): FAISS_THROW_IF_NOT_MSG(this->is_trained, "Index not trained"); It seems the trained index forgets that it is trained when it is shard-cloned to multiple GPUs. Naively setting gpu_index.is_trained = True in Python does not change this behavior.
Copying the index to the GPU without shards does work, however.

With the second method, I haven't found an IndexIVFScalarQuantizer analogue for add_core_o (for IndexIVFPQ) or add_core (for IndexIVFFlat) by digging through the C++ source.

Any guidance on the right way to achieve GPU-assisted adds to IndexIVFScalarQuantizer would be appreciated, or pragmatic considerations on why/whether this should not be done.

Thank you!

Running on:

[ ] CPU
[x] GPU

Interface:

[ ] C++
[x] Python

Reproduction instructions

Here's a small example that exhibits the is_trained failure:

import numpy as np
import faiss

tempmem = -1
ngpu = 2
d = 128
ncent = 50    
xt = np.random.random((10000, d)).astype(np.float32)
xb = np.random.random((10000, d)).astype(np.float32)

def minimal_gpu_ivf_battle():
    print('IVFPQ, no shard')
    coarse_quantizer = make_coarse_quantizer()
    ivf_pq = faiss.IndexIVFPQ(coarse_quantizer, d, ncent, 8, 8)
    ivf_pq.train(xt)
    populate_gpu_index(ivf_pq)

    print('IVFPQ, shards')
    coarse_quantizer = make_coarse_quantizer()
    ivf_pq = faiss.IndexIVFPQ(coarse_quantizer, d, ncent, 8, 8)
    ivf_pq.train(xt)
    populate_gpu_shard_index(ivf_pq)

    print('IVFSQ, no shard')
    coarse_quantizer = make_coarse_quantizer()
    ivf_sq = faiss.IndexIVFScalarQuantizer(coarse_quantizer, d, ncent, faiss.ScalarQuantizer.QT_8bit)
    ivf_sq.train(xt)
    populate_gpu_index(ivf_sq)

    print('IVFSQ, shards')
    coarse_quantizer = make_coarse_quantizer()
    ivf_sq = faiss.IndexIVFScalarQuantizer(coarse_quantizer, d, ncent, faiss.ScalarQuantizer.QT_8bit)
    ivf_sq.train(xt)
    populate_gpu_shard_index(ivf_sq)

def make_coarse_quantizer():
    clus = faiss.Clustering(d, ncent)
    index = faiss.IndexFlatL2(d)
    clus.train(xt, index)
    centroids = faiss.vector_float_to_array(clus.centroids).reshape(ncent, d)
    coarse_quantizer = faiss.IndexFlatL2(d)
    coarse_quantizer.add(centroids)
    return coarse_quantizer

def populate_gpu_index(index):
    assert index.is_trained  # <-- yes for PQ, yes for SQ

    co = faiss.GpuMultipleClonerOptions()
    co.indicesOptions = faiss.INDICES_CPU
    co.verbose = True
    co.shard = False

    vres, vdev = make_vres_vdev()
    gpu_index = faiss.index_cpu_to_gpu_multiple(
        vres, vdev, index, co)

    # assert gpu_index.is_trained  # <-- yes for PQ, *no* for SQ

    gpu_index.add(xb)
    index_src = faiss.index_gpu_to_cpu(gpu_index)
    index_src.copy_subset_to(index, 0, 0, xb.shape[0])

def populate_gpu_shard_index(index):
    assert index.is_trained  # <-- yes for PQ, yes for SQ

    co = faiss.GpuMultipleClonerOptions()
    co.indicesOptions = faiss.INDICES_CPU
    co.verbose = True
    co.shard = True

    vres, vdev = make_vres_vdev()
    gpu_index = faiss.index_cpu_to_gpu_multiple(
        vres, vdev, index, co)

    # assert gpu_index.is_trained  # <-- yes for PQ, *no* for SQ

    gpu_index.add_with_ids(xb, ids=np.arange(xb.shape[0]))
    for i in range(ngpu):
        index_src = faiss.index_gpu_to_cpu(gpu_index.at(i))
        index_src.copy_subset_to(index, 0, 0, xb.shape[0])


# lazy-pasted from bench_gpu_1bn
gpu_resources = []

for i in range(ngpu):
    res = faiss.StandardGpuResources()
    gpu_resources.append(res)


def make_vres_vdev(i0=0, i1=-1):
    " return vectors of device ids and resources useful for gpu_multiple"
    vres = faiss.GpuResourcesVector()
    vdev = faiss.IntVector()
    if i1 == -1:
        i1 = ngpu
    for i in range(i0, i1):
        vdev.push_back(i)
        vres.push_back(gpu_resources[i])
    return vres, vdev


if __name__ == '__main__':
    minimal_gpu_ivf_battle()

bug

dadamson

All 4 comments

My initial research points to the faiss::IndexIVFScalarQuantizer constructor as the culprit: it unconditionally initializes its is_trained property to false.

faiss::IndexIVFPQ works because its is_trained is set manually.

I think a fix would be to add a similar manual assignment to the ScalarQuantizer branch of this sharding code.

laanak08 on 28 Feb 2020

❤1
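
The mechanism described in this comment can be illustrated with a toy model in plain Python. It does not use Faiss: ToyIndex and both clone functions are hypothetical stand-ins, and the sketch assumes only what the comment states, namely that the constructor starts with is_trained set to false and that a clone path has to copy the flag explicitly.

```python
class ToyIndex:
    """Hypothetical stand-in for an IVF index; not a Faiss class."""

    def __init__(self, d):
        self.d = d
        self.is_trained = False  # the constructor always starts untrained
        self.vectors = []

    def train(self, xs):
        self.is_trained = True

    def add(self, xs):
        if not self.is_trained:
            raise RuntimeError("Index not trained")
        self.vectors.extend(xs)


def clone_forgetting_flag(src):
    # Builds a fresh object and copies nothing else, so the flag is lost.
    return ToyIndex(src.d)


def clone_copying_flag(src):
    dst = ToyIndex(src.d)
    dst.is_trained = src.is_trained  # the explicit copy proposed above
    return dst


cpu_index = ToyIndex(4)
cpu_index.train([[0.0] * 4])

bad = clone_forgetting_flag(cpu_index)
good = clone_copying_flag(cpu_index)
print(bad.is_trained, good.is_trained)  # False True
```

Calling bad.add(...) raises the same kind of "Index not trained" error as in the report, while good.add(...) succeeds.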

There were two bugs here. Add:

            idx2.is_trained = index->is_trained;
            idx2.sq = index_ivfsq->sq;

after
https://github.com/facebookresearch/faiss/blob/master/gpu/GpuCloner.cpp#L310

and similarly the is_trained for IVFFlat as well.

This will be fixed in the next update to Faiss.

wickedfoo on 29 Feb 2020

👍2
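
The second line of the fix copies the scalar quantizer itself. A rough illustration of why that matters, in plain Python: ToySQ8 is a hypothetical, simplified uniform 8-bit quantizer (not Faiss's actual codec), and vmin and vdiff are assumed names for its per-dimension trained range. A clone that keeps default ranges encodes the same vector differently from the trained source.

```python
class ToySQ8:
    """Hypothetical uniform 8-bit quantizer; a simplification, not Faiss's codec."""

    def __init__(self, d):
        self.vmin = [0.0] * d   # untrained defaults
        self.vdiff = [1.0] * d

    def train(self, xs):
        # Learn a per-dimension range from the training vectors.
        for j in range(len(self.vmin)):
            col = [x[j] for x in xs]
            self.vmin[j] = min(col)
            self.vdiff[j] = (max(col) - min(col)) or 1.0

    def encode(self, x):
        codes = []
        for xj, lo, span in zip(x, self.vmin, self.vdiff):
            t = min(max((xj - lo) / span, 0.0), 1.0)  # clamp to [0, 1]
            codes.append(int(t * 255))
        return bytes(codes)


trained = ToySQ8(2)
trained.train([[10.0, -5.0], [20.0, 5.0]])

fresh = ToySQ8(2)    # clone that never received the trained ranges
copied = ToySQ8(2)   # clone that copies the trained state
copied.vmin, copied.vdiff = list(trained.vmin), list(trained.vdiff)

x = [15.0, 0.0]
print(trained.encode(x) == copied.encode(x))  # True
print(trained.encode(x) == fresh.encode(x))   # False
```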

Thank you! Glad to know this is a bug and not my own misunderstanding.

For the alternate index-population method illustrated in that benchmark (cluster assignment on the GPU and then using some sort of "core add" function to populate the clusters in a CPU index), is there an equivalent path for an IndexIVFScalarQuantizer? I can post this as a second Issue if y'all prefer.

dadamson on 2 Mar 2020
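
The two-stage pattern asked about here (assign to clusters first, then append to inverted lists) can be sketched independently of Faiss. The following is plain Python with hypothetical names (assign, encode_sq8, add_core_sketch) and a simplified 8-bit code. It only illustrates the data flow and says nothing about whether Faiss exposes such an entry point for IndexIVFScalarQuantizer.

```python
def assign(xs, centroids):
    """Stage 1 (the part the benchmark runs on GPU): nearest centroid per vector."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centroids)), key=lambda c: sqdist(x, centroids[c]))
            for x in xs]


def encode_sq8(x, vmin, vdiff):
    """Simplified uniform 8-bit code per dimension (an assumption, not Faiss's codec)."""
    return bytes(int(min(max((xj - lo) / span, 0.0), 1.0) * 255)
                 for xj, lo, span in zip(x, vmin, vdiff))


def add_core_sketch(invlists, xs, ids, list_nos, vmin, vdiff):
    """Stage 2 (CPU): append (id, code) to the list chosen in stage 1."""
    for x, i, list_no in zip(xs, ids, list_nos):
        invlists[list_no].append((i, encode_sq8(x, vmin, vdiff)))


centroids = [[0.0, 0.0], [1.0, 1.0]]
xs = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1]]
invlists = [[] for _ in centroids]

list_nos = assign(xs, centroids)
add_core_sketch(invlists, xs, range(len(xs)), list_nos,
                vmin=[0.0, 0.0], vdiff=[1.0, 1.0])
print(list_nos)  # [0, 1, 0]
```

Keeping the assignment separate from the append step is what lets the expensive nearest-centroid search run on a different device from the one that owns the inverted lists.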

Fixed in latest github version.

wickedfoo on 10 Mar 2020
