Faiss: Question: relationship between PCA and PQ parameters? Trying to get intuition...

Created on 9 Nov 2017 · 2 Comments · Source: facebookresearch/faiss

Sorry, this is not really an issue, more a theoretical question.

Let's say that I have a 2048D vector (for the sake of argument, let's assume that the vector has a uint8 data type) and want to reduce that representation down to 16 bytes. At one end of the scale, I could apply PCA (or another dimensionality reduction technique) to get 16 dimensions. At the other end, I could use PQ directly on the 2048D vectors and obtain 16-byte codes.

There is probably a sweet spot between those two extremes (e.g. PCA to 128D and then PQ). That sweet spot could probably be found through parameter search and testing on a validation set.

What bothers me is I have zero intuition for what those parameters might be.

billkle1n

Most helpful comment

Hi,

A practical answer: as a rule of thumb for a code of size c bytes, first apply an OPQ transform (not PCA) to reduce to 4*c or 8*c dimensions, then encode with PQ.

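import faiss
# x is assumed to be a float32 array of shape (n, 2048); uint8 data must be converted first

opq = faiss.OPQMatrix(2048, 16, 4*16)     # 2048D input, 16 sub-quantizers, 64D output
pq = faiss.ProductQuantizer(4*16, 16, 8)  # 64D input, 16 sub-quantizers of 8 bits = 16 bytes
opq.train(x)
xt = opq.apply_py(x)                      # transformed vectors, shape (n, 64)
pq.train(xt)
codes = pq.compute_codes(xt)              # uint8 codes, shape (n, 16)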

To decode:

x_decoded = opq.reverse_transform(pq.decode(codes))

mdouze on 9 Nov 2017

👍2
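
The arithmetic behind the rule of thumb above can be sketched in plain Python. This is only an illustration, not part of Faiss: the helper name pq_layout and its dictionary keys are hypothetical, and the 2048D input and 8-bit sub-quantizers are taken from the example in this thread. It derives the constructor arguments used above from a target code size c and shows how many dimensions each sub-quantizer ends up encoding.

```python
def pq_layout(code_bytes, factor, nbits=8, d_in=2048):
    """Derive OPQ/PQ sizes from a target code size (hypothetical helper).

    code_bytes: target code size c in bytes
    factor:     4 or 8 in the rule of thumb (OPQ output dimension = factor * c)
    nbits:      bits per sub-quantizer index (8 in the thread's example)
    d_in:       input dimensionality (2048 in the thread's example)
    """
    total_bits = code_bytes * 8
    if total_bits % nbits != 0:
        raise ValueError("code size must be a whole number of sub-quantizer indices")
    m = total_bits // nbits            # number of sub-quantizers
    d_out = factor * code_bytes        # OPQ output dimension
    if d_out % m != 0:
        raise ValueError("OPQ output dimension must be divisible by m")
    if d_out > d_in:
        raise ValueError("OPQ output dimension cannot exceed the input dimension")
    return {
        "opq_args": (d_in, m, d_out),  # arguments for OPQMatrix(d_in, m, d_out)
        "pq_args": (d_out, m, nbits),  # arguments for ProductQuantizer(d_out, m, nbits)
        "dims_per_subquantizer": d_out // m,
        "centroids_per_subquantizer": 2 ** nbits,
    }


# The two candidates suggested by the rule of thumb for 16-byte codes.
for factor in (4, 8):
    print(factor, pq_layout(16, factor))
```

For c = 16 this gives the two candidate configurations (64D and 128D intermediate vectors), so each 8-bit sub-quantizer encodes 4 or 8 dimensions. Which of the two works better on a given dataset is something a validation set would still have to decide.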

Closing.

mdouze on 22 Nov 2017
