Faiss: Question: relationship between PCA and PQ parameters? Trying to get intuition...

Created on 9 Nov 2017 · 2 Comments · Source: facebookresearch/faiss

Sorry, this is not really an issue, more a theoretical question.

Let's say that I have a 2048D vector (for the sake of argument, let's assume that the vector is of uint8 data type) and want to reduce that representation down to 16 bytes. At one end of the scale, I could apply PCA (or another dimensionality reduction technique) to get 16 dimensions. At the other end, I could use PQ directly on the 2048D vectors and obtain 16-byte codes.

There is probably a sweet spot between those two extremes (e.g. PCA to 128D and then PQ). That sweet spot could probably be found through parameter search and testing on a validation set.

What bothers me is I have zero intuition for what those parameters might be.

Most helpful comment

Hi,

A practical answer: as a rule of thumb, for a code of c bytes, first apply an OPQ transform (not PCA) that reduces the vectors to 4*c or 8*c dimensions, then encode with PQ.

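import faiss

# x: float32 NumPy array of shape (n, 2048) holding the training vectors
opq = faiss.OPQMatrix(2048, 16, 4 * 16)     # 2048D -> 64D, optimized for 16 sub-quantizers
pq = faiss.ProductQuantizer(4 * 16, 16, 8)  # 16 sub-quantizers x 8 bits = 16-byte codes
opq.train(x)
xt = opq.apply_py(x)                        # transformed vectors, shape (n, 64)
pq.train(xt)
codes = pq.compute_codes(xt)                # uint8 codes, shape (n, 16)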

To decode:

x_decoded = opq.reverse_transform(pq.decode(codes))
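
To make the arithmetic behind the rule of thumb explicit, here is a minimal pure-Python sketch (no Faiss calls). The helper names pq_code_bytes, rule_of_thumb_dims and describe_pq are made up for illustration and are not part of Faiss. It only encodes two facts: a PQ code takes M * nbits bits per vector, and the PQ input dimension must be divisible by the number of sub-quantizers M.

```python
def pq_code_bytes(m, nbits):
    """Bytes per vector for a PQ code with m sub-quantizers of nbits bits each."""
    return (m * nbits + 7) // 8  # round up to whole bytes


def rule_of_thumb_dims(code_bytes, factors=(4, 8)):
    """Candidate intermediate dimensions (4*c and 8*c) for a code of c bytes."""
    return [f * code_bytes for f in factors]


def describe_pq(d_mid, m, nbits):
    """Summarize a PQ setup applied to d_mid-dimensional (already transformed) vectors."""
    if d_mid % m != 0:
        raise ValueError("PQ input dimension must be divisible by the number of sub-quantizers")
    return {
        "sub_dim": d_mid // m,                   # dimensions handled by each sub-quantizer
        "code_bytes": pq_code_bytes(m, nbits),   # size of one encoded vector
    }


if __name__ == "__main__":
    # Hypothetical target from the question: 16-byte codes.
    for d_mid in rule_of_thumb_dims(16):
        print(d_mid, describe_pq(d_mid, m=16, nbits=8))
```

For a 16-byte code this gives candidate intermediate dimensions of 64 and 128, so each of the 16 sub-quantizers handles 4 or 8 dimensions. Which of the two works better on a given dataset would still need to be checked on a validation set, as suggested in the question.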

Closing.
