eg. When M>30, nbits=8, the distance lookup table is unable to fit into typical E5 series CPU L1 cache which is 32k and cause PQ to slow down tremendously, even slower than BF method.
Hi
This is an interesting observation. The PQ is indeed not efficient for large M values. In fact, brute force is more efficient than PQ in almost all situations, PQ's advantage is that it compresses the data.
The cache issue can be is addressed by using polysemous codes, that avoids accessing the tables >90% of times). If memory is not an issue, the ScalarQuantizer is also an option for efficient search with a limited compression factor.
hi, @mdouze
I am interested in ScalarQuantizer, will it be implemented in GPU?
Thank you
@SupermeReturns No guarantees on the implementation of ScalarQuantizer on GPU, but I might be able to get to it sometime over the next month or two.
Thank for your reply!
By the way, I have another question.
A non-uniform scalar quantizier can be used to lower the quantization error, but it required more time training, will you consider implementing it?
helpful links:
http://web.engr.oregonstate.edu/~thinhq/teaching/ece499/spring06/lossy_quant.pdf
page 23
@wickedfoo
Hi
A non-uniform scalar quantizer defeats the purpose of the scalar quantizer, which is to be a faster option than the product quantizer. Indeed, a non-uniform SQ would require per-codeword memory lookups that are very slow.
@mdouze thanks
@SupermeReturns : for reference, hereafter was the first paper to consider compressed-domain search based on a non-uniform scalar quantizer:
https://scholar.google.co.uk/citations?view_op=view_citation&citation_for_view=1lcY2z4AAAAJ:ULOm3_A8WrAC
@jegou Thank you for your reference! I will come back if I have questions or find anything interesting about this paper.
Closing for now
Most helpful comment
@SupermeReturns No guarantees on the implementation of ScalarQuantizer on GPU, but I might be able to get to it sometime over the next month or two.