Faiss: [Question] About Sparse Vectors

Created on 27 Mar 2019 · 6 Comments · Source: facebookresearch/faiss

I'm aware that Facebook's PySparNN is well suited to approximate nearest-neighbor search on sparse vectors: it uses the Cluster Pruning approach and has an O(sqrt(N)) search time for an index of N vectors.
My concerns are that it does not support GPUs and that the project no longer seems to be maintained. Since we already use Faiss for nearest-neighbor search, my question is whether I could move to Faiss for sparse vectors too, and whether you have any test cases for clustering sparse vectors.

Thank you.

question

All 6 comments

Faiss does not have specific support for sparse vectors, so you will have to convert them to dense.
However, there is a gray zone between 1% and 50% non-zeros in the vectors where it is not clear whether it is more efficient to handle them as sparse or dense. How sparse are your vectors?
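As a rough illustration of "convert them to dense", the sketch below densifies a SciPy CSR matrix one batch at a time so that the full dense matrix is never materialized by the conversion itself. The helper name `add_sparse_in_batches` and the batch size are hypothetical. `index` is assumed to be any object with an `add` method that accepts a C-contiguous float32 array of shape (n, d), which is what a Faiss index such as `faiss.IndexFlatIP(d)` expects.

```python
import numpy as np
import scipy.sparse as sp


def add_sparse_in_batches(index, X, batch_size=1024):
    """Densify a sparse matrix batch by batch and pass each batch to index.add.

    Hypothetical helper: `index` only needs an `add(array)` method,
    for example a Faiss index built for X.shape[1] dimensions.
    """
    X = sp.csr_matrix(X)  # CSR makes row slicing cheap
    for start in range(0, X.shape[0], batch_size):
        # Only this slice of rows is held in dense form at any time.
        block = X[start:start + batch_size].toarray()
        # Faiss works on float32, row-major (C-contiguous) arrays.
        index.add(np.ascontiguousarray(block, dtype=np.float32))
```

Note that a flat index still stores every vector densely after `add`, so this only bounds the temporary memory used for conversion, not the size of the index.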

@mdouze that's a good point, thank you. I have ~1M columns in the tensor, and I assume we can measure sparsity like this:

sparsity = 1.0 - ( count_nonzero(A) / float(A.size) )

It's about 0.99, i.e. the density is about 0.01.
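The formula above reads like NumPy on a dense array. With ~1M columns the data is more likely held in a sparse container, so here is a minimal sketch of the same measurement on a SciPy sparse matrix, without converting it to dense. The matrix shape and the 1% density are made-up values for illustration only.

```python
import scipy.sparse as sp


def sparsity(A):
    """Fraction of zero entries in a SciPy sparse matrix."""
    n_rows, n_cols = A.shape
    # count_nonzero() ignores explicitly stored zeros, unlike A.nnz.
    return 1.0 - A.count_nonzero() / float(n_rows * n_cols)


# Illustrative random matrix: 1,000 vectors, 10,000 columns, 1% non-zeros.
A = sp.random(1000, 10000, density=0.01, format="csr", random_state=0)
s = sparsity(A)
print(f"sparsity = {s:.4f}, density = {1.0 - s:.4f}")
```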

No activity, closing.

Hello, I have a similar use case to the one described above, where I have ~1M columns in each vector and their density is ~0.1. Does Faiss provide any support for sparse vectors without first doing dimensionality reduction?

@alejandrojcastaneira I am afraid it will be difficult to train, as it will require a huge amount of RAM.
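To put a rough number on the RAM concern, here is a back-of-the-envelope sketch. It assumes 4-byte float32 values (the type Faiss works with) and ~1M dimensions as mentioned in the thread. The training-set size of 100,000 vectors and the 1% density are hypothetical values chosen only for illustration, and the sparse estimate ignores row-pointer overhead.

```python
def dense_bytes(n_vectors, dim, bytes_per_value=4):
    """Memory needed to hold n_vectors dense vectors of the given dimension."""
    return n_vectors * dim * bytes_per_value


def sparse_bytes(n_vectors, dim, density, value_bytes=4, index_bytes=4):
    """Approximate memory for the same data as (value, column index) pairs."""
    nnz_per_vector = round(dim * density)
    return n_vectors * nnz_per_vector * (value_bytes + index_bytes)


dim = 1_000_000    # ~1M columns, as in the thread
n_train = 100_000  # hypothetical training-set size

print(f"one dense vector:   {dense_bytes(1, dim) / 1e6:.0f} MB")
print(f"dense training set: {dense_bytes(n_train, dim) / 1e9:.0f} GB")
print(f"sparse, 1% density: {sparse_bytes(n_train, dim, 0.01) / 1e9:.0f} GB")
```

Under these assumptions a single dense vector already takes 4 MB, so even a modest training set reaches hundreds of gigabytes once densified.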

> Faiss does not have specific support for sparse vectors, so you will have to convert them to dense.
> However, there is a gray zone between 1% and 50% non-zeros in the vectors where it is not clear whether it is more efficient to handle them as sparse or dense. How sparse are your vectors?

Thank you so much for the explanation. I am doing content filtering right now. Since word2vec features do not work well on our smaller dataset, we are using TF-IDF instead. We convert the sparse vectors from the Spark output directly to dense, and that works well with Faiss. My question is: what do you mean by "does not have specific support for sparse vectors"? Do you mean that the model does not work in theory when the vectors have between 1% and 50% non-zeros, or that data with between 1% and 50% non-zeros is so large that it will not fit into memory? Thank you so much in advance.
