Elasticsearch: `sparse_vector` is deprecated

Created on 27 Feb 2020  ·  7 Comments  ·  Source: elastic/elasticsearch

Please don't get rid of sparse_vector!

The docs solicit feedback about this from readers:

We have not seen much interest in this experimental field type, and don't see a clear use case as it's currently designed. If you have feedback or suggestions around sparse vector functionality, please let us know through GitHub or the discuss forums.

https://www.elastic.co/guide/en/elasticsearch/reference/current/sparse-vector.html

Elasticsearch moves very fast and I have been looking "forward" to using the new vector types for some time now. I finally started in earnest this week, only to discover that sparse_vector is now deprecated!

My data set contains vectors in a space of roughly R^1e4, but ~each~ _most_ vectors compress acceptably into R^500 or R^1024 (whichever limit actually applies to me; I've seen both floating around and I'm not yet sure which to believe).
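To make the "compresses acceptably" idea concrete, here is a minimal sketch of the kind of sparse encoding being described: keep only the non-zero entries of a roughly 10,000-dimensional vector and check how many remain. The helper names (`to_sparse`, `fits_limit`), the string dimension keys, and the example data are assumptions made for illustration; the actual per-vector limit (500 or 1024) is left as a parameter because the thread does not settle it.

```python
from typing import Dict, Sequence


def to_sparse(dense: Sequence[float], eps: float = 0.0) -> Dict[str, float]:
    """Keep only entries whose magnitude exceeds eps.

    Keys are dimension indices rendered as strings (an assumption
    made here for illustration).
    """
    return {str(i): v for i, v in enumerate(dense) if abs(v) > eps}


def fits_limit(sparse: Dict[str, float], limit: int) -> bool:
    """Return True if the number of non-zero dimensions is within the limit."""
    return len(sparse) <= limit


# Hypothetical example: a 10,000-dimensional vector with three non-zero entries.
dense = [0.0] * 10_000
dense[3] = 1.5
dense[42] = -0.25
dense[9_999] = 2.0

sparse = to_sparse(dense)
```

Calling `fits_limit(sparse, 500)` or `fits_limit(sparse, 1024)` then shows which vectors would survive under either candidate limit.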

If sparse_vector goes away, am I back to choosing between mappings with tens of thousands of fields or nested docs?

Thanks!

All 7 comments

Pinging @elastic/es-search (:Search/Ranking)

@dvisztempacct Thanks for letting us know about your need for sparse_vector. We would like to know more details about your use case. For example, what type of query and scoring were you planning to run on your data?

Also, consider whether the rank_features data type can address your use case. This type can also represent sparse features. A rank_feature doesn't provide vector similarity scoring, but can be very efficient.

hi @mayya-sharipova

rank_feature query, unfortunately, has no linear operation mode, as documented in this issue:

https://github.com/elastic/elasticsearch/issues/49859

The proposed work-around might be okay, but I'm hesitant to try it in our application since that will require more work and feedback to figure out.
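For readers following along, the sketch below illustrates why the lack of a linear mode matters for dot-product scoring. It reimplements, in plain Python, the three scoring functions the rank_feature query documented at the time as I understand them (saturation, log, sigmoid), next to a hypothetical `linear` function of the kind requested in the linked issue. All function and parameter names here are illustrative, not Elasticsearch API calls.

```python
import math


def saturation(s: float, pivot: float) -> float:
    """S / (S + pivot): bounded in (0, 1), flattens out as S grows."""
    return s / (s + pivot)


def log_score(s: float, scaling_factor: float) -> float:
    """log(scaling_factor + S): grows without bound, but sub-linearly."""
    return math.log(scaling_factor + s)


def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp): an S-shaped curve bounded in (0, 1)."""
    return s ** exponent / (s ** exponent + pivot ** exponent)


def linear(s: float, weight: float) -> float:
    """Hypothetical linear mode: weight * S, the term a dot product needs."""
    return weight * s


# Doubling the feature value doubles only the linear score.
doubled_linear = linear(4.0, 0.5) == 2 * linear(2.0, 0.5)
doubled_saturation = saturation(4.0, 1.0) == 2 * saturation(2.0, 1.0)
```

A sum of per-dimension `linear` terms is exactly a dot product between query weights and document features; none of the other three functions can be summed to the same result.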

@dvisztempacct We would like to learn more details about your use case for sparse vectors/rank features. In particular, can you please give an example of the kind of queries you would like to run for your use case?

I forgot to mention that we have another new data type that can prevent mapping explosion: flattened. It may be of use to you.

@mayya-sharipova

We sort documents using a linear projection (dot product.)

In many cases our end users select which dimensions are important to them when formulating their query.

We may combine this ordering of documents with other filters (for instance on text fields) or joins.

Is this the kind of detail you're interested in?

Thanks for telling me about flattened; it looks like it will be very useful in other applications!

Thanks again!
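A minimal sketch of the use case described in this comment, written in plain Python rather than as an Elasticsearch query: each document carries a sparse vector, the end user supplies weights only for the dimensions they care about, and documents are ordered by the resulting dot product. The names `sparse_dot` and `rank`, the string dimension keys, and the sample data are all assumptions for illustration; filtering on text fields or joins would happen before this step.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over the dimensions present in both sparse vectors."""
    # Iterate over the smaller vector to keep the lookup count low.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


def rank(docs: Dict[str, SparseVector], query: SparseVector) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc_id, sparse_dot(vec, query)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents and a query that weights only dimensions "3" and "42".
docs = {
    "a": {"3": 1.0, "42": 2.0},
    "b": {"3": 4.0},
    "c": {"7": 5.0},
}
query = {"3": 1.0, "42": 0.5}

ranking = rank(docs, query)
```

Dimensions the user did not select simply have no entry in `query`, so they contribute nothing to the score.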

I forgot to mention that we are also planning to experiment with document similarity, but it is not a feature we currently use in Elasticsearch.

We have decided for now NOT to re-introduce sparse_vector. But we will be looking into potentially introducing a linear operation into rank_feature query.
