Elasticsearch: Expose dense vector iterator in painless

Created on 5 Feb 2020 · 9 Comments · Source: elastic/elasticsearch

At the moment, Painless can only treat a dense_vector as an opaque value: users can pass it around but cannot access its contents.

We can expose the vector as an iterator, which will give access to the data without revealing the internal representation.

cc: @mayya-sharipova @jtibshirani

:Core/Infra/Scripting >enhancement Core/Infra

All 9 comments

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

I would suggest NOT exposing a dense vector iterator.
There are two reasons for this:

  1. I have not encountered a need to iterate over the values of a dense_vector field, except possibly in issue #49695, which the user solved with a custom plugin. We provide predefined vector functions and would like users to access vectors only through these functions.
  2. Complex vector representation. Each vector is stored as a binary value that encodes not only the original vector values but also the computed vector magnitude. In the future, we may add more data to this encoding. So even if a user gets this binary value, it will be challenging to decode.
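
To make point 2 concrete, here is a minimal sketch of what such a layout could look like. It assumes each component is written as a big-endian 32-bit float and that a single 32-bit float magnitude is appended at the end. The class and method names are hypothetical, and this is not the actual Elasticsearch encoder.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a "values + trailing magnitude" layout.
// The byte layout is an assumption, not the real Elasticsearch encoding.
final class VectorLayoutSketch {

    // Encode the components followed by the vector magnitude (L2 norm).
    static byte[] encode(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate((values.length + 1) * Float.BYTES);
        double sumOfSquares = 0.0;
        for (float v : values) {
            buf.putFloat(v);
            sumOfSquares += (double) v * v;
        }
        buf.putFloat((float) Math.sqrt(sumOfSquares));
        return buf.array();
    }

    // Decode only the components, skipping the trailing magnitude.
    static float[] decodeValues(byte[] encoded) {
        int dims = encoded.length / Float.BYTES - 1;
        float[] values = new float[dims];
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        for (int i = 0; i < dims; i++) {
            values[i] = buf.getFloat();
        }
        return values;
    }

    // Read the magnitude stored in the last four bytes.
    static float decodeMagnitude(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getFloat(encoded.length - Float.BYTES);
    }
}
```

A script that received only the raw bytes would need to know both the component width and the position of the magnitude, which is the decoding burden described above.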

@stu-elastic I am wondering if you have any specific need to expose dense vector iterator?

@mayya-sharipova this is based on a request from @HonzaKral. After chatting we thought an iterator was pretty lightweight.

I'm trying to understand point 2: if we iterated over a vector and provided the values, what would the issue be? We can simply ignore the magnitude at the end.

@HonzaKral I am interested to learn about a use case to access vector values directly.

@stu-elastic

> We can simply ignore the magnitude at the end.

For now, depending on the index version, it is just the magnitude at the end. But later we may add more metadata.

Were you planning to iterate over the binary doc values? It would be tricky for a user to decode them.
Alternatively, I can see how we could expose an iterator over float[] (the original vector values) by first decoding them, which is something we already do in our vector functions.

> Alternatively, I can see how we could expose an iterator over float[] (the original vector values) by first decoding them, which is something we already do in our vector functions.

That's what we were thinking: allow users to get their data back.
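
As a rough sketch of the idea discussed here, a read-only view over already-decoded values could look like the following. The class name `DenseVectorValues` and its constructor are hypothetical and are not an existing Elasticsearch or Painless API; the decoding step is assumed to have happened elsewhere.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only view over already-decoded vector values.
// It exposes the data without revealing how the vector is stored.
final class DenseVectorValues implements Iterable<Float> {
    private final float[] values;

    DenseVectorValues(float[] decoded) {
        // Copy defensively so later changes to the input array do not leak in.
        this.values = decoded.clone();
    }

    int size() {
        return values.length;
    }

    @Override
    public Iterator<Float> iterator() {
        return new Iterator<Float>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < values.length;
            }

            @Override
            public Float next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return values[next++];
            }
        };
    }
}
```

Because the iterator only hands out component values, metadata such as the magnitude never reaches the script.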

The use case we encountered is the ability to access data in order, just as in #49695. The idea was to store historical records (the price at a point in time) where we are then interested in deltas between two arbitrary points (the price yesterday compared to the price a year ago).

Another use case that came up was a user implementing a custom vector function in Painless.
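
Both use cases reduce to plain arithmetic once the values are readable. The sketch below is purely illustrative: it assumes the vector has already been decoded into a float[] ordered by time, and the class and method names are hypothetical.

```java
// Hypothetical helpers showing what scripts could compute with ordered access.
final class VectorUseCaseSketch {

    // Price change between two positions of an ordered price series.
    static float delta(float[] prices, int fromIndex, int toIndex) {
        return prices[toIndex] - prices[fromIndex];
    }

    // A hand-written vector function: dot product of two equal-length vectors.
    static double dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }
}
```

For example, `delta(prices, 0, prices.length - 1)` would compare the oldest and newest entries if index 0 holds the oldest price.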

@HonzaKral thanks, these look like valid use cases to me.

> That's what we were thinking: allow users to get their data back.

@stu-elastic thanks, makes sense. We need to think about how to implement it, because DenseVectorScriptDocValues needs to know the index version in order to decode vectors correctly.
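
The version dependence can be sketched as follows. The boolean `hasTrailingMagnitude` is a hypothetical stand-in for an index-version check, and the 4-byte float width is an assumption; neither is taken from the actual DenseVectorScriptDocValues code.

```java
// Hypothetical sketch of why a decoder needs the index version.
final class VersionAwareDecodingSketch {

    // Number of vector components in an encoded value.
    // hasTrailingMagnitude stands in for "index version stores the magnitude".
    static int dimensions(int encodedLengthInBytes, boolean hasTrailingMagnitude) {
        int floats = encodedLengthInBytes / Float.BYTES;
        return hasTrailingMagnitude ? floats - 1 : floats;
    }
}
```

The same 12 bytes would decode to two components under one layout and three under the other, so guessing the version wrong silently corrupts the result.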

A relevant request from another user: exposing vector functions in Painless contexts other than ScoreScript.CONTEXT.

@mayya-sharipova Just wanted to check to see if you would still like this to be exposed.
