Elasticsearch: Consider increasing the dimension limit for vector fields.

Created on 27 Mar 2019 · 11 comments · Source: elastic/elasticsearch

The dense_vector and sparse_vector fields place a hard limit of 500 on the number of dimensions per vector. However, many common pretrained text embeddings like BERT, ELMo, and Universal Sentence Encoder produce vectors with more dimensions, typically ranging from 512 to 1024.

Currently, users must either truncate their vectors or perform an additional dimensionality-reduction step. Perhaps we could make the dimension limit configurable, or at least increase it to a larger value?
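To make the request concrete, here is a minimal sketch of what a mapping with an explicitly declared dimension count could look like. The field name `text_embedding` and the `dims` parameter are illustrative assumptions for this discussion, not a confirmed API at the time of the issue.

```python
import json

def make_vector_mapping(field_name, dims):
    """Build an index mapping with a dense_vector field of `dims` dimensions.

    Hypothetical sketch: assumes a configurable "dims" mapping parameter.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "dense_vector",
                    "dims": dims,
                }
            }
        }
    }

# A 1024-dim field, as produced by e.g. ELMo or large BERT variants:
print(json.dumps(make_vector_mapping("text_embedding", 1024), indent=2))
```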

:Search/Mapping >enhancement

All 11 comments

Pinging @elastic/es-search

Thanks @jtibshirani

I don't see a problem with increasing the number of dimensions to 1024.

@jpountz do you see any problem with a BinaryDocValuesField holding a value of 1024 × 4 = 4096 bytes (in the case of dense vectors),
or 1024 × 6 = 6144 bytes (in the case of sparse vectors)?
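The per-vector sizes above follow directly from the per-dimension encoding: a sketch of the arithmetic, assuming 4 bytes (one float) per dense dimension and roughly 6 bytes (value plus encoded index) per sparse dimension, as the comment implies.

```python
# Back-of-the-envelope storage per vector, per the figures in the comment.
def dense_bytes(dims):
    """Dense vector: one 4-byte float per dimension."""
    return dims * 4

def sparse_bytes(dims):
    """Sparse vector: ~6 bytes per dimension (float value + encoded index)."""
    return dims * 6

print(dense_bytes(1024))   # 4096 bytes per dense vector
print(sparse_bytes(1024))  # 6144 bytes per sparse vector
```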

This sounds good to me. I care more about the fact that there is a reasonable limit than about the actual value of the limit.

Hi, @mayya-sharipova!
Sorry for writing in an old issue, but is it possible to increase the limit once more?
It looks like there already exist neural networks suited to search tasks whose embeddings exceed the 1024-dimension limit.

For example, mobilenet_v2 produces 1280-dimensional vectors, as pointed out by @etudor in issue SthPhoenix/elastik-nearest-neighbors-extended#4.

@SthPhoenix what would be a reasonable dimension limit?

Actually I'm not sure; 1280 dimensions is the largest embedding I've seen so far among common models.
I think 2048 might be sufficient for a while, but if there are no technical limitations, 4096 should be overkill for a long time.
Such large vectors would heavily impact performance and memory footprint, but I think people who need this know what they are doing.
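To put the footprint concern in numbers, here is a rough estimate of raw vector storage for a large corpus. The 10-million-document corpus size is an illustrative assumption; only the 4-bytes-per-float figure comes from the thread.

```python
# Rough raw-storage estimate for dense float vectors.
def index_size_gib(num_docs, dims, bytes_per_dim=4):
    """Raw vector bytes across the corpus, expressed in GiB."""
    return num_docs * dims * bytes_per_dim / 2**30

# Hypothetical corpus: 10 million documents with 4096-dim float vectors.
print(round(index_size_gib(10_000_000, 4096), 1))  # 152.6 (GiB of raw vector data)
```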

Thanks! I hope this will be enough :)

Hello, is it possible to increase the limit to 3072 dims?

@gabrielcustodio We have not encountered models or use cases that require more than 2048 dims. Can you please describe the use case or models that need this many dimensions?

I used this model on pt-BR text:
https://allennlp.org/elmo

Actually, the output is 3 layers of 1024 dims each (3 × 1024).

Source: https://github.com/flairNLP/flair/issues/886

I load this model using the flair library and then extract the embeddings.
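One common way to fit the 3 × 1024 ELMo output into a single 1024-dim field is to average the layers elementwise. This is a plain-Python sketch of that reduction, not the flair API itself; whether averaging suits the search task is an assumption the user would need to validate.

```python
# Collapse several equal-length layer vectors (e.g. 3 x 1024 from ELMo)
# into one vector by elementwise averaging.
def average_layers(layers):
    """layers: list of equal-length vectors -> one vector of the same length."""
    n = len(layers)
    return [sum(vals) / n for vals in zip(*layers)]

# Toy stand-in for the real 3 x 1024 output:
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(average_layers(layers))  # [3.0, 4.0]
```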

Flair stacked embeddings (forward, backward, glove/flair) would produce vectors of more than 4096 dimensions.
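Stacked embeddings concatenate their component vectors, so the total dimensionality is the sum of the parts. The component sizes below (2048-dim forward and backward language models plus 100-dim GloVe) are illustrative assumptions consistent with the "more than 4096" figure above.

```python
# Stacking = concatenation, so total dims is just the sum of component dims.
def stacked_dims(component_dims):
    """Total dimensionality of a stacked embedding."""
    return sum(component_dims)

# e.g. forward LM (2048) + backward LM (2048) + GloVe (100):
print(stacked_dims([2048, 2048, 100]))  # 4196
```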

