Elasticsearch: Elasticsearch should reject _id longer than the maximum URI length

Created on 16 Jan 2016 · 7 Comments · Source: elastic/elasticsearch

If a user indexes a document with an _id value longer than the maximum allowed length of an HTTP URI (for instance, with the Java API), they will not be able to retrieve the document by id (the get-document API) over HTTP without resorting to something like the ids query.

Elasticsearch should reject ids that are this long, to ensure a document always remains retrievable.
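
As a rough illustration of the kind of check being proposed, here is a minimal Python sketch. It is not Elasticsearch's implementation: the function name `validate_id` and the 512-byte limit are assumptions chosen for the example, since the issue itself does not name a specific limit. The length is measured in UTF-8 bytes rather than characters, because the encoded size is what ends up in the request URI.

```python
# Hypothetical limit for illustration only; the issue does not specify one.
MAX_ID_BYTES = 512


def validate_id(doc_id: str, max_bytes: int = MAX_ID_BYTES) -> None:
    """Raise ValueError if the id is longer than max_bytes when UTF-8 encoded."""
    size = len(doc_id.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(
            f"id is too long: {size} bytes, must be no longer than {max_bytes} bytes"
        )
```

A client could call `validate_id(page_url)` before sending an index request, so that an over-long id fails early instead of producing a document that cannot be fetched by id later.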

:Core/Infra/REST API >enhancement good first issue

dakrone

All 7 comments

The RFC does not set a limit on URL length; however, many clients and browsers _do_ set a limit, so we should limit ourselves as well.

dakrone on 16 Jan 2016

Please cancel this annoying restriction!

doried-a-a on 24 Feb 2017

👍5

> Please cancel this annoying restriction!

It's far more productive to say upfront why this change is restricting you. Perhaps there is a use case that we have not considered. Perhaps you're doing something that would be done more effectively without using excessively long IDs. We want to help you, but we cannot make that assessment from what you've posted.

jasontedor on 24 Feb 2017

Thanks for the reply!

We were using Elasticsearch 1.7 to index crawled web pages, where long IDs were allowed. But now I'm migrating to 5.2.
Crawlers use the page URL as the id. You know, some web pages put a lot of unnecessary details in the URL, like the title of the page, or even some of the content!
I know it's not effective to use the URL as an ID, and it may be better to use a hash of it, for example.
But that needs a lot of modifications.

Thanks for your kindness!

doried-a-a on 25 Feb 2017
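
The hash-of-the-URL idea mentioned in the comment above could look roughly like the following Python sketch. The function names `url_to_id` and `prepare_page` and the field names `url` and `content` are hypothetical, and SHA-256 is just one reasonable choice of hash. The point is that the id becomes a fixed 64 hex characters regardless of URL length, while the original URL stays available as an ordinary field in the document.

```python
import hashlib
from typing import Dict, Tuple


def url_to_id(url: str) -> str:
    """Derive a fixed-length id (64 hex characters) from a URL of any length."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def prepare_page(url: str, content: str) -> Tuple[str, Dict[str, str]]:
    """Return (id, source) for a crawled page.

    The full URL is kept as a field in the source, so it can still be
    searched or displayed even though it is no longer used as the id.
    """
    return url_to_id(url), {"url": url, "content": content}
```

Because the hash is deterministic, recrawling the same URL yields the same id, so the existing document is overwritten rather than duplicated, which matches the behavior of using the URL itself as the id.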

@doried-a-a As you can see from the description above, we didn't make this change just because we felt like it. There is a genuine problem that is being solved.

Yes, it means you will need to make changes when moving to 5.2, but then you will have to reindex your data in order to move to 5.2 from 1.7 anyway. This seems like the ideal time to make the change.