Elasticsearch: Elasticsearch should reject _id longer than the maximum URI length

Created on 16 Jan 2016 · 7 Comments · Source: elastic/elasticsearch

If a user indexes a document with an _id value longer than the maximum allowed length of an HTTP URI (for instance, with the Java API), they will not be able to retrieve the document by id (the get-document API) over HTTP without resorting to something like the ids query.

Elasticsearch should reject ids that are this long, to ensure a document always remains retrievable.

Labels: :Core/Infra/REST API, >enhancement, good first issue

All 7 comments

The RFC does not set a limit on URL length. However, many clients and browsers _do_ set a limit, so we should limit ourselves as well.

Please cancel this annoying restriction!

> Please cancel this annoying restriction!

It's far more productive to say upfront why this change is restricting you. Perhaps there is a use case that we have not considered. Perhaps you're doing something that would be done more effectively without using excessively long IDs. We want to help you, but we cannot make that assessment from what you've posted.

Thanks for the reply!

We were using Elasticsearch 1.7 to index crawled web pages, where long IDs were allowed. But now I'm migrating to 5.2.
Crawlers use the page URL as the id. You know, some web pages put a lot of unnecessary details in the URL, like the title of the page, or even some of the content!
I know it's not effective to use url as an ID, and maybe better to use a hash of it for example.
But it needs a lot of modifications.

Thanks for your kindness!

@doried-a-a As you can see from the description above, we didn't make this change just because we felt like it. There is a genuine problem that is being solved.

Yes, it means you will need to make changes when moving to 5.2, but then you will have to reindex your data in order to move to 5.2 from 1.7 anyway. This seems like the ideal time to make the change.

> I know it's not effective to use url as an ID, and maybe better to use a hash of it for example.

Yes. If needed, you can store the URI as a field in the document.
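As a rough illustration of that suggestion, the sketch below derives a fixed-length id from the page URL and keeps the original URL as an ordinary field. It is only one possible approach: SHA-256 is an arbitrary choice of hash, the function names and the `url` and `content` field names are hypothetical, and the 512-byte figure is an assumption about the limit enforced by 5.x that should be checked against the version in use.

```python
import hashlib
from typing import Dict, Tuple

# Assumption: the _id limit in the target Elasticsearch version is 512 bytes.
MAX_ID_BYTES = 512


def url_to_doc_id(url: str) -> str:
    """Return a 64-character hex digest of the URL, usable as a document _id."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def build_document(url: str, content: str) -> Tuple[str, Dict[str, str]]:
    """Pair a short, deterministic id with a document that still carries the URL.

    The field names "url" and "content" are hypothetical.
    """
    doc_id = url_to_doc_id(url)
    # The digest length is fixed, so this holds however long the URL is.
    if len(doc_id.encode("utf-8")) > MAX_ID_BYTES:
        raise ValueError("id is too long")
    return doc_id, {"url": url, "content": content}


if __name__ == "__main__":
    long_url = "http://example.com/" + "some-page-title-" * 200
    doc_id, doc = build_document(long_url, "page body")
    print(len(long_url), len(doc_id))
```

Because the id is a pure function of the URL, a crawler that revisits the same page produces the same id and overwrites the earlier document, which matches the behavior of using the URL itself as the id.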

I'll reindex anyway, right. Now I'm going to make changes in the crawlers too.
No problem.
Thanks for the support!
