If a user indexes a document with an _id value longer than the maximum allowed length of an HTTP URI (for instance, via the Java API), they will not be able to retrieve the document by ID (the get-document API) over HTTP without resorting to something like the ids query.
Elasticsearch should reject IDs that are this long, to ensure a document always remains retrievable.
The RFC does not set a limit on URL length; however, many clients and browsers _do_ set a limit, so we should limit ourselves as well.
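As a rough illustration of the proposed check, here is a minimal client-side sketch in Python. The 512-byte cap and the name `id_is_retrievable` are assumptions made for this example; the description above does not state a specific limit, so check the limit your Elasticsearch version actually enforces.

```python
# Assumed cap for illustration only; not taken from the issue description.
MAX_ID_BYTES = 512


def id_is_retrievable(doc_id: str, max_bytes: int = MAX_ID_BYTES) -> bool:
    """Return True if the UTF-8 encoding of doc_id fits within max_bytes.

    The length is measured in bytes, not characters, because non-ASCII
    characters take more than one byte once encoded.
    """
    return len(doc_id.encode("utf-8")) <= max_bytes
```

A crawler could call this before indexing and fall back to a shorter ID when it returns False.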
Please cancel this annoying restriction!
It's far more productive to say upfront why this change is restricting you. Perhaps there is a use case that we have not considered. Perhaps you're doing something that would be done more effectively without using excessively long IDs. We want to help you, but we cannot make that assessment from what you've posted.
Thanks for reply!
We were using Elasticsearch 1.7 to index crawled web pages, where long IDs were allowed. But now I'm migrating to 5.2.
Crawlers use the page URL as the ID. You know, some web pages put a lot of unnecessary details in the URL, like the title of the page, or even some of the content!
I know it's not effective to use url as an ID, and maybe better to use a hash of it for example.
But that needs a lot of modifications.
Thanks for your kindness!
@doried-a-a As you can see from the description above, we didn't make this change just because we felt like it. There is a genuine problem that is being solved.
Yes, it means you will need to make changes when moving to 5.2, but you will have to reindex your data anyway in order to move from 1.7 to 5.2. This seems like the ideal time to make the change.
> I know it's not effective to use url as an ID, and maybe better to use a hash of it for example.
Yes. If needed, you can store the URI as a field in the document.
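A minimal sketch of that approach in Python, assuming a SHA-256 hex digest as the ID. The function name `url_to_doc` and the field names `url` and `content` are hypothetical, chosen only for this example; any stable hash and any field layout would work the same way.

```python
import hashlib


def url_to_doc(url: str, body: str):
    """Build a fixed-length document ID from a URL, keeping the URL as a field.

    Returns a (doc_id, document) pair. The ID is always 64 hex characters,
    however long the URL is, and the same URL always maps to the same ID.
    """
    doc_id = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Hypothetical field names; the original URL stays searchable as a field.
    document = {"url": url, "content": body}
    return doc_id, document
```

Because the hash is deterministic, re-crawling a page overwrites the existing document instead of creating a duplicate, and the full URL is still available from the stored field.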
I'll reindex anyway, right. Now I'm going to make changes in the crawlers too.
No problem.
Thanks for support!