Elasticsearch: Merge mappings API

Created on 13 Nov 2019 · 5 Comments · Source: elastic/elasticsearch

Describe the feature:

The reindex API is a great tool for copying data from multiple indices into a new index, and it is probably used for many different use cases. We use it in ML too, to create a static copy of the source indices of data frame analytics, so that we can perform the analysis assuming no data is changing and without affecting production indices.

However, every single use case like this poses the following problem: what should the mappings of the new index be?

At the moment the reindex API pushes this responsibility to the user. This has an important benefit: it gives the user the flexibility to specify mappings explicitly, so that the new index can have different mappings than the reindexed indices. This is one of reindex's main use cases.

However, for those use cases where a copy is intended, merging mappings across multiple indices is a hard task to push to users who do not know the inner workings of mappings. I would expect that many users out there have written their own way to merge mappings, and each probably has edge cases waiting to cause problems. ML certainly has its own attempt at merging mappings.

I propose that there is benefit in adding an API that attempts to merge the mappings of some target indices. An optimistic API would be good enough: merge the mappings as long as every field is mapped in exactly the same way in all target indices that contain it. Fields that exist in only some of the indices would also be included (as long as there are no conflicts).

The response should return the mappings in a format that can be easily used in a create index request.
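As a rough illustration of the optimistic merge described above, here is a minimal Python sketch. It is not an existing Elasticsearch API. It assumes the input has the shape of a typeless get mapping response (index name, then `mappings`, then `properties`), and it only looks at `properties`, ignoring other mapping-level settings such as `dynamic` or `_meta`. The names `merge_mappings` and `MappingConflict` are made up for this example.

```python
import copy


class MappingConflict(Exception):
    """Raised when the same field is mapped differently in two indices."""


def _merge_properties(merged, incoming, prefix=""):
    # Merge the field definitions in `incoming` into `merged`, in place.
    for name, definition in incoming.items():
        path = prefix + name
        if name not in merged:
            # Field exists only in some indices: include it as-is.
            merged[name] = copy.deepcopy(definition)
            continue
        existing = merged[name]
        if existing == definition:
            continue
        # Object fields may differ only in their sub-fields; recurse into them.
        if "properties" in existing and "properties" in definition:
            rest_a = {k: v for k, v in existing.items() if k != "properties"}
            rest_b = {k: v for k, v in definition.items() if k != "properties"}
            if rest_a == rest_b:
                _merge_properties(
                    existing["properties"], definition["properties"], path + "."
                )
                continue
        raise MappingConflict(f"field [{path}] is mapped differently across indices")


def merge_mappings(get_mapping_response):
    # Returns a body shaped so it can be used in a create index request.
    merged = {}
    for index_name in sorted(get_mapping_response):
        entry = get_mapping_response[index_name]
        _merge_properties(merged, entry.get("mappings", {}).get("properties", {}))
    return {"mappings": {"properties": merged}}


# Hypothetical example input: two indices that agree on the shared field "ts".
response = {
    "logs-1": {"mappings": {"properties": {"ts": {"type": "date"}, "msg": {"type": "text"}}}},
    "logs-2": {"mappings": {"properties": {"ts": {"type": "date"}, "bytes": {"type": "long"}}}},
}
body = merge_mappings(response)
```

Here `body` contains `ts`, `msg` and `bytes` under `mappings.properties`. If `logs-2` mapped `ts` as `keyword` instead, `merge_mappings` would raise `MappingConflict` rather than pick one of the two definitions.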

:Search/Mapping >feature Search

All 5 comments

Pinging @elastic/es-search (:Search/Mapping)

Pinging @elastic/es-distributed (:Distributed/Reindex)

I wonder if this deserves a separate API. If you just want the merged mappings of all source indices, can't you get the mappings for each source index and do a put mapping call on the target index for each of those source mappings? We already merge incoming mappings with existing mappings when handling a put mapping call, and if there is a conflict, you will get an exception message saying so. I don't think this is a hard task, and you don't need to know the inner workings of mappings for this.
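The sequence of calls suggested in this comment could look roughly like the sketch below. It assumes a hypothetical `send(method, path, body)` helper that performs the HTTP request against the cluster, returns the parsed JSON response, and raises on an error response. The helper and the function name `copy_mappings_via_put` are made up for this example; only the REST paths (`PUT /<index>`, `GET /<index>/_mapping`, `PUT /<index>/_mapping`) are real endpoints.

```python
def copy_mappings_via_put(send, source_indices, target_index):
    # `send` is a hypothetical transport callable: send(method, path, body).
    # The target index has to exist before its mappings can be updated.
    send("PUT", f"/{target_index}", None)
    for source in source_indices:
        # The get mapping response is keyed by concrete index name.
        response = send("GET", f"/{source}/_mapping", None)
        for index_name in sorted(response):
            # Elasticsearch merges the incoming mappings into the existing ones;
            # a conflict is assumed to surface as an exception raised by `send`.
            send("PUT", f"/{target_index}/_mapping", response[index_name]["mappings"])
```

Each source index costs one get mapping call and at least one put mapping call, on top of the initial create index call.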

I hadn't thought of this way of doing it. One of the disadvantages, though, is that you have to create the index first, which means that in case of a conflict you're left with an index to clean up. It is also potentially a lot of calls for an index pattern that matches many indices.

Do we know why the reindex API does not use this approach to auto-create the destination index?

We discussed this in today's distributed sync. We think that such an API (to determine the best mapping of a target index that is the copy of one or more source indices) is a generally useful thing, not only for the reindex API that ES offers, but for any kind of reindex flow (or possibly a future multi-index shrink API). We can see reindex making use of this functionality / API, but don't think this is something that should be limited to that API. I'm therefore removing the reindex label, and letting the search team decide how to move forward on this one.
