Elasticsearch: Rethink template inheritance to allow for better validation

Created on 25 Oct 2016 · 23Comments · Source: elastic/elasticsearch

Validation of index templates has proven difficult to get right due to how multiple templates that might be invalid on their own can result in a valid template in the end, for instance if two templates match a given index name and one of them is providing analysis index settings while the other one is providing mappings that make use of these analyzers. (#20479)

The fact that we cannot validate index templates is not good, so I think we need to think about how we could validate them all the time without false positives. We were discussing this issue yesterday and the following options have been suggested:

Require that templates are valid on their own. In that case we might also want to remove the ability for several templates to match an index, as several templates could be merged into an invalid one (#20988).
Make the inheritance of templates explicit: instead of merging templates together based on order, a template could refer to other templates that should be included in order to not have to repeat the same bits over multiple templates. And again, we would probably only allow one template to be applied per index.

:CorFeatureIndices APIs >enhancement CorFeatures

Source

jpountz

👍4

Most helpful comment

We've started to migrate away from nested templates, but we're finding it really hard to get away from a default level 0 template that contains our dynamic mapping settings. If you guys do remove template inheritance, would there be any way to add an option to templates to import one other template that contains common dynamic mapping config? It would not be too hard for us to be explicit about the fields we expect in a template.... but we really don't want to have to repeat the dynamic mapping rules in 100 different places if we can avoid it.

neuroticnetworks on 1 Aug 2017

👍3

All 23 comments

Make the inheritance of templates explicit

Definitely a good idea IMO. In such a case, you know exactly what you are doing as a end user.

dadoonet on 25 Oct 2016

👍2

I wonder if we really need to make it all that sophisticated. It might be enough for 99% of the usecases to just have a single _root_ template that you can inherit from and that's it. This means that the root is applied as part of each individual template independent of what other templates match for a specific index. When users update a specific template we can validate all effective template associated with each other. I also think we might be able to make this happen for 5.1?

s1monw on 25 Oct 2016

I just discussed it with Simon since I was not clear how this would help. Since the perceived benefit of having multiple matching templates is to save some typing when putting them, the _root_ template would be applied when the template is put (rather than at index creation time) so that the template could be validated as it would be expected to be complete.

jpountz on 25 Oct 2016

What if the _root_ template also has it's own _root_?! The inheritance chain must be _free_ from any constraints, except for non-circularity, of course. ;)

A second question is how on Earth are you going to enforce the single association to a brand new index that just happens to be created by a document insertion request? You have to resort to limitations on the declared template patterns like "only non-overlapping prefixes" and such. I personally don't like artificial limits.
Another idea is to still keep the order field in templates, only that it will indicate the priority for application on a new index. The highest priority of the matching templates (matched through _free_ patterns) points to the winning template - this would be in line with what happens right now (the last merged template has the highest order).

acarstoiu on 28 Oct 2016

We just ran into this trying to upgrade our cluster to 2.4. We essentially have one "root" template containing all our custom filters and anaylzers that our other index templates inherit from. Would definitely be in favor of explicitly declaring template dependencies (from @jpountz initial comment). I think a single _root_ would be too restricting for our use case. We tend to have a single "root" template per "index type".

Probably worth mentioning this as a breaking change in the 2.4 migration docs, btw.

marshall007 on 28 Nov 2016

I think explicit template dependencies could work. Requiring templates to be valid on their own would be a little difficult since we have a somewhat complex index/template strategy. We recently upgrade from ES 0.90 to 2.3.x and refactored our massive single template into many much smaller templates where we rely on inheritance to keep them concise and re-use shared parts. With 2.3.x we had to partition our indices by language, so we ended up with templates with specific responsibilities.

Settings
We have cluster specific defaults (# shards/replicas, allocation settings, etc), this matches *. Then we have overrides for some subsets of index languages, matching *-en for example, changes the # shards for example.

Analyzers/Filters
We have a set of custom analyzers/filters that are common across all of our mappings (again matching *), and then we override the default analyzer for each language index (again matching *-en for example).

Mappings
We broke down our mappings into multiple templates for the top level parts of our documents, so they are all matching * and they just define the document mapping and reference analyzers from the previous section.

Here's a simplified example of the templates: https://gist.github.com/mbarker/40bd82de268cd5d73d3788621c4b80df

mbarker on 4 Jan 2017

(subscribing)

synhershko on 12 Jan 2017

@s1monw I'm not sure that a single root dependency would work either. Like @mbarker we have a relatively complex index/template strategy that works pretty well for us and we'd like to keep it. In particular we tend to define global settings like compression and whether or not _all is enabled at a default template with order 0. Our shared mappings (which correspond to what I think you have in mind) are usually defined at order 10, and this is where we put all the fields in common, default number of shards, etc for indices of a certain type. There are usually 3-4 of these shared mappings indices on any given cluster. Then, we have one or two additional templates at order 20 (and sometimes 30 as well) that specify the individual mappings and analyzers unique to a certain index. There are usually more like a dozen of these on any given cluster.

Inheritance is a certainly a feature we like and use regularly (albeit we've definitely been hit with a few gotchas, eg https://github.com/elastic/elasticsearch/issues/20479) but I also agree that the explicit dependencies would be very easy to work with (and arguably a lot easier to write tests against too, which we also do). @acarstoiu brings up a good point about nested inheritance. For what it's worth, it would not be difficult at all for us to work _without_ nested inheritance.

neuroticnetworks on 8 Feb 2017

We had a long discussion about this issue yesterday. We want to be able to validate index templates at PUT time in all cases, yet template inheritance makes it hard. For instance if a root template is modified, all child templates would need to be validated again. Furthermore, clusters are expected to have a reasonable total number of templates, which means having some duplicate sections in multiple templates is not that much of an issue. In conclusion we decided to remove support for merging templates entirely in 6.0 and use the order parameter in order to know _which_ template to use rather than how to merge multiple templates.

jpountz on 12 Mar 2017

👎2

If you:

enforce a _single inheritance_ model among templates (with some new property)
choose a single template to apply when a new index gets created (by following a suggestion I made 4.5 months ago 👍)

what is the problem in validating a (sub)tree of templates when a parent template is changed?

You're sparring the machine from doing more computations at the expense of more manual work necessary to multiply _and_ keep in sync the common parts - which is itself error-prone.
Is this your understanding of software evolution, folks?! Are you _all_ really sure that in the newest Elasticseach version you want to remove a useful functionality (as proven by many comments here and in related issues) in order to give _more work to humans_?
@kimchy Help, please! :)

acarstoiu on 13 Mar 2017

👍2

@jpountz would explicit inheritance simplify the validation at all?

If not, I think it will be important to write a tool that generates explicit templates outside of ES (for example, in Bash/Python/Go/whatever) by using a hierarchy/import statements/something. I'd be happy to help out with said tool since, without inheritance, it's something a lot of us are going to need, myself included.

One clarification if you don't mind: "clusters are expected to have a reasonable total number of templates, which means having some duplicate sections in multiple templates is not that much of an issue." Do you have an upper bound on what you'd consider reasonable? Is this expectation documented?

My experience and yours seem quite different. I do expect this to be an issue for quite a few people. I actually do agree that there are good reasons to make the change you have in mind, and ways to work around it (additional tooling, mentioned above). Among the reasons this change makes sense, though, I'm personally not convinced "It shouldn't be much of an issue" is one of them.

neuroticnetworks on 13 Mar 2017

@jpountz is just a messenger here, so there's no reason to shoot him. Yet, at least 😆

acarstoiu on 14 Mar 2017

Another possibility that has been put forward for discussion is to specify the template name in an indexing request.

/cc @uboness

clintongormley on 24 May 2017

👎2 👍1

That's a no go - it would add an extra thing to worry about in each and every indexing query.

acarstoiu on 27 May 2017

Yeah, that's not ideal. However, if that were optional that would be pretty interesting.

Battleroid on 27 May 2017

Templates could also be validated/converted at upgrade time if we don't have to worry about merging. A migration plugin could also provide early warning about invalid templates ahead of time before an upgrade.

geekpete on 14 Jul 2017

I don't see this in the Breaking Changes for 6.0 yet.
Is removing Index Template merging still the plan?

My company is still working on upgrading from 1.7 to 2.4.
My team in particular makes extensive use of template composition to support combinations of index types, 30+ languages, and field sets from 200+ vendors. So https://github.com/elastic/elasticsearch/issues/20479 is a pretty big show-stopper for us.

It seems like the simplest approach for us is to copy ES's merge logic into a custom tool (already done) and generate fully-qualified templates.
This works, but it seems sub-optimal from a product support standpoint.
Until someone open-sources such a tool (I don't yet know if I will be allowed to), every consumer needs to write their own version, which seems about as error-prone as copy-pasting analyzers across hundreds of index templates.

If Index Templates really aren't going to be modular go forward, would it be possible to release an official tool that encapsulates the merge logic? Or perhaps a helper endpoint that we can send a list of templates and get back the combined version? I don't have any concrete API suggestions, but I'd be happy to discuss if there's interest.

PhilMarsh on 31 Jul 2017

Template inheritance allows us to add aliasses to for example the default Beats index templates, like this:

PUT _template/filebeat-linux 
{
  "order": 1,
  "template": "filebeat-linux-*",
  "aliases" : {
    "a-filebeat-linux" : {}
  }
}

Please don't deprecate template inheritance completely or else we will have to manually update each beats template after each upgrade.

willemdh on 1 Aug 2017

👍3

neuroticnetworks on 1 Aug 2017

👍3

For those who want to migrate to _5.5.1_, I have ported the patch I made for _2.4.1_ (see this comment and the next one) in order to make the template validation lenient with respect to analyzers.

The port is here.

acarstoiu on 7 Aug 2017

It seems unreasonable to take away convenience and DRY in the name of dubiously-useful index template validation. The end result is that I will have to code the template merging myself.