Elasticsearch: Custom analyzer filter scope

Created on 14 Sep 2016 · 30Comments · Source: elastic/elasticsearch

Elasticsearch version: 2.4.0

Plugins installed: [kopf]

JVM version: 1.8.0_60

OS version: OSX 10.11.6 locally, reproducible on CentOS 7.2.1511 as well

Description of the problem including expected versus actual behavior:

After upgrading to 2.4.0 our templates are no longer functioning from 2.3.3.

We have our template split up into several templates so we can specify default analyzers per language, and we share analyzers/filters/tokenizers between the templates.

Steps to reproduce:

Push a shared template with common analyzers/filters

POST /_template/analyzers
{
  "template": "*",
  "order": 20,
  "settings": {
    "analysis": {
      "analyzer": {
        "untouched_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "max_length", "lowercase" ]
        },
        "no_stopwords_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "max_length", "standard", "lowercase" ]
        }
      },
      "tokenizer": {
        "standard": {
          "type": "standard",
          "version": "4.4"
        }
      },
      "filter": {
        "max_length": {
          "type": "length",
          "max": "32766"
        }
      }
    }
  }
}

Push a language specific analyzer template, which is a czech analyzer referencing a filter from the analyzers template.

POST /_template/analyzer-cs
{
  "template": "*-cs",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }
    }
  }
}

Push another language specific analyzer template which uses a custom analyzer, referencing some filters from the shared analyzers template.

POST /_template/analyzer-en
{
  "template": "*-en",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "max_length", "standard", "lowercase", "stop_en" ]
        }
      },
      "filter": {
        "stop_en": {
          "type": "stop",
          "stopwords": "_english_",
          "ignore_case": "true"
        }
      }
    }
  }
}

This action fails, resulting in:

2016-09-13 09:52:00,174][DEBUG][action.admin.indices.template.put] [Zartra] failed to put template [analyzers-en]
[kOccXLKbR5CwAsXCt0v_3w] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [default] failed to find filter under name [max_length]];

However, as you see the first analyzer-cs was able to reference the max_length filter without issue.

These templates worked fine in 2.3.3 and 2.3.4, haven't tried 2.3.5 but it sounds like there weren't any changes to that release. Reading over release notes for 2.4.0 don't seem to indicate anything that would prevent this functionality from being supported.

We were seeing similar issues with referencing the shared analyzers in our mapping template file.

Provide logs (if relevant):

    2016-09-13 09:52:00,174][DEBUG][action.admin.indices.template.put] [Zartra] failed to put template [analyzers-en]
    [kOccXLKbR5CwAsXCt0v_3w] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [default] failed to find filter under name [max_length]];
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.validateAndAddTemplate(MetaDataIndexTemplateService.java:196)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.access$200(MetaDataIndexTemplateService.java:57)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService$2.execute(MetaDataIndexTemplateService.java:157)
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
        at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.IllegalArgumentException: Custom Analyzer [default] failed to find filter under name [max_length]
        at org.elasticsearch.index.analysis.CustomAnalyzerProvider.build(CustomAnalyzerProvider.java:76)
        at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:216)
        at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:70)
        at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:879)
        at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
        at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
        at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:154)
        at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
        ... 11 more

:CorFeatureIndices APIs >bug

Source

mbarker

Most helpful comment

As I explained in #20919, this is a design error: templates should be validated only as much as any _incomplete_ object can be validated. Elasticsearch shoud reject a template only if it contains plain inconsistencies or grammar violations.

On top of that, this is a breaking change which you failed to document!

I have a patch that fixes the too hasty conclusion that a referred analyzer should have been declared in the same template. I consider it elegant enough for production, but I didn't bother with updating the tests.
If anyone is interested, I'm willing to publish it.

acarstoiu on 14 Oct 2016

👍2

All 30 comments

@johtani Could you take a look at this please?

clintongormley on 14 Sep 2016

Hi @mbarker

Before 2.4, we didn't validate any template on index template creation.
We added validation on index template creation in this PR.
In this PR, we validate each template as a complete mapping and settings,
we don't check combination because we cannot check all combination when template create.
Now, you should add the max_length filter setting each template for avoiding error.

@clintongormley Should we support to validate any template with only * template?
I think we can not predict templates combination except * template.

johtani on 16 Sep 2016

@johtani It seems to me like validating a template should be scoped to the context it should have available during indexing, unless I'm misunderstanding how ES supports multiple templates.

We certainly could replicate all of the analyzers/filters to all the templates where they are used, I was trying to avoid duplicating parts of the template.

If it would help I can give more details about how we are trying to compose the templates and the rationale behind it.

mbarker on 16 Sep 2016

Thanks. I will think what the validation is more useful for users.

Validating template on index template creation is helpful for many users because users know the error earlier than validating creating index, especially logging use-case.
However in your situation, this validation is not useful because template is bigger than before 2.4.

e.g. If there is skip validation flag and simulate template with index name without creating index, is it useful?

johtani on 16 Sep 2016

There is something else going on here. I (like @mbarker ) didn't understand why this template succeeded:

POST /_template/analyzer-cs
{
  "template": "*-cs",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }
    }
  }
}

while this template didn't:

POST /_template/analyzer-en
{
  "template": "*-en",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "max_length"
          ]
        }
      }
    }
  }
}

The reason is that the first analyzer definition is invalid, but is not validated:

      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }

You're using the czech analyzer and passing in a parameter filter, which the czech analyzer doesn't recognise but silently ignores. We should add validation to analyzers to complain about any left over parameters.

clintongormley on 19 Sep 2016

If there is skip validation flag and simulate template with index name without creating index, is it useful?

I like the idea of a skip evaluation flag. Not sure the simulate bit is needed - easy to test by just creating an index.

clintongormley on 19 Sep 2016

Or elasticsearch can create template always and if a template has some errors elasticsearch only return warning message. what do you think? @clintongormley

johtani on 21 Sep 2016

I'm wondering how hard it would be to figure the parent templates of a given template (because the wildcard is more general) and apply them as well at validation time.

jpountz on 21 Sep 2016

❤2

You could always make up name for the index to validate based on the template matching, ie replace * with test and resolve dependencies that way.

The only caveat would be users have to upload templates in dependency order.

mbarker on 21 Sep 2016

@jpountz @mbarker Thanks for your comment. Ah, you are right. I try to fix it.

johtani on 21 Sep 2016

If templates are validated using their dependencies when creating or updating, then what will happen when a template's dependency is deleted or updated? For example template A depends on B:

template B created
template A created
template B deleted
?

Heres another thought: what if a template is invalid by itself, but the index is created with explicit settings that the template depends on?

Another possible issue: What if a template overrides something in a another template that results in a final mapping that is invalid?

qwerty4030 on 21 Sep 2016

@qwerty4030 Thanks for your comment!

In your 1st case, I think it is OK that elasticsearch does not validate at template deletion time.
We get the error only when user creates index using template B.

In your 3rd case, we can validate overriding only the parent templates.

I' not sure your 2nd case. Could you explain any examples?

johtani on 22 Sep 2016

Just tried this in 2.3.2:

create template that is invalid by itself (references custom_analyzer that is not defined).
attempt to create index that matches this template (fails with mapper_parsing_exception as expected).
attempt to create index that matches this template and specifies the custom_analyzer in the settings (this works as expected).

To reproduce:

PUT /_template/test-template
{
  "template": "data-test-*",
  "order": 0,
  "mappings": {
    "test-type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

PUT data-test-1

PUT data-test-1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

qwerty4030 on 22 Sep 2016

Thanks. I think Elasticsearch should return the error in your cases, because that is tricky settings.

johtani on 2 Oct 2016

On top of that, this is a breaking change which you failed to document!

acarstoiu on 14 Oct 2016

👍2

The code is in this branch. GitHub doesn't let me to submit a pull request against a tag, neither from a particular commit. :-1:

acarstoiu on 14 Oct 2016

I'm wondering how hard it would be to figure the parent templates of a given template (because the wildcard is more general) and apply them as well at validation time.

It has occurred to me that we can never do this absolutely correctly. eg a template for logs-* may coincide with a template for *-2016-* or it may not. We can't tell without an index name.

I'm wondering if, instead of trying to be clever, we should just allow a flag to disable template validation and template PUT time.

The code is in this branch. GitHub doesn't let me to submit a pull request against a tag, neither from a particular commit. 👎

@acarstoiu there's a good reason for that... we're not planning on making a release from a patch applied to a tag.

clintongormley on 17 Oct 2016

@clintongormley let's try to focus on the matter, as I'm not interested in your SCM rules and I'm sure you can cherry-pick the fixing commit anytime you want. I merely justified the lack of a proper pull request.

I do not expect the commit to be accepted by itself, simply because it lacks the tests counterpart. I also did my best to preserve the coding style and the existing behaviour, but might not fit your taste. Other than that, it is a _good starting point_, we use it in production.

On the other hand, what you should do _urgently_ is to document this truely breaking change. Thank you.

acarstoiu on 17 Oct 2016

👍1

@acarstoiu you don't seem to get how the open source world works. Frankly, your attitude makes me disinclined to converse with you further.

clintongormley on 17 Oct 2016

👎3 ❤1

@clintongormley you do that, I agree.
As for the open source world, I already did two things: served this project a patch and urged repeatedly its team to publish this breaking change here for the benefit of _other users_.

And by the way, you can find in there lots of deprecations under the "breaking change" label, but when a true one is discovered, you keep it hidden.

acarstoiu on 20 Oct 2016

@clintongormley I for one agree with @acarstoiu and frankly, your dismissal of this _extremely inconvenient non-documented breaking change that has completely affected our cluster and must now be downgraded across the board_ is very concerning in the least. You are literally costing us time and money.

you don't seem to get how the open source world works.

You don't seem to understand how businesses work and seem to have very little regard as to the users of your product when you inexplicably hide information and then dismiss users that hit the issue because you don't like their attitude when they're justifiably irritated?

Isn't one of the tenets of open source software transparency? If so, you've missed the mark on this one.

mboudreau on 2 Nov 2016

@clintongormley as mentioned in https://github.com/elastic/elasticsearch/issues/21105#issuecomment-263380122 I think this should be noted in the 2.4 migration docs as a breaking change. Took us a while to track down this issue while attempting to migrate our 2.3.x cluster.

marshall007 on 28 Nov 2016

@marshall007 Yes I agree - I've just been rather put off working on this issue by the comments of others. If you'd like to submit a PR adding the breaking changes docs, I'd be happy to merge it.

clintongormley on 29 Nov 2016

@clintongormley sure thing! I have it up in PR #21869.

marshall007 on 29 Nov 2016

@clintongormley sure thing! I have it up in PR #21869.

thanks @marshall007 - merged

clintongormley on 30 Nov 2016

🎉1

@clintongormley @marshall007 is that PR supposed to update this page?
I'm getting a 404 for that URL. However 2.3 breaking changes URL works fine: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/breaking-changes-2.3.html

Thanks!

qwerty4030 on 5 Dec 2016

Thanks @qwerty4030 - that page wasn't being included in the docs. Should be fixed now: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/breaking-changes-2.4.html

clintongormley on 7 Dec 2016

@clintongormley that page still appears to be out of date, FYI. Looks like it hasn't been updated since https://github.com/elastic/elasticsearch/commit/537f4e1932b4d516cec8dcab1c942d3f31177dac.

marshall007 on 7 Dec 2016

@marshall007 it's here: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/_index_templates.html

clintongormley on 12 Dec 2016

Closing in favour of #21105

clintongormley on 12 Jan 2017

Was this page helpful?

0 / 5 - 0 ratings