I ran into an issue, which I cannot reproduce, where a set of ingest nodes complained that they could not find an ingest pipeline during an ordinary _bulk request that used it. This followed a single-node cluster being upgraded from 6.1.1 to 6.1.2 by adding a new node and then decommissioning the old node. While debugging the issue, two other nodes were added, both of which were also ingest nodes.
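For reference, the failing requests were ordinary bulk requests that named the pipeline, roughly like the following (presumably via the pipeline query parameter; the index name and document are made up for illustration):
POST /_bulk?pipeline=xpack_monitoring_6
{ "index": { "_index": "some-index", "_type": "doc" } }
{ "some_field": "some value" }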
I was able to see the ingest pipeline by requesting it directly:
GET /_ingest/pipeline/xpack_monitoring_6
as well as via the cluster state:
GET /_cluster/state/metadata?filter_path=metadata.ingest.pipeline
In both cases, it returned the expected ingest pipeline:
{
  "xpack_monitoring_6": {
    "description": "This is a placeholder pipeline for Monitoring API version 6 so that future versions may fix breaking changes.",
    "version": 6010099,
    "processors": []
  }
}
Fixing the issue was simple, but it's unclear what was actually wrong: simply delete, then recreate the exact same ingest pipeline under the same name (it's possible that the DELETE step is unnecessary):
DELETE /_ingest/pipeline/xpack_monitoring_6
PUT /_ingest/pipeline/xpack_monitoring_6
{
  "description": "This is a placeholder pipeline for Monitoring API version 6 so that future versions may fix breaking changes.",
  "version": 6010099,
  "processors": []
}
From that point onward it began to work.
/cc @AlexP-Elastic
Small update:
This happened again on the same cluster, which means that it had nothing to do with the upgrade. Also, the DELETE is not required; the PUT of the ingest pipeline to replace it is enough.
It _appears_ to be related to updating an unrelated ingest pipeline, but we have not yet figured out how to retrigger it by doing it again (the person who did it was not trying to make it happen).
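To be concrete, "updating an unrelated ingest pipeline" means a routine PUT along these lines; the pipeline name and processor below are hypothetical, not the actual pipeline that was updated:
PUT /_ingest/pipeline/some_unrelated_pipeline
{
  "description": "A hypothetical, unrelated pipeline",
  "processors": [
    {
      "set": {
        "field": "some_field",
        "value": "some value"
      }
    }
  ]
}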
@pickypg was ingest enabled on all the nodes?
All 3 nodes were ingest-enabled. In addition, 2 of the nodes were data+master eligible and 1 of the nodes was master-only (this is the standard 2HA config for Cloud clusters).
strange. let me know if there is a way to recreate and I will see what is happening!
this just happened on another Elastic Cloud cluster. I believe https://github.com/elastic/elasticsearch/pull/28588 should fix this problem.
Never mind. The issue is that scripting is disabled and there are pipelines in the cluster state that depend on scripts. The start-up exception is being muted. I feel like the node should not be able to start up in this case?
@talevy what scripting was disabled? We may have incorrect defaults set up in the EC(E) UI that are causing this
@AlexP-Elastic Monitoring's ingest pipeline leverages inline painless scripts
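For example, a pipeline containing an inline painless script processor, along the lines of the hypothetical one below, will fail to parse on a node where inline scripts are disallowed (e.g. via script.allowed_types in elasticsearch.yml); this is illustrative, not the actual Monitoring pipeline:
PUT /_ingest/pipeline/example_script_pipeline
{
  "description": "Hypothetical pipeline with an inline painless script",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.some_new_field = 42"
      }
    }
  ]
}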
We confirmed that:
> The start-up exception is being muted. I feel like the node should not be able to start up in this case?
@talevy Can you share the exception that is being swallowed here?
@jasontedor I will take a look at this today and share the update
@jasontedor update: the exception is not swallowed. I just thought that this was being applied when the NodeService is created, but it is only registered as a cluster-state applier and applies things after startup.
Here is the stack trace that should be seen on the cloud instances:
I don't see a way to prevent the node from starting up, since this is occurring too late in the game. One thing that can be done is to fail only on the offending pipelines and still support the others.
[2018-02-20T09:01:34,170][INFO ][o.e.n.Node ] [node_t1] started
[2018-02-20T09:01:34,175][WARN ][o.e.c.s.ClusterApplierService] [node_t1] failed to notify ClusterStateApplier
org.elasticsearch.ElasticsearchParseException: Error updating pipeline with id [YLFyNXsg]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:90) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.applyClusterState(PipelineStore.java:69) ~[main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498) [main/:?]
at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161) [main/:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [main/:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: cannot execute [inline] scripts
at org.elasticsearch.ExceptionsHelper.convertToElastic(ExceptionsHelper.java:61) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException(ConfigurationUtils.java:293) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:400) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:361) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:306) ~[main/:?]
at org.elasticsearch.ingest.Pipeline$Factory.create(Pipeline.java:122) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:86) ~[main/:?]
... 13 more
Caused by: java.lang.IllegalArgumentException: cannot execute [inline] scripts
at org.elasticsearch.script.ScriptService.compile(ScriptService.java:297) ~[main/:?]
at org.elasticsearch.ingest.common.ScriptProcessor$Factory.create(ScriptProcessor.java:109) ~[main/:?]
at org.elasticsearch.ingest.common.ScriptProcessor$Factory.create(ScriptProcessor.java:90) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:389) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:361) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:306) ~[main/:?]
at org.elasticsearch.ingest.Pipeline$Factory.create(Pipeline.java:122) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:86) ~[main/:?]
... 13 more
[2018-02-20T09:01:34,185][INFO ][o.e.g.GatewayService ] [node_t1] recovered [0] indices into cluster_state
I've opened #28752 to resolve this. Feel free to comment. I understand that other forms of exceptions may be preferred, and I would be happy to hear others' thoughts.