I ran into an issue, which I cannot reproduce, where a set of ingest nodes complained that they could not find an ingest pipeline during an ordinary _bulk request that used it. This followed a single-node cluster being upgraded from 6.1.1 to 6.1.2 by adding a new node and then decommissioning the old node. While debugging the issue, two other nodes were added, both of which were also ingest nodes.
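For reference, the failing requests were ordinary bulk requests that named the pipeline, roughly like the following (presumably via the pipeline query parameter; the index name and document are made up for illustration):
POST /_bulk?pipeline=xpack_monitoring_6
{ "index": { "_index": "some-index", "_type": "doc" } }
{ "some_field": "some value" }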
I was able to see the ingest pipeline by requesting it directly:
GET /_ingest/pipeline/xpack_monitoring_6
as well as via the cluster state:
GET /_cluster/state/metadata?filter_path=metadata.ingest.pipeline
In both cases, it returned the expected ingest pipeline:
{
  "xpack_monitoring_6": {
    "description": "This is a placeholder pipeline for Monitoring API version 6 so that future versions may fix breaking changes.",
    "version": 6010099,
    "processors": []
  }
}
Fixing the issue was simple, but it's unclear what was actually wrong: simply delete, then recreate the exact same ingest pipeline under the same name (it's possible that the DELETE step is unnecessary):
DELETE /_ingest/pipeline/xpack_monitoring_6
PUT /_ingest/pipeline/xpack_monitoring_6
{
  "description": "This is a placeholder pipeline for Monitoring API version 6 so that future versions may fix breaking changes.",
  "version": 6010099,
  "processors": []
}
From that point onward it began to work.
/cc @AlexP-Elastic
Small update:
This happened again on the same cluster, which means that it had nothing to do with the upgrade. Also, the DELETE is not required; the PUT of the ingest pipeline to replace it is enough.
It _appears_ to be related to updating an unrelated ingest pipeline, but we have not yet figured out how to retrigger it by doing it again (the person who did it was not trying to make it happen).
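To be concrete, "updating an unrelated ingest pipeline" means a routine PUT along these lines; the pipeline name and processor below are hypothetical, not the actual pipeline that was updated:
PUT /_ingest/pipeline/some_unrelated_pipeline
{
  "description": "A hypothetical, unrelated pipeline",
  "processors": [
    {
      "set": {
        "field": "some_field",
        "value": "some value"
      }
    }
  ]
}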
@pickypg was ingest enabled on all the nodes?
All 3 nodes were ingest-enabled. In addition, 2 of the nodes were data+master eligible and 1 of the nodes was master-only (this is the standard 2HA config for Cloud clusters).
strange. let me know if there is a way to recreate and I will see what is happening!
this just happened on another Elastic Cloud cluster. I believe https://github.com/elastic/elasticsearch/pull/28588 should fix this problem.
Never mind. The issue is that scripting is disabled and there are pipelines in the cluster state that depend on scripts. The start-up exception is being muted. I feel like the node should not be able to start up in this case?
@talevy what scripting was disabled? We may have incorrect defaults set up in the EC(E) UI that are causing this
@AlexP-Elastic Monitoring's ingest pipeline leverages inline painless scripts
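For example, a pipeline containing an inline painless script processor, along the lines of the hypothetical one below, will fail to parse on a node where inline scripts are disallowed (e.g. via script.allowed_types in elasticsearch.yml); this is illustrative, not the actual Monitoring pipeline:
PUT /_ingest/pipeline/example_script_pipeline
{
  "description": "Hypothetical pipeline with an inline painless script",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.some_new_field = 42"
      }
    }
  ]
}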
We confirmed that:
> The start-up exception is being muted. I feel like the node should not be able to start up in this case?
@talevy Can you share the exception that is being swallowed here?
@jasontedor I will take a look at this today and share the update
@jasontedor update: the exception is not swallowed. I just thought that this was being applied when the NodeService is created, but it is only registered as a cluster-state applier and applies things after startup.
Here is the stack trace that should be seen on the cloud instances:
I don't see a way to prevent the node from starting up, since this is occurring too late in the game. One thing that can be done is to fail only on the offending pipelines and still support the others.
[2018-02-20T09:01:34,170][INFO ][o.e.n.Node ] [node_t1] started
[2018-02-20T09:01:34,175][WARN ][o.e.c.s.ClusterApplierService] [node_t1] failed to notify ClusterStateApplier
org.elasticsearch.ElasticsearchParseException: Error updating pipeline with id [YLFyNXsg]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:90) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.applyClusterState(PipelineStore.java:69) ~[main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498) [main/:?]
at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161) [main/:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [main/:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: cannot execute [inline] scripts
at org.elasticsearch.ExceptionsHelper.convertToElastic(ExceptionsHelper.java:61) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException(ConfigurationUtils.java:293) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:400) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:361) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:306) ~[main/:?]
at org.elasticsearch.ingest.Pipeline$Factory.create(Pipeline.java:122) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:86) ~[main/:?]
... 13 more
Caused by: java.lang.IllegalArgumentException: cannot execute [inline] scripts
at org.elasticsearch.script.ScriptService.compile(ScriptService.java:297) ~[main/:?]
at org.elasticsearch.ingest.common.ScriptProcessor$Factory.create(ScriptProcessor.java:109) ~[main/:?]
at org.elasticsearch.ingest.common.ScriptProcessor$Factory.create(ScriptProcessor.java:90) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:389) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:361) ~[main/:?]
at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:306) ~[main/:?]
at org.elasticsearch.ingest.Pipeline$Factory.create(Pipeline.java:122) ~[main/:?]
at org.elasticsearch.ingest.PipelineStore.innerUpdatePipelines(PipelineStore.java:86) ~[main/:?]
... 13 more
[2018-02-20T09:01:34,185][INFO ][o.e.g.GatewayService ] [node_t1] recovered [0] indices into cluster_state
I've opened #28752 to resolve this. Feel free to comment. I understand that other forms of exceptions may be preferred, and I would be happy to hear others' thoughts.