Currently, when Centralized Pipeline Management is enabled in Logstash, we need to set `xpack.management.pipeline.id` to the ids of all pipelines that the Logstash node will execute.
This means that when we want to add a new pipeline, we need to update the Logstash node configuration with the id of the new pipeline.
This is not ideal in use cases where the ELK cluster is shared by multiple tenants that can create their own Logstash pipelines via Kibana.
To address this limitation, it would be nice if `xpack.management.pipeline.id` supported wildcards when specifying the pipeline ids. Other approaches are possible; for instance, instead of keeping the `xpack.management.pipeline.id` configuration in the Logstash configs, it could be moved to the `.logstash` index in Elasticsearch.
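For illustration, the difference in `logstash.yml` might look like this (the wildcard line is the proposal, not a feature that existed when this issue was filed):

```yaml
xpack.management.enabled: true
xpack.management.elasticsearch.hosts: ["http://localhost:9200"]

# Today: every pipeline id must be listed explicitly, so adding a
# pipeline means editing this list and restarting the node.
xpack.management.pipeline.id: ["apache", "tenant1-logs", "tenant2-logs"]

# Proposed (hypothetical): wildcard patterns would pick up new
# tenant pipelines automatically, with no config change or restart.
# xpack.management.pipeline.id: ["apache", "tenant*"]
```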
:+1:
Was also looking for the same functionality.
It would be nice either to be able to use a prefix for all pipelines relevant to a Logstash host (for instance, if we had pipelines called `host1.source1` and `host1.source2`, it would help to query using `host1*`),
or even to have labels on the pipelines stored within Elastic that could be used to grab all pipelines tagged with the same label.
Looking at the code and the `.logstash` index, the pipeline names are only stored in the `_id` field,
so I wasn't able to query using a wildcard and suggest a quick PR.
There are a few solutions I could think of:
Was looking at setting this up today, and was quite perplexed by this limitation. From a user perspective, I would expect I could add whatever I want - it seems very unintuitive to have a nice interface that requires a change and service restart to take advantage of every time you want a new pipeline.
Hi,
I agree with the other comments.
Having to restart all Logstash processes each time you want to add a new pipeline is very time consuming.
I think we should have the possibility to give Logstash instances one or multiple tags from the Kibana web UI (production, staging, etc.).
These tags would also be applied to pipelines, so that each Logstash instance knows which pipelines it has to execute.
If we want some Logstash instances to stop executing a pipeline, we could just delete the tag from the Kibana web UI.
Hope this idea makes sense.
Best regards
Hi,
I agree with the above, and for some reason I actually thought this was already possible. I very much hope this will be implemented.
Best regards
Hi,
I agree with the above discussion; having to restart the Logstash process every time you add a new pipeline requires manual intervention. Having the new pipeline picked up the moment you deploy it in the centralized UI is what I was looking for.
Regards,
I agree with all of the above comments. We're looking at splitting our processing into multiple pipelines and chaining them together to reduce the size of the configuration file. I would like to avoid having to restart logstash in order to define a new pipeline.
Hi,
Totally agree. Restarting Logstash every time is not a convenient way to add new pipelines...
Wildcards or labels both sound like good ideas.
👍
Regards.
We have a scenario where we have 2 logstash clusters (DMZ/internal) but 1 elasticsearch cluster. We were testing CPM with the ability to deploy certain pipelines to a specific cluster and had tried to do it via wildcards by prefixing DMZ pipelines (dmz-) and internal (internal-), then trying to use wildcards per logstash cluster in xpack.management.pipeline.id with no luck.
After coming across this I hope the capability comes soon :)
Hello,
Any news about this? :)
This seems like a massive oversight in the implementation. According to the docs:
The pipeline management feature centralizes the creation and management of Logstash configuration pipelines in Kibana.
I don't believe this is an accurate statement, since creating a pipeline in Kibana doesn't bring it into existence on any Logstash nodes until you update xpack.management.pipeline.id in the config file and restart Logstash. If we need to roll out configuration changes to every Logstash server in order to create a new pipeline, CPM adds little to no value to the "create" operation of a pipeline (in fact, it makes it slightly more complex, since I need to manage file-based configuration as well as config stored in ES). It seems that based on the current implementation, CPM is only useful for updating existing pipeline definition and config on the fly.
For the time being, I'd suggest updating the docs to make them less misleading by making it clear that CPM doesn't allow you to fully create a Logstash pipeline (i.e. configuration updates and restarts of Logstash nodes are still required).
For me, the expected behavior of this feature would be something like this:
- Logstash nodes are configured to turn on CPM with `xpack.management` settings. Within these settings you should be able to include a list of tags/groups/categories of pipelines that the Logstash node will pull down (e.g. `dev-us-east`, `prod-us-west`)
- Create a Logstash pipeline via Kibana (API or UI) and tag/group/categorize this pipeline to specify which Logstash nodes I wish to run it on (e.g. `dev-us-east`)
- When the Logstash nodes next poll for new pipelines, the new pipeline is started by the nodes matching the tags/groups/categories. This should not require a restart of Logstash.

Currently the process for creating a new pipeline with CPM looks like this from what I can see:

- Logstash nodes are configured to turn on CPM with `xpack.management` settings and a hard-coded list of pipelines
- Create the Logstash pipeline via Kibana (API or UI)
- Update `logstash.yml` to include this new pipeline in the `xpack.management.pipeline.id` array
- Roll out the changes to the Logstash nodes using configuration management and restart each Logstash node
- Restarted nodes will pull config for the new pipeline from ES

For creating a pipeline, this is actually an additional step compared to simply managing all the pipeline config with a configuration management tool. It also means you need to repeat the same steps for anything other than a pipeline definition or pipeline setting update, such as renaming or deleting the pipeline.
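The tag-based node configuration imagined above might look something like this in `logstash.yml` (purely hypothetical; `xpack.management.pipeline.tags` is an invented setting, not a real Logstash option):

```yaml
xpack.management.enabled: true
# Hypothetical, invented setting: pull down every centrally managed
# pipeline tagged with any of these labels, instead of naming each
# pipeline id explicitly in a hard-coded list.
xpack.management.pipeline.tags: ["dev-us-east"]
```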
Very disappointing.
100% agree on this. It seems like a proper design re-think is needed. Deploying files each time seems so 1980s.
Hello,
Any update on this request? Having to restart the whole Logstash cluster every time we add a new pipeline is very limiting.
Is this a significant level of effort for this change? Going on a year :(
This would be very useful
Still waiting on this. It defeats the purpose of centralized management when you have to restart every logstash instance to add a new pipeline.
Doesn't seem to be any activity from the elastic team.
I believe I'm going to try to implement this myself.
What is everyone's preference on using tags versus wildcard ids?
Looking at the source, currently Logstash uses a call to the /_mget endpoint to retrieve pipelines.
This endpoint does not support searching. It only returns documents requested by exact id.
Additionally the id of the pipeline is stored in the _id field in the .logstash index. Wildcards cannot be used to search this field, it only supports exact matches.
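For reference, that lookup is roughly of this shape (a sketch of an `_mget` request by explicit ids; the exact body Logstash builds may differ):

```
GET /.logstash/_mget
{
  "ids": ["pipeline-one", "pipeline-two"]
}
```

`_mget` simply returns the documents whose `_id` values are listed; there is no way to pass a pattern.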
If we wanted to use wildcards in the `xpack.management.pipeline.id` list, there are two options:

1. Treat the entries in the `xpack.management.pipeline.id` list as regex patterns. A second request would then be made to Elasticsearch to retrieve the matching documents. The documents would not be retrieved with the first query, as this could be a very large list of documents. I do not believe most users will have thousands of pipelines created, but I've seen people do dumb things. Additionally, unless multiple scroll requests are used, the number of pipelines returned will be limited to the `index.max_result_window` setting in Elasticsearch. I'm not sure if there is a current limit on the number of results that can be returned by the `/_mget` call.
2. Store the pipeline id in a searchable field in addition to `_id`, or filter documents on the Logstash side. However, this requires additional updates to Kibana/Elasticsearch to ensure the field is added during pipeline creation and that any existing pipeline documents are updated.

As for the tags approach, initially the tags would be set in `logstash.yml`. There would be considerably more work in order for the Logstash tags to be managed in Kibana (since Kibana/Elasticsearch doesn't track Logstash servers beyond metrics). With this option we could leave the legacy setting in place, so that pipelines may still be specified by exact id while also including pipelines with matching tags. This option would require updates to Kibana/Elasticsearch to allow adding tags to the `pipeline_metadata` field.
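The pattern-matching step of option 1 could be sketched roughly like this (in Python rather than Logstash's Ruby, with invented function and variable names; the Elasticsearch round-trip to fetch the available ids is omitted):

```python
import re

def expand_wildcard_ids(configured_ids, available_ids):
    """Split the configured pipeline ids into exact ids and wildcard
    patterns, then expand the patterns against the ids that exist in
    the .logstash index (available_ids)."""
    exact = [p for p in configured_ids if "*" not in p]
    patterns = [p for p in configured_ids if "*" in p]
    # Translate each glob-style pattern (e.g. "dmz-*") into an
    # anchored regex, as option 1 proposes.
    regexes = [re.compile("^" + re.escape(p).replace(r"\*", ".*") + "$")
               for p in patterns]
    matched = [pid for pid in available_ids
               if any(r.match(pid) for r in regexes)]
    # Preserve order and drop duplicates.
    return list(dict.fromkeys(exact + matched))

# Example: a node configured with one exact id and one wildcard,
# resolved against ids found in the .logstash index.
print(expand_wildcard_ids(["apache", "dmz-*"],
                          ["apache", "dmz-fw", "dmz-proxy", "internal-db"]))
# -> ['apache', 'dmz-fw', 'dmz-proxy']
```

The resolved list would then be fetched by exact id, e.g. via the existing `/_mget` call, so only the matching documents are retrieved.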
In my opinion, as long as there is not a large number of pipelines then wildcard option 1 is easiest to implement as it only requires changes to Logstash.
Thoughts?
This issue has now been fixed and is available in master and 7.x. This fix will be released as part of Logstash 7.11.