Text copied from https://discuss.elastic.co/t/map-files-to-pipelines/55063
I'm playing around with the new Elastic 5 products and am interested in using the ingest node feature with Filebeat as the agent. I have a server that runs several applications with different log files/formats, and it seems that per Filebeat agent you can only define a single Elasticsearch output that references a single pipeline.
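To make that concrete (the paths, host, and pipeline name below are just placeholders, not my actual config), the setup would have to look something like this, with one pipeline handling every format:

filebeat.prospectors:
- paths: ["/var/log/app1/*.log"]
- paths: ["/var/log/app2/*.log"]

output.elasticsearch:
  hosts: ["localhost:9200"]
  # the single pipeline has to contain the grok patterns
  # for every application's log format
  pipeline: big-shared-pipeline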
It would be nice if it were possible to define a mapping of log files to output pipelines instead of having to maintain one big pipeline containing all of the grok patterns. I feel it could get messy trying to keep track of all the patterns and which application's logs they match.
Is there something obvious I'm missing, or will I have to maintain a large list of grok patterns in one pipeline?
Thanks
I second that, for exactly the reasons mentioned by @sirstevepal.
I may be misunderstanding this issue, but wouldn't it be possible to run several Filebeats (one per log file format), thereby making sure that each log file type is sent to the right pipeline?
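For example (the file names, paths, and pipeline names here are invented), each instance would get its own small config pointing at its own pipeline:

# filebeat-apache.yml (hypothetical)
filebeat.prospectors:
- paths: ["/var/log/apache2/*.log"]
output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: apache-pipeline

# filebeat-myapp.yml (hypothetical)
filebeat.prospectors:
- paths: ["/var/log/myapp/*.log"]
output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: myapp-pipeline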
@Shugyousha In my particular case the environment is resource-constrained, and ideally I would like to use one Filebeat process, pinned to a single specific CPU (via taskset), for ingesting all log files.
We recently introduced format string support for pipelines. I assume this should solve the issue? https://www.elastic.co/guide/en/beats/filebeat/5.0/elasticsearch-output.html#_pipeline For detailed docs about format strings, check out the example for the index format string. @urso Perhaps you can add some more details on this?
I found this option as well when searching for a possible solution to this issue, but I wasn't sure how it would actually look in practice. Having an example there would help a lot.
As someone who hasn't worked with Elasticsearch much yet, it's not clear to me which conditionals are meant in the description of the option, or how you would have to format the array to make those conditionals apply to the right pipelines.
If you could send me a link to the relevant configuration concept (i.e. the conditionals), I can try to come up with a usage example for this option to include in the documentation.
Pipelines can be set directly using format strings (but format strings can only access event fields; there is no processing available to derive, say, the source basename).
For an example with conditionals, see the indices setting. pipelines works the same; just replace indices with pipelines and index with pipeline. The conditionals are the same as those available for processors: https://www.elastic.co/guide/en/beats/filebeat/5.0/configuration-processors.html#filtering-condition
We've got the exact same functionality for indices and pipelines in a few places, but the docs are not fully up to date yet (with all the duplication introduced, the docs might need some more restructuring).
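A sketch of what that could look like (the pipeline names and contains conditions are invented for illustration, modeled on the documented indices example):

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipelines:
    - pipeline: apache-pipeline
      when.contains:
        source: "apache"
    - pipeline: myapp-pipeline
      when.contains:
        source: "myapp"

Events that match none of the conditions fall back to the top-level pipeline setting, if one is configured.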
Alternatively, instead of relying on conditionals, one can define a prospector per source and use the fields setting to set a custom pipeline parameter per prospector, e.g.:
filebeat.prospectors:
- ...
  fields.pipeline: pipeline1
- ...
  fields.pipeline: pipeline2

output.elasticsearch:
  pipeline: '%{[fields.pipeline]}'
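Filled in with hypothetical paths and host (everything here except the fields/pipeline mechanics is just an example), a complete config could look like:

filebeat.prospectors:
- input_type: log
  paths: ["/var/log/app1/*.log"]
  fields.pipeline: pipeline1
- input_type: log
  paths: ["/var/log/app2/*.log"]
  fields.pipeline: pipeline2

output.elasticsearch:
  hosts: ["localhost:9200"]
  # resolved per event: each prospector's fields.pipeline value
  # selects the ingest pipeline for that prospector's events
  pipeline: '%{[fields.pipeline]}'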
@urso I just tested such a configuration and it works. Thanks a lot!
One thing I noticed when using the '%{[fields.pipeline]}' format string for the pipeline: when the fields.pipeline field is missing, Filebeat does not report any error and seems to upload and function just fine. However, I was not able to find the uploaded data in Elasticsearch (maybe I did not look hard enough).
Thanks, that makes it a lot clearer!
Is there a way for people outside of the project to update the documentation or send a PR for it?
@Shugyousha Sure. All the docs are in the project-specific "docs" directory. For pipelines, for example, that's here: https://github.com/elastic/beats/blob/98e1aef77be9a44e79e21563bddea0ccd2f94689/libbeat/docs/outputconfig.asciidoc
@max0x7ba If the fields can't be found, it falls back to the pipeline setting. If there is no pipeline, no pipeline is used and the event is just added to the index.
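One way to avoid events being silently indexed without a pipeline (a sketch; the paths and pipeline names are hypothetical) is to make sure every prospector sets fields.pipeline, including one covering any remaining logs:

filebeat.prospectors:
- paths: ["/var/log/app1/*.log"]
  fields.pipeline: pipeline1
# covers logs not matched by the prospectors above,
# so fields.pipeline is always defined
- paths: ["/var/log/*.log"]
  fields.pipeline: default-pipeline

output.elasticsearch:
  pipeline: '%{[fields.pipeline]}'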
@ruflin That explains my observations, thank you.
I opened a pull request for the documentation: https://github.com/elastic/beats/pull/3010
Feedback welcome!