Logstash should have more dynamic ways to look up and enrich events, especially with external user-defined datasets. Currently, the main avenue for lookup enrichment is the translate filter, which is primarily a basic key/value lookup and only supports YAML. Here are some ideas:
Use RDBMS, Elasticsearch, or others as doc store for lookup dataset
logstash-filter-translate (Phase 1)
The lookup data should be cached:
CSV Format
code,status_description,status_type,color
200,OK,Successful,Green
201,Created,Successful,Green
202,Accepted,Successful,Green
300,Multiple Choices,Redirection,Yellow
Example
# Config
filter {
  lookup_file {
    path => "~/conf/lookup.csv"
    format => "csv"
    cache_size => "1MB"
    refresh_interval => 10000
    event_fields => ["http_code", "color"]  # (required) 1+ event keys to match with; event_fields.length() == lookup_fields.length()
    lookup_fields => ["code", "color"]      # (required for csv) 1+ lookup keys to match against
    target_fields => ["status_description", "status_type"] # (optional) whitelist of 1+ looked-up fields to add to the event. If not defined, adds all fields (excluding the lookup key fields, e.g. "code" and "color") to the event top level.
  }
}
# Event in
Event {
  http_code => "202"
  color => "Green"
}
# Event out
Event {
  http_code => "202"
  color => "Green"
  status_description => "Accepted"
  status_type => "Successful"
}
JSON Format
{
"200": "Green",
"201": "Green",
"202": "Green",
"300": "Yellow",
"elastic": true,
"version": 5.0
}
YAML Format
200: "Green"
201: "Green"
202: "Green"
300: "Yellow"
elastic: true
version: 5.0
Example
# Config
filter {
  lookup_file {
    path => "~/conf/lookup.json"
    format => ["json" | "yaml"]
    cache_size => "1MB"
    refresh_interval => 10000
    event_fields => "key"
    target_fields => "lookup_value" # (optional) new field name for the looked-up value. If not defined, defaults to "lookup_value".
  }
}
# Event in
Event {
  key => "elastic"
  product => "logstash"
}
# Event out
Event {
  key => "elastic"
  product => "logstash"
  lookup_value => true
}
An HTTP variant would be very similar to its file counterpart, except with 'url' instead of 'path'.
filter {
  lookup_http {
    url => "localhost:9200/lookup1/"
    tls => false
    # other fields are the same...
  }
}
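The fetch side of such a filter could look like the following Ruby sketch. The full URL, endpoint, and JSON response shape are assumptions for illustration; the proposal above does not specify them.

```ruby
# Hypothetical sketch of fetching a lookup dictionary over HTTP
# (assumed endpoint returning a flat JSON object, e.g. {"200":"Green"}).
require "net/http"
require "json"
require "uri"

def fetch_lookup(url)
  uri = URI(url)
  res = Net::HTTP.get_response(uri)
  raise "lookup fetch failed: #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  JSON.parse(res.body)
end
```

As with the file variant, the result would be cached and refreshed on an interval rather than fetched per event.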
Ref: https://github.com/elastic/logstash/issues/5087, https://github.com/elastic/logstash/issues/3633, https://github.com/elastic/logstash/issues/3446, https://github.com/elastic/logstash/issues/4510
P.S. - open to suggestions on new plugin names...
Could we use ES as a backend store for the lookup? Just reread carefully.
format => ["json" | "yaml"] # This could be auto-detected by the file name or when reading?
@ph you're right, it could be and we should consider it when implementing.
@acchen97 I love the idea of enhanced lookups for the Logstash pipeline. What about prioritizing a Redis lookup, especially when the lookup is dynamic? Having lookups against both ES and Redis could be very helpful for enriching events at runtime, especially when two flows are somehow connected.
@ph the format could also be detected by the parser; detecting by filename might be tricky, but I agree a .yml extension usually indicates a YAML file :-P
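The extension-plus-parser detection discussed here could be sketched as follows. This is an assumed approach, not plugin code: trust the file extension when it is unambiguous, otherwise try JSON first (the stricter grammar), falling back to YAML.

```ruby
# Hypothetical format auto-detection for the lookup file
# (assumed behavior; the 'format' option would become optional).
require "json"
require "yaml"

def detect_and_parse(path)
  text = File.read(path)
  case File.extname(path).downcase
  when ".json"         then JSON.parse(text)
  when ".yml", ".yaml" then YAML.safe_load(text)
  else
    begin
      JSON.parse(text)      # JSON is stricter, so try it first
    rescue JSON::ParserError
      YAML.safe_load(text)  # nearly any text parses as YAML, so it goes last
    end
  end
end
```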
+1
Are there any timelines for this feature?
@vnadgir-ef file lookups are already supported by the translate filter (supports yaml, csv, and json format). There is no timeline. We have tentatively set this for Logstash 5.2.0 but do not have a release date (and Logstash 5.1.0 isn't out yet, either).
I've created a plugin that does a lot of what's requested here. I call the plugin logstash-filter-augment. It allows joining multiple fields from a CSV/JSON/YAML file onto an event. I based it initially on the translate filter.
The gem is published to ruby-gems: https://rubygems.org/gems/logstash-filter-augment
And it's public on github: https://github.com/alcanzar/logstash-filter-augment
I'd appreciate any feedback/bug fixes/enhancement requests.
@acchen97 can you update the description of this ticket (or close it and open a new one) to reflect some of the recent work in this area? I remember us having some discussions on slack/zoom about features we've already got implemented in the translate filter, for example.
@jordansissel updated this based on our most recent discussions. Let me know if I missed anything.
It would be nice to allow not only the Elasticsearch _search endpoint, but also the _analyze endpoint.
I use the Translate filter heavily and the Ruby filter also for the same reasons so this is a very welcome addition.
In one case I use the translate filter to look up certain values, and if nothing matches, a ruby filter executes a Go program that queries an HTTP API and appends the results to the translate dictionary. The issue is that if, for example, 100 incoming messages carry the same value that is missing from the dictionary, the HTTP API gets hit 100 times. If there were some way to trigger a reload of the dictionary when the file changes, that would be extremely valuable.
Instead of reloading the file every X seconds, watch the file for modifications and reload it only when it changes. To prevent constant reloads when the dictionary changes rapidly, add a setting to wait at least X seconds before reloading again.
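The change-triggered reload with a minimum wait suggested here can be sketched with a simple mtime check. This is an illustration of the idea, not the translate filter's actual implementation; the class and option names are made up.

```ruby
# Hypothetical debounced reloader: reload only when the file's mtime
# changes, and at most once every `min_interval` seconds.
class DebouncedReloader
  def initialize(path, min_interval: 5.0, &loader)
    @path = path
    @min_interval = min_interval
    @loader = loader          # block that parses the file, e.g. YAML.safe_load
    @last_mtime = nil
    @last_reload = 0.0
    @data = nil
  end

  # Called on the hot path; cheap when no reload is due.
  def data
    maybe_reload
    @data
  end

  private

  def maybe_reload
    now = Time.now.to_f
    # Debounce: skip the stat entirely inside the minimum interval.
    return if @data && now - @last_reload < @min_interval
    mtime = File.mtime(@path)
    # Unchanged file: nothing to do.
    return if @data && mtime == @last_mtime
    @data = @loader.call(@path)
    @last_mtime = mtime
    @last_reload = now
  end
end
```

A production version would also want inotify-style watching instead of polling, but the debounce logic would be the same.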
@jordansissel @suyograo just updated this based on our recent discussions with specific action items for translate, elasticsearch, and jdbc filters. One thing we should discuss is the design for better integrating the ES filter with ES percolations.
Any news here or other issues to follow up the work?
Is this still a planned feature?
Database lookup enrichment is now generally available with the JDBC static and JDBC streaming filters.