Ability to shunt poisoned or unsuccessful events in a running pipeline to a new destination for further processing. This feature allows the existing pipeline to continue processing events without getting stuck on a bad event.
Goals for the producer side, i.e. the Logstash instance that creates dead letter queue events:
Once the DLQ is filled with events, users should be able to query, analyze, and process them. What a user does with a dead event depends on the original error. For example, a mapping error in ES would create a DLQ entry. The user can then choose to remove the field causing the mapping issue and re-index the _cleaned_ event to ES.
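As a sketch, reprocessing could look like the following, using the proposed DLQ input together with a `mutate` filter to drop the problematic field before re-indexing (`bad_field` is a hypothetical field name; the input options follow the proposal below and are not final):

```
input {
  dlq {
    path => "/path/to/dlq"
  }
}
filter {
  mutate {
    # drop the hypothetical field that caused the ES mapping error
    remove_field => ["bad_field"]
  }
}
output {
  elasticsearch {}
}
```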
```
feature.dead_letter_queue: true
path.dead_letter_queue: /path/to/dlq
```
```
input {
  dlq {
    path => "/path/to/dlq"
    timestamp => "2017-04-04T23:40:37"
  }
}
filter {}
output {
  elasticsearch {}
}
```
Details here: https://github.com/logstash-plugins/logstash-input-dead_letter_queue/blob/master/lib/logstash/inputs/dead_letter_queue.rb#L5
dead_letter_queue.size.threshold: 24h

```
path.dlq
└── pipeline_id
    ├── outputs_elasticsearch
    └── inputs_kafka
```
where `outputs_elasticsearch` contains events DLQ'd from the Elasticsearch outputs of the original production Logstash instance.
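A consumer could then target a single tag. As a sketch, assuming the proposed `dlq` input grew a `tag` option (a hypothetical option, not part of the proposal above):

```
input {
  dlq {
    path => "/path/to/dlq"
    # hypothetical option; name and semantics not final
    tag => "outputs_elasticsearch"
  }
}
```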
/cc @acchen97 @jordansissel @colinsurprenant
Looks good @suyograo, thanks for writing this up. A few comments below:
DLQ has a built-in rotation policy to manage its file size. Without rotation, the files could grow unbounded and would need manual curation. Once the threshold has been reached, the DLQ behaves like a ring buffer: older events are overwritten to accommodate new events.
I see one new setting for this and am wondering if we need two: one for when the file rotates (e.g. every day), and another for a max disk size allocated to the DLQ (e.g. 16 GB) that would trigger the ring buffer behavior when hit. This would help mitigate out-of-disk-space scenarios.
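For illustration, the two settings could look like this (setting names here are hypothetical, just sketching the two knobs):

```
# hypothetical names: time-based rotation plus a disk-size cap
dead_letter_queue.rotate_interval: 1d
dead_letter_queue.max_bytes: 16gb
```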
Plugins should be able to publish dead events to separate "topics/tags". This would allow for easy consumption later on, based on filtering by the type of plugin. Each plugin in Logstash will create a separate dead letter entry grouped by plugin name and stage. This physical separation on the file system allows for faster query/filtering when processing events from DLQ. Sequential access (having all events in one single file) would work but will make processing slower on the consumer side. Rotation threshold is applied per tag, not globally.
I'm glad we're able to separate these by topics. Segregating them by stage and plugin name I think is a good approach, and will provide enough granularity for easier reprocessing. Can we also enumerate what metadata we're storing in the DLQ? i.e. timestamp, error condition, etc.
@acchen97 Why would we rotate files by time? This sounds like something a user would probably want to configure, and I wonder if we even want to have this feature? I was hoping we could have something simpler like "The DLQ should only keep as much data as satisfies {retention policy}" where the retention policy can be something based on size of the dlq, age of things in the queue, etc. Specifics of "rotate files" are into the weeds as far as what kind of things I want users to concern themselves with.
@jordansissel since we can query by tags now, abstracting the rotation strategy from users would be ok.
In terms of the retention policy, we should have the option of constraining by both time and/or disk size. We can put the disk usage threshold as a roadmap item if we think it's out of scope of phase 1.
First version of DLQ was merged into master/5.x (5.5).
reference PR that added the API: https://github.com/elastic/logstash/pull/6894