Reference: https://github.com/elastic/elasticsearch/issues/13838
The Elasticsearch team confirmed to me that the issue is not with Elasticsearch itself.
Here is the description and history:
Hello,
I have a problem with Logstash creating the same entry in the database two or more times.
Here is a search for the entry before it appears in the database:
root@elk:/home/tigran# curl -XGET localhost:9200/appdev-2015.09.29/jibeqa/_search -d '{"query":{"bool":{"must":[{"term":{"jibeqa.date":"2015-09-29T07:10:50+0000"}}],"must_not":[],"should":[]}},"from":0,"size":10,"sort":[],"facets":{}}' | json_pp
{
   "timed_out" : false,
   "hits" : {
      "total" : 0,
      "hits" : [],
      "max_score" : null
   },
   "took" : 1,
   "_shards" : {
      "successful" : 5,
      "total" : 5,
      "failed" : 0
   }
}
root@elk:/home/tigran#
As you can see, it is empty for now.
Now I send JSON to the log file, which will be forwarded by logstash-forwarder:
{"appId":"imas-dev-10.240.21.82","date":"2015-09-29T07:10:50+0000","category":"IMAS_CHATSESSION_MESSAGE_RECEIVED_121","user":"+11000040488","toUser":"+11000040466","messageType":"imdn","contributionId":"FACfIAV6DA","size":265,"extras":{}}
The Logstash rubydebug output shows that it was parsed and sent to Elasticsearch just once:
{
    "appId" => "imas-dev-10.240.21.82",
    "date" => "2015-09-29T07:10:50+0000",
    "category" => "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
    "user" => "+11000040488",
    "toUser" => "+11000040466",
    "messageType" => "imdn",
    "contributionId" => "FACfIAV6DA",
    "size" => 265,
    "extras" => {},
    "@version" => "1",
    "@timestamp" => "2015-09-29T07:10:50.000Z",
    "file" => "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
    "host" => "app3-us",
    "offset" => "2390",
    "tags" => "appdev",
    "type" => "jibeqa"
}
But a new search in Elasticsearch shows that it was created twice:
root@elk:/home/tigran# curl -XGET localhost:9200/appdev-2015.09.29/jibeqa/_search -d '{"query":{"bool":{"must":[{"term":{"jibeqa.date":"2015-09-29T07:10:50+0000"}}],"must_not":[],"should":[]}},"from":0,"size":10,"sort":[],"facets":{}}' | json_pp
{
   "_shards" : {
      "total" : 5,
      "successful" : 5,
      "failed" : 0
   },
   "timed_out" : false,
   "took" : 361,
   "hits" : {
      "total" : 2,
      "max_score" : 1,
      "hits" : [
         {
            "_index" : "appdev-2015.09.29",
            "_score" : 1,
            "_type" : "jibeqa",
            "_id" : "AVAYqahQZ2RePRz37-kE",
            "_source" : {
               "appId" : "imas-dev-10.240.21.82",
               "messageType" : "imdn",
               "toUser" : "+11000040466",
               "offset" : "2390",
               "contributionId" : "FACfIAV6DA",
               "@timestamp" : "2015-09-29T07:10:50.000Z",
               "user" : "+11000040488",
               "type" : "jibeqa",
               "@version" : "1",
               "date" : "2015-09-29T07:10:50+0000",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "tags" : "appdev",
               "host" : "app3-us",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "size" : 265,
               "extras" : {}
            }
         },
         {
            "_score" : 1,
            "_type" : "jibeqa",
            "_source" : {
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "date" : "2015-09-29T07:10:50+0000",
               "@version" : "1",
               "type" : "jibeqa",
               "extras" : {},
               "size" : 265,
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "tags" : "appdev",
               "host" : "app3-us",
               "messageType" : "imdn",
               "appId" : "imas-dev-10.240.21.82",
               "contributionId" : "FACfIAV6DA",
               "@timestamp" : "2015-09-29T07:10:50.000Z",
               "user" : "+11000040488",
               "offset" : "2390",
               "toUser" : "+11000040466"
            },
            "_id" : "AVAYqahQZ2RePRz37-kD",
            "_index" : "appdev-2015.09.29"
         }
      ]
   }
}
The IDs are different, but the rest is the same.
My input is:
lumberjack {
    port => 5120
    #type => "logs"
    #tags => "appdev"
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder_appdev.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder_appdev.key"
    codec => "json"
}
The filter is:
filter {
    if [type] == "jibeqa" {
        json {
            source => "message"
        }
        mutate {
            remove_field => "message"
        }
        date {
            match => [ "date", "ISO8601" ]
        }
    }
}
And the output is:
output {
    if "appdev" in [tags] {
        elasticsearch {
            host => localhost
            protocol => "http"
            cluster => Jibe_ELK
            index => "appdev-%{+YYYY.MM.dd}"
            workers => 5
        }
    }
    # stdout { codec => rubydebug }
}
Versions:
ii elasticsearch 1.5.2 all Open Source, Distributed, RESTful Search Engine
ii logstash 1.4.5-1-a2bacae all An extensible logging pipeline
ii logstash-contrib 1.4.5-1-2bad350 all Community supported plugins for Logstash
Update:
I updated the Logstash output to override the _id:
if "appdev" in [tags] {
    elasticsearch {
        host => localhost
        protocol => "http"
        cluster => Jibe_ELK
        index => "appdev-%{+YYYY.MM.dd}"
        document_id => "%{contributionId}-%{offset}-%{date}"
        workers => 5
    }
}
And logstash.log shows that Logstash sends only one document to Elasticsearch:
{
    "appId" => "imas-dev-10.240.21.82",
    "date" => "2015-09-29T07:10:52+0000",
    "category" => "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
    "user" => "+11000040488",
    "toUser" => "+11000040466",
    "messageType" => "imdn",
    "contributionId" => "FACfIAV6DA",
    "size" => 265,
    "extras" => {},
    "@version" => "1",
    "@timestamp" => "2015-09-29T07:10:52.000Z",
    "file" => "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
    "host" => "app3-us",
    "offset" => "2868",
    "type" => "jibeqa",
    "tags" => "appdev"
}
But in Elasticsearch it appears twice: once with the _id I assigned and once with a default one:
{
   "hits" : {
      "max_score" : 1,
      "total" : 2,
      "hits" : [
         {
            "_source" : {
               "offset" : "2868",
               "date" : "2015-09-29T07:10:52+0000",
               "messageType" : "imdn",
               "user" : "+11000040488",
               "contributionId" : "FACfIAV6DA",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "host" : "app3-us",
               "extras" : {},
               "@version" : "1",
               "toUser" : "+11000040466",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "appId" : "imas-dev-10.240.21.82",
               "type" : "jibeqa",
               "size" : 265,
               "@timestamp" : "2015-09-29T07:10:52.000Z",
               "tags" : "appdev"
            },
            "_index" : "appdev-2015.09.29",
            "_type" : "jibeqa",
            "_id" : "AVAZT-DeZ2RePRz3MKsB",
            "_score" : 1
         },
         {
            "_source" : {
               "appId" : "imas-dev-10.240.21.82",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "toUser" : "+11000040466",
               "@version" : "1",
               "tags" : "appdev",
               "size" : 265,
               "@timestamp" : "2015-09-29T07:10:52.000Z",
               "type" : "jibeqa",
               "user" : "+11000040488",
               "messageType" : "imdn",
               "date" : "2015-09-29T07:10:52+0000",
               "offset" : "2868",
               "extras" : {},
               "host" : "app3-us",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "contributionId" : "FACfIAV6DA"
            },
            "_index" : "appdev-2015.09.29",
            "_id" : "FACfIAV6DA-2868-2015-09-29T07:10:52+0000",
            "_type" : "jibeqa",
            "_score" : 1
         }
      ]
   },
   "_shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
   },
   "timed_out" : false,
   "took" : 3
}
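Why overriding _id helps at all: Elasticsearch indexing is idempotent per document ID, so a retried delivery of the same event overwrites the existing document instead of creating a second one. A minimal sketch of that behavior (plain Python simulating the index as a dict, not actual Elasticsearch client code; the field choice mirrors the `%{contributionId}-%{offset}-%{date}` template above):

```python
import hashlib

def doc_id(event):
    """Deterministic ID derived from fields that uniquely identify the event."""
    raw = "-".join([event["contributionId"], event["offset"], event["date"]])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

def index(store, event):
    """Simulated index: writing the same ID twice overwrites, never duplicates."""
    store[doc_id(event)] = event

store = {}
event = {"contributionId": "FACfIAV6DA", "offset": "2868",
         "date": "2015-09-29T07:10:52+0000"}

index(store, event)   # first delivery
index(store, event)   # at-least-once retry of the same event
print(len(store))     # 1: the retry overwrote the same document
```

Of course, this only deduplicates retries of the same event; it does not help if two separate pipeline paths (e.g. two output sections) each index the event, as one of them may not apply the document_id.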
Hi,
Has anyone seen this case?
Did you find a resolution for this issue?
We are forwarding log files for multiple applications on different machines, and we've now seen this issue multiple times. Any solution? Can anyone tell me whether this could also happen with a newer Logstash version?
Our setup: Logstash 1.5.5
Forwarder: 0.4.0
ES: 1.5
(I know it's EOL)
So it's still not solved?
I hope this answer from Stack Overflow could be useful:
Logstash will concatenate the three config files (the s3 input, the HTTP input, and the Filebeat input) and see three output sections.
The output sections are not tied to specific inputs; instead, Logstash sends an event from any of the three sources to all of the configured outputs. As a result, your message is written three times to the same destination.
I would create a separate config file containing a single output section and remove the output sections from your three input config files.
by Olly Cruickshank
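A sketch of that suggestion (the filenames and conf.d path are hypothetical; Logstash merges every file in its config directory into one pipeline, so only one shared output section should exist):

```
# /etc/logstash/conf.d/10-s3-input.conf       - s3 input only, no output block
# /etc/logstash/conf.d/11-http-input.conf     - http input only, no output block
# /etc/logstash/conf.d/12-filebeat-input.conf - beats input only, no output block

# /etc/logstash/conf.d/90-output.conf - the single shared output section
output {
    elasticsearch {
        host => localhost
        protocol => "http"
        index => "appdev-%{+YYYY.MM.dd}"
    }
}
```

If different inputs really must go to different destinations, tag each input (e.g. `tags => ["appdev"]`) and branch on the tag inside the one output section, as the original config above already does.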
We're seeing this issue as well.
Filebeat --> Logstash --> Lumberjack --> ES
We're seeing each document twice in ES.
Logstash cannot guarantee exactly-once delivery of messages because, in general, it handles partial failure (such as an unacknowledged write) by retrying; the only other option in the face of partial failure is data loss, which we try to avoid.
It seems some attempts were made in the above report to provide consistent document IDs to enable Elasticsearch to deduplicate the events it receives, but debugging these types of setups is beyond the scope of our issue tracker and is better handled in the forums where our community of developers and users work together to help each other out.
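For reference, one common way to provide such consistent document IDs is Logstash's fingerprint filter, which hashes chosen event fields into a stable ID (a sketch only; the field names match the event in this report, the key is a placeholder, and `[@metadata]` requires Logstash 1.5 or later):

```
filter {
    fingerprint {
        # Hash the fields that uniquely identify an event.
        source => ["contributionId", "offset", "date"]
        concatenate_sources => true
        method => "SHA1"
        key => "some-static-key"
        target => "[@metadata][fingerprint]"
    }
}
output {
    elasticsearch {
        index => "appdev-%{+YYYY.MM.dd}"
        document_id => "%{[@metadata][fingerprint]}"
    }
}
```

With this in place, a redelivered event hashes to the same _id and overwrites its earlier copy instead of creating a duplicate.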
I am closing out this issue, because it is not actionable.