Logstash: Logstash creates same log entry multiple times

Created on 29 Sep 2015  路  7Comments  路  Source: elastic/logstash

Refference : https://github.com/elastic/elasticsearch/issues/13838

I was confirmed by Elasticsearch team that the issue is not with Elasticsearch.
Here is description and history:

Hello,

I have a problem with logstash creating same entry in DB 2 or more times.

Here is the search of the entry before it appears in DB:

root@elk:/home/tigran# curl -XGET localhost:9200/appdev-2015.09.29/jibeqa/_search -d '{"query":{"bool":{"must":[{"term":{"jibeqa.date":"2015-09-29T07:10:50+0000"}}],"must_not":[],"should":[]}},"from":0,"size":10,"sort":[],"facets":{}}' | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   270  100   122  100   148  19473  23623 --:--:-- --:--:-- --:--:-- 24666
{
   "timed_out" : false,
   "hits" : {
      "total" : 0,
      "hits" : [],
      "max_score" : null
   },
   "took" : 1,
   "_shards" : {
      "successful" : 5,
      "total" : 5,
      "failed" : 0
   }
}
root@elk:/home/tigran#

As you can see it is empty for now.

Now I send json to the log file which will be forwarded with logstash-forwarder

{"appId":"imas-dev-10.240.21.82","date":"2015-09-29T07:10:50+0000","category":"IMAS_CHATSESSION_MESSAGE_RECEIVED_121","user":"+11000040488","toUser":"+11000040466","messageType":"imdn","contributionId":"FACfIAV6DA","size":265,"extras":{}}

logstash rubydebug shows that it was parsed and sent to Elasticsearch just once:

{
             "appId" => "imas-dev-10.240.21.82",
              "date" => "2015-09-29T07:10:50+0000",
          "category" => "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
              "user" => "+11000040488",
            "toUser" => "+11000040466",
       "messageType" => "imdn",
    "contributionId" => "FACfIAV6DA",
              "size" => 265,
            "extras" => {},
          "@version" => "1",
        "@timestamp" => "2015-09-29T07:10:50.000Z",
              "file" => "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
              "host" => "app3-us",
            "offset" => "2390",
              "tags" => "appdev",
              "type" => "jibeqa"
}

But new search in Elasticsearch shows that it was created 2 times:

root@elk:/home/tigran# curl -XGET localhost:9200/appdev-2015.09.29/jibeqa/_search -d '{"query":{"bool":{"must":[{"term":{"jibeqa.date":"2015-09-29T07:10:50+0000"}}],"must_not":[],"should":[]}},"from":0,"size":10,"sort":[],"facets":{}}' | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1350  100  1202  100   148   3304    406 --:--:-- --:--:-- --:--:--  3311
{
   "_shards" : {
      "total" : 5,
      "successful" : 5,
      "failed" : 0
   },
   "timed_out" : false,
   "took" : 361,
   "hits" : {
      "total" : 2,
      "max_score" : 1,
      "hits" : [
         {
            "_index" : "appdev-2015.09.29",
            "_score" : 1,
            "_type" : "jibeqa",
            "_id" : "AVAYqahQZ2RePRz37-kE",
            "_source" : {
               "appId" : "imas-dev-10.240.21.82",
               "messageType" : "imdn",
               "toUser" : "+11000040466",
               "offset" : "2390",
               "contributionId" : "FACfIAV6DA",
               "@timestamp" : "2015-09-29T07:10:50.000Z",
               "user" : "+11000040488",
               "type" : "jibeqa",
               "@version" : "1",
               "date" : "2015-09-29T07:10:50+0000",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "tags" : "appdev",
               "host" : "app3-us",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "size" : 265,
               "extras" : {}
            }
         },
         {
            "_score" : 1,
            "_type" : "jibeqa",
            "_source" : {
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "date" : "2015-09-29T07:10:50+0000",
               "@version" : "1",
               "type" : "jibeqa",
               "extras" : {},
               "size" : 265,
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "tags" : "appdev",
               "host" : "app3-us",
               "messageType" : "imdn",
               "appId" : "imas-dev-10.240.21.82",
               "contributionId" : "FACfIAV6DA",
               "@timestamp" : "2015-09-29T07:10:50.000Z",
               "user" : "+11000040488",
               "offset" : "2390",
               "toUser" : "+11000040466"
            },
            "_id" : "AVAYqahQZ2RePRz37-kD",
            "_index" : "appdev-2015.09.29"
         }
      ]
   }
}

The ID is different but the rest is same.

My input is:

  lumberjack {
    port => 5120
    #type => "logs"
    #tags => "appdev"
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder_appdev.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder_appdev.key"
    codec => "json"
  }

Filter is:

filter {
  if [type] == "jibeqa" {
    json{
        source => "message"
    }
    mutate {
        remove_field => "message"
    }
    date {
      match => [ "date", "ISO8601"]
    }
  }
}

And the output is:

output {
  if "appdev" in [tags] {
    elasticsearch { host => localhost protocol => "http" cluster => Jibe_ELK index => "appdev-%{+YYYY.MM.dd}" workers => 5 }
  }
#  stdout { codec => rubydebug }
}

Versions:

ii  elasticsearch                       1.5.2                               all          Open Source, Distributed, RESTful Search Engine
ii  logstash                            1.4.5-1-a2bacae                     all          An extensible logging pipeline
ii  logstash-contrib                    1.4.5-1-2bad350                     all          Community supported plugins for Logstash

Update:
I updated logstash to override the _id:

  if "appdev" in [tags] {
    elasticsearch { host => localhost protocol => "http" cluster => Jibe_ELK index => "appdev-%{+YYYY.MM.dd}" document_id => "%{contributionId}-%{offset}-%{date}" workers => 5 }
  }

And logstash.log shows that logstash sends to elasticsearch only one document:

{
             "appId" => "imas-dev-10.240.21.82",
              "date" => "2015-09-29T07:10:52+0000",
          "category" => "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
              "user" => "+11000040488",
            "toUser" => "+11000040466",
       "messageType" => "imdn",
    "contributionId" => "FACfIAV6DA",
              "size" => 265,
            "extras" => {},
          "@version" => "1",
        "@timestamp" => "2015-09-29T07:10:52.000Z",
              "file" => "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
              "host" => "app3-us",
            "offset" => "2868",
              "type" => "jibeqa",
              "tags" => "appdev"
}

But in Elasticsearch it appears with _id I assigned and with default one:

{
   "hits" : {
      "max_score" : 1,
      "total" : 2,
      "hits" : [
         {
            "_source" : {
               "offset" : "2868",
               "date" : "2015-09-29T07:10:52+0000",
               "messageType" : "imdn",
               "user" : "+11000040488",
               "contributionId" : "FACfIAV6DA",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "host" : "app3-us",
               "extras" : {},
               "@version" : "1",
               "toUser" : "+11000040466",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "appId" : "imas-dev-10.240.21.82",
               "type" : "jibeqa",
               "size" : 265,
               "@timestamp" : "2015-09-29T07:10:52.000Z",
               "tags" : "appdev"
            },
            "_index" : "appdev-2015.09.29",
            "_type" : "jibeqa",
            "_id" : "AVAZT-DeZ2RePRz3MKsB",
            "_score" : 1
         },
         {
            "_source" : {
               "appId" : "imas-dev-10.240.21.82",
               "category" : "IMAS_CHATSESSION_MESSAGE_RECEIVED_121",
               "toUser" : "+11000040466",
               "@version" : "1",
               "tags" : "appdev",
               "size" : 265,
               "@timestamp" : "2015-09-29T07:10:52.000Z",
               "type" : "jibeqa",
               "user" : "+11000040488",
               "messageType" : "imdn",
               "date" : "2015-09-29T07:10:52+0000",
               "offset" : "2868",
               "extras" : {},
               "host" : "app3-us",
               "file" : "/opt/jibe/jetty_master/reporting_MASTER_local/logs/reporting.log",
               "contributionId" : "FACfIAV6DA"
            },
            "_index" : "appdev-2015.09.29",
            "_id" : "FACfIAV6DA-2868-2015-09-29T07:10:52+0000",
            "_type" : "jibeqa",
            "_score" : 1
         }
      ]
   },
   "_shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
   },
   "timed_out" : false,
   "took" : 3
}

Most helpful comment

Did you find a resolution for this issue?

All 7 comments

Hi,

Anyone saw this case?

Did you find a resolution for this issue?

We are forwarding log-files for multiple applications on different machines. We've seen this issue now multiple times. Any solution? Can anyone tell me if this could also happen using a newer logstash-version?
Our setup, logstash: 1.5.5
Forwarder: 0.4.0
ES: 1.5
(I know it's EOL)

So it's still not solved?

I hope this could be useful stackoverflow

Logstash will concatenate the three config files together (s3 input, Http input, Filebeat) and see three output sections.
The three output sections are not related to the specific inputs - instead Logstash will send an input from any one of the three sources to all of the configured outputs. As a result your message will be output three times to the same destination.
I would create a separate, single output config file and remove the output section from your 3 input config files.
by Olly Cruickshank

We're Seeing this issue as well.

Filebeat --> Logstash --> Lumberjack --> ES
Seeing the document twice in ES.

Logstash cannot guarantee exactly-once delivery of messages, because in general it handles partial failure (such as unacknowledged failure) by retrying -- the other option in the face of partial failure is data loss, which we try to avoid.

It seems some attempts were made in the above report to provide consistent document IDs to enable Elasticsearch to deduplicate the events it receives, but debugging these types of setups is beyond the scope of our issue tracker and is better handled in the forums where our community of developers and users work together to help each other out.

I am closing out this issue, because it is not actionable.

Was this page helpful?
0 / 5 - 0 ratings