Logstash: Logstash losing messages

Created on 14 Sep 2016 · 25 comments · Source: elastic/logstash

Hi,
I'm using this setup

  • logstash 2.4.0
  • rsyslog 8.21.0
  • redis 2.8.17

A central rsyslog server sends all messages to rsyslog on the Logstash server. I used tcpdump and I can confirm that all messages arrive at the server, go through rsyslog, and end up in the Redis database.

I have configured input in logstash:

  redis {
        host => '127.0.0.1'
        data_type => 'list'
        key => 'syslog'
        codec => json { charset => "ISO-8859-1" }
        type => 'syslog'
        threads => 20
    }

some filters, and output to a 2-node ES cluster.

When I send 30 messages they arrive in the Redis database, but only a few of them (4-5) arrive in the ES database.
I'm using this setup because there is a large number of messages per second and I had started to lose some messages. I then built this setup, but still, a large number of messages never end up in ES.
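
One quick way to narrow down where the events disappear, assuming the list key 'syslog' from the config above, is to watch the Redis list length while the test messages are being sent: if the list drains to zero but the events never show up in ES, the loss is happening after the Redis input (in the pipeline or on the ES side), not before it.

    # length of the list the redis input reads from
    redis-cli -h 127.0.0.1 llen syslog

    # poll it once per second while sending the 30 test messages
    while true; do redis-cli -h 127.0.0.1 llen syslog; sleep 1; done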

For all general issues, please provide the following details for fast resolution:

  • Version: 2.4.0
  • Operating System: Debian Wheezy
  • Config File (if you have sensitive info, please remove it):
input {
    # Redis Qeueue
    redis {
        host => '127.0.0.1'
        data_type => 'list'
        key => 'syslog'
        codec => json { charset => "ISO-8859-1" }
        type => 'syslog'
        threads => 20
    }
}

filter {

   if [type] == 'syslog' {
    #
    # SYSLOG
    #
        # Discard mongo
        if [program] =~ "mongo.*" {
            drop{}
        }
        if "_grokparsefailure_sysloginput" in [tags] {
            drop{}
        }

        # NodeJS logs in JSON format, parse
        if [program] == 'node' {
            json {
                source => "message"
            }
        }

        # Cron messages
        else if [program] == 'cron' {
            grok { match => { "message" => "%{CRONLOG}" } }
        }

        # Query counter log
        else if [program] == 'query_counter.log' {
            grok {
                match => {"message" => "%{WORD:qc_type}: %{DATA:qc_script}%{SPACE}\|%{SPACE}%{INT:qc_master_select:int}%{SPACE}%{INT:qc_master_insert:int}%{SPACE}%{INT:qc_master_update:int}%{SPACE}%{INT:qc_master_delete:int}%{SPACE}%{INT:qc_master_replace:int}%{SPACE}\|%{SPACE}%{INT:qc_slave_select:int}%{SPACE}%{INT:qc_slave_insert:int}%{SPACE}%{INT:qc_slave_update:int}%{SPACE}%{INT:qc_slave_delete:int}%{SPACE}%{INT:qc_slave_replace:int}%{SPACE}\|%{SPACE}%{INT:qc_archive_select:int}%{SPACE}%{INT:qc_archive_insert:int}%{SPACE}%{INT:qc_archive_update:int}%{SPACE}%{INT:qc_archive_delete:int}%{SPACE}%{INT:qc_archive_replace:int}"}
            }
        }

        # Apache Logs
        else if [program] == 'httpd_access' {
            grok {
                match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident}( %{USER:auth})? \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response:int} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} %{GREEDYDATA:server_name} %{NUMBER:time_s:int}/%{NUMBER:time_us:int}" }
            }
        }

        # Varnishincsa
        else if [program] == 'varnishncsa' {
            grok {
                match => { "message" => "%{IPORHOST:clientip}[ ]+%{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response:int} (?:%{NUMBER:bytes:int}|-) %{QS:referrer} %{QS:agent} (%{NUMBER:duration:float}|nan) %{GREEDYDATA:varnish_status}( %{NUMBER:varnish_miss_num:int})?" }
            }
        }
        # PHP
        else if [program] == 'php' {
            if 'PHP Notice:' in [message] {
                mutate { add_tag => ["php_notice"] }
            }
            else if 'PHP Warning:' in [message] {
                mutate { add_tag => ["php_warning"] }
            }
            else if 'PHP Fatal error:' in [message] {
                mutate { add_tag => ["php_fatalerror"] }
            }
        }

        ##
        ## TEAM TAGS
        ##

        # DCAPI
        if [logsource] =~ /^(dcapitest|dcapi|exapi).*$/ or [program] =~ /^DCAPI.*$/ {
            mutate { add_tag => ["dcapi"] }

            if [program] == 'DCAPI_access' {
                grok {
                    match => { "message" => "\"%{WORD:access_method} %{GREEDYDATA:access_url}\" %{NUMBER:access_req_size:int} %{NUMBER:access_status_code:int} %{NUMBER:access_res_size:int} %{NUMBER:access_worker_pid:int} %{GREEDYDATA:access_host} %{GREEDYDATA:referer} %{NUMBER:access_time_sec}/%{NUMBER:access_usec}" }
                }
            } 
        }

        # OP
        else if [logsource] =~ /^(opp|opweb|opw|opproxy|opapi|opcache)[0-9]+$/ or [program] =~ /^(OPW_|OPP).*$/ {
            mutate { add_tag => ["op"] }
        }

        # DC
        else if [logsource] =~ /^(dccml|dcproxy|videoplayer|dc2r|dcr|dccml|dcproxy|dcprof|dccron)/ {
            mutate { add_tag => ["dc"] }
        }

        # FS
        else if [logsource] =~ /^(node|fsp|wata|lsproxy|lsbproxy|fs|lscache|lsnp|adrm|adrs|adadmin)/ {
            mutate { add_tag => ["fs"] }
        }

        # LS
        else {
            mutate { add_tag => ["ls"] }
        }


    }

}

output {
   # SYSLOG
    if [type] == 'syslog' {
        elasticsearch { 
            hosts => [ "lses1", "lses2" ]
            index => "syslog-%{+YYYY.MM.dd}"
            template => "/etc/logstash/elastic-syslog-template.json"
            template_overwrite => true
        }

    }

}

  • Sample Data:
  • Steps to Reproduce:

All 25 comments

This doesn't look like a bug. My feeling is that this is a configuration issue. Can you ask this on discuss.elastic.co?

Ok, I created a new topic on Discuss.

No reply on Discuss. But I tried to simplify the config.

input {
    syslog {
        type => 'syslog'
        port => 5114
        codec => plain { charset => "ISO-8859-1" }
    }
}
filter{}
output {
        elasticsearch { 
            hosts => [ "lses1.edrive.intra", "lses2.ls.intra" ]
            index => "syslog-%{+YYYY.MM.dd}"
            template => "/etc/logstash/elastic-syslog-template.json"
            template_overwrite => true
        }
}

Still the same problem. I also forgot to mention that there is a lot of log traffic coming in, around 20 Mb/s.

Is there any policy or anything in Logstash which can drop messages? Or anything I can tune a little bit?
I have LS_HEAP_SIZE set to 15G in the environment.

@rvojcik What's the exact issue? Are you not seeing all the logs in ES with that simple config?

@suyograo Yes, some of the messages sent to Logstash never arrive in the ES database. For example, I create 30 messages and in the ES database I can see just 6 of them.

What protocol (tcp? Udp?) are you sending with?

What query are you using to search elasticsearch for your data?


It's UDP. The data comes from the central rsyslog server, which forwards it to the Logstash server.

I'm using Kibana for searching ES.

Can you provide your full Logstash config, all logs sent, the search
results, and the query used?


@jordansissel My whole Logstash config is in my first comment, and you can also use the simple one from my second reply.

I have a number of Logstash instances for various things, but I experience this problem only with the config that parses the rsyslog traffic. Maybe it is a performance problem? Like I said earlier, the traffic is around 20 Mb/s of logs forwarded from the central rsyslog server. When I use the config on its own there is no problem; when I send all my log traffic to Logstash, I start losing some of the messages.

I could understand that Logstash itself cannot handle a large volume of logs, so I tested it with Redis as a cache layer, but the problem is still the same. I can see from tcpdump that all logs arrive at the Logstash server and are stored in the Redis database, but not all of them are sent to the ES database.

I think that to reproduce the problem you have to generate syslog traffic towards Logstash at around 20 Mb/s; a rough load-generator sketch follows.
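
For reference, a simple way to generate sustained syslog-over-UDP traffic (a sketch; the host, port, and loop size are placeholders, and the -n/-P/-d options need a util-linux logger):

    TARGET=logstash.example.com   # placeholder hostname
    PORT=5114                     # UDP syslog port used in this thread
    while true; do
        for i in $(seq 1 1000); do
            logger -n "$TARGET" -P "$PORT" -d -t LOADTEST "message $i"
        done
    done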

thank you :)

Here is an example of how messages are getting dropped. Very simple config, no filters, like in my second comment; just input and output.

On server1 I generated these messages:
for i in $(seq 1 30); do logger -t WIRO "testing $i"; echo $i; done

I used tcpdump to monitor the input port on the Logstash server and the output destination port towards the ES cluster.

This is the input from the central rsyslog server, which forwards all logs to Logstash:

 tcpdump -i any -A port 5114 or port 9200 | grep WIRO
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
E..E..@.@..=...<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 1
E..E..@.@......<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 2
E..E..@.@......<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 3
E..E.J@.@......<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 4
E..E.u@.@......<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 5
E..E..@.@......<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 6
E..E..@.@..{...<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 7
E..E..@[email protected]...<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 8
E..E..@[email protected]...<.....]...1.u<13>Oct  7 21:21:46 dcrs1 WIRO: testing 9
E..F..@[email protected]...<.....]...2.C<13>Oct  7 21:21:46 dcrs1 WIRO: testing 10
E..F..@.@......<.....]...2.B<13>Oct  7 21:21:46 dcrs1 WIRO: testing 11
E..F.-@.@......<.....]...2.A<13>Oct  7 21:21:46 dcrs1 WIRO: testing 12
E..F.L@.@......<.....]...2.@<13>Oct  7 21:21:46 dcrs1 WIRO: testing 13
E..F.i@.@......<.....]...2.?<13>Oct  7 21:21:46 dcrs1 WIRO: testing 14
E..F..@.@......<.....]...2.><13>Oct  7 21:21:46 dcrs1 WIRO: testing 15
E..F..@[email protected]...<.....]...2.<<13>Oct  7 21:21:46 dcrs1 WIRO: testing 17
E..F..@.@..:...<.....]...2.;<13>Oct  7 21:21:46 dcrs1 WIRO: testing 18
E..F..@.@..!...<.....]...2.:<13>Oct  7 21:21:46 dcrs1 WIRO: testing 19
E..F.+@.@......<.....]...2.C<13>Oct  7 21:21:46 dcrs1 WIRO: testing 20
E..F.R@.@......<.....]...2.B<13>Oct  7 21:21:46 dcrs1 WIRO: testing 21
E..F.q@.@......<.....]...2.A<13>Oct  7 21:21:46 dcrs1 WIRO: testing 22
E..F..@.@......<.....]...2.@<13>Oct  7 21:21:46 dcrs1 WIRO: testing 23
E..F..@[email protected]...<.....]...2.?<13>Oct  7 21:21:46 dcrs1 WIRO: testing 24
E..F..@.@.._...<.....]...2.><13>Oct  7 21:21:46 dcrs1 WIRO: testing 25
E..F..@[email protected]...<.....]...2.=<13>Oct  7 21:21:46 dcrs1 WIRO: testing 26
E..F..@.@......<.....]...2.<<13>Oct  7 21:21:46 dcrs1 WIRO: testing 27
E..F.<@.@......<.....]...2.;<13>Oct  7 21:21:46 dcrs1 WIRO: testing 28
E..F.Z@.@......<.....]...2.:<13>Oct  7 21:21:46 dcrs1 WIRO: testing 29
E..F.s@.@......<.....]...2.C<13>Oct  7 21:21:46 dcrs1 WIRO: testing 30

So all 30 messages arrived on port 5114 of the Logstash server.

But tcpdump on port 9200 on the Logstash server shows only this:

{"message":"testing 12","@version":"1","@timestamp":"2016-10-07T19:25:09.000Z","type":"syslog","host":"192.168.2.60","priority":13,"timestamp":"Oct  7 21:25:09","logsource":"dcrs1","program":"WIRO","severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","tags":["dc"]}
{"message":"testing 13","@version":"1","@timestamp":"2016-10-07T19:25:09.000Z","type":"syslog","host":"192.168.2.60","priority":13,"timestamp":"Oct  7 21:25:09","logsource":"dcrs1","program":"WIRO","severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","tags":["dc"]}


So only a few of them were sent to the ES database.

Hi,
I am also facing the same kind of problem. I am using Logstash 2.4, Elasticsearch 2.4.1, Kibana 4.6.1.
I am not an expert in ELK, but I have started to explore it. Many times I noticed that Logstash seemed to be losing logs. I am receiving around 500 events per second and sometimes I don't get important logs even though they were generated on the device; I can see the log in the device's local buffer. I needed to confirm whether or not Logstash was losing the logs, so I made a test setup.

I have an Elasticsearch server with 8 cores, 64 GB RAM, and 3.2 TB HDD in RAID (15k rpm), running CentOS 7 64-bit; Kibana also runs on the same server.

I have a separate server in the same subnet, connected to the same switch. This server is only for Logstash: 16 cores, 28 GB RAM, 600 GB HDD (HP DL580 G4), running CentOS 7 64-bit.

From my Windows workstation I used the Kiwi syslog message generator to send 500 syslog messages towards the Logstash server. Done. :)
When I look at Kibana I can see only 325 logs. I repeated the test a couple of times, and every time I saw 280, 325, 340, or 375; it varies between 250 and just under 400.

I ran tcpdump during the test and I can see that all 500 logs arrived at the server, so there is no network issue or packet loss in transit. I am not sure what I am missing. Can anyone shed some light on this? Thanks in advance.
Kumar

Just for information: I have incoming traffic from the central rsyslog server of around 14 000 messages/s.

I did some additional tests and I think this issue is related to the syslog input plugin or to parsing messages with grok.

With a simple config using the plain UDP input, I generated around 54 000 msgs/s and did not experience any message loss.
As soon as I started experimenting with the syslog input and grok, Logstash started losing some of the messages.
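
One commonly suggested workaround, assuming the bottleneck is the syslog input doing its parsing inline, is to replace the syslog input with a plain udp input and do the equivalent parsing in the filter stage, where it scales with the pipeline workers. This is only a sketch; the worker and queue values are assumptions, not measured settings.

input {
    udp {
        port       => 5114
        type       => 'syslog'
        codec      => plain { charset => "ISO-8859-1" }
        workers    => 4        # assumed value
        queue_size => 20000    # assumed value
    }
}
filter {
    if [type] == 'syslog' {
        # roughly what the syslog input does internally
        grok {
            match     => { "message" => "<%{POSINT:syslog_pri}>%{SYSLOGLINE}" }
            overwrite => [ "message" ]
        }
        syslog_pri { }
        date { match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ] }
    }
}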

UDP is a lossy protocol. Its behavior under congestion is to drop packets. This seems normal.


@jordansissel Yes, that's right, but I tested the network separately to confirm that the messages reach the application. I replaced Logstash with netcat and saw the same small loss, around 0.1% at most, compared to 45%+ when using Logstash. For now I have no idea what to do with it. Maybe load-balance the traffic between several Logstash parser instances.

Robert:
Thanks for your input; maybe I will give load balancing between several instances a try. I got a little improvement after changing the pipeline batch size (I set it to 5000), but there are still log losses.

Jordan:
Yes, I agree with you, but the UDP packets reach the server without loss; the drops are happening at the Logstash level, somewhere in the input or filter stage.

I also did the same thing as Robert: I replaced Logstash with rsyslog and got the logs perfectly up to 10k msgs/s with UDP input.

The data is most likely being dropped by the server's kernel network stack before Logstash can read it. Again, UDP is lossy.

If Logstash is processing more slowly than you are producing UDP packets, those packets will be dropped by your server's network stack.
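
If that is the case, the kernel's own counters will show it. A quick way to check on a reasonably standard Linux host (a sketch; exact counter names vary slightly between kernel and net-tools versions):

    # UDP drop counters; if "packet receive errors" / "receive buffer errors"
    # climb during a test run, datagrams are discarded before Logstash reads them
    netstat -su | grep -A 8 '^Udp:'

    # the same counters straight from the kernel
    grep '^Udp:' /proc/net/snmp

    # per-socket view: the last ("drops") column for the socket bound to the input port
    cat /proc/net/udp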


You can tune certain things to try to reduce UDP data loss (network buffers in your OS, queue_size and workers in the udp input).


Thanks Jordan, I will check that.

Closing this, please open a new issue if you're still encountering issues.

Thanks! Just for your information: I tuned the network UDP layer to handle more messages and I load-balance the traffic using Linux LVS. Now it's working fine at about 60 000 msgs/s.
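
For reference, a minimal LVS/ipvsadm sketch of that kind of UDP fan-out (the VIP, real-server addresses, and NAT forwarding method are assumptions; the reporter's actual LVS setup isn't shown in the thread):

    # create a UDP virtual service on the VIP and spread it round-robin
    # across the Logstash parser VMs
    ipvsadm -A -u 192.168.0.100:5114 -s rr
    ipvsadm -a -u 192.168.0.100:5114 -r 192.168.0.11:5114 -m
    ipvsadm -a -u 192.168.0.100:5114 -r 192.168.0.12:5114 -m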

@rvojcik could you add the parameters you changed here? :)

First of all, I had to use virtual servers, because we had lots of free virtual resources.

VPS (8 CPU, 16 GB RAM)

Sysctl UDP settings

net.core.rmem_max=31457280
net.ipv4.udp_mem=4194304 8388608 33554432
net.ipv4.udp_rmem_min=8388608

Logstash.yml

pipeline.workers: 8
#
# How many workers should be used per output plugin instance
#
pipeline.output.workers: 3
#
# How many events to retrieve from inputs before sending to filters+workers
#
pipeline.batch.size: 1000
#
# How long to wait before dispatching an undersized batch to filters+workers
# Value is in milliseconds.
#
pipeline.batch.delay: 120

Logstash input config

input {
        udp {
            type => 'syslog'
            port => 5114
            host => "192.168.0.1"
            codec => plain { charset => "ISO-8859-1" }
            workers => 3
            queue_size => 72000
            receive_buffer_bytes => 31457280
        }
}

With all these settings I get around 16 000 msgs/s per VPS. Traffic is load-balanced between 5 VPS, so the setup is ready for 80 000 msgs/s (tested value).
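
To apply the sysctl values above at runtime and keep them across reboots, something like the following should work (the file name under /etc/sysctl.d is just an example). Note that receive_buffer_bytes in the udp input matches net.core.rmem_max, so the socket is actually allowed to use the larger buffer.

    # apply at runtime (as root)
    sysctl -w net.core.rmem_max=31457280
    sysctl -w net.ipv4.udp_rmem_min=8388608
    sysctl -w "net.ipv4.udp_mem=4194304 8388608 33554432"

    # persist across reboots (file name is an example)
    cat > /etc/sysctl.d/90-logstash-udp.conf <<'EOF'
    net.core.rmem_max=31457280
    net.ipv4.udp_mem=4194304 8388608 33554432
    net.ipv4.udp_rmem_min=8388608
    EOF
    sysctl --system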

Here are some real numbers from Logstash internal metrics and real traffic:
[screenshot: real_logstash_numbers]
