Logstash: Performance issue after upgrading from 2.3.x to 5.6.3 (70k -> 8k eps)

Created on 31 Oct 2017  路  13Comments  路  Source: elastic/logstash

After upgrading from 2.3.4 to 5.6.3, I noticed two performance issues:

  1. Grok filter using array is slower
  2. Overall performance is slower
  • Version:
    5.6.3

  • Operating System:
    RHEL 5.3

  • Config File :

# MULTIPLE grok filters
input {
    generator {
        type => foo
        message => "random message, la la la"
    }
}

filter {
    grok {
        match => {
            "message" => "^1 foo 1 bar$"
        }
    }
    grok {
        match => {
            "message" => "^2 foo 2 bar$"
        }
    }
    grok {
        match => {
            "message" => "^3 foo 3 bar$"
        }
    }
    grok {
        match => {
            "message" => "^4 foo 4 bar$"
        }
    }
    grok {
        match => {
            "message" => "^5 foo 5 bar$"
        }
    }
    grok {
        match => {
            "message" => "^6 foo 6 bar$"
        }
    }
    grok {
        match => {
            "message" => "^7 foo 7 bar$"
        }
    }
    grok {
        match => {
            "message" => "^8 foo 8 bar$"
        }
    }
    grok {
        match => {
            "message" => "^9 foo 9 bar$"
        }
    }
    grok {
        match => {
            "message" => "^10 foo 10 bar$"
        }
    }
    grok {
        match => {
            "message" => "^11 foo 11 bar$"
        }
    }
    grok {
        match => {
            "message" => "^12 foo 12 bar$"
        }
    }
    grok {
        match => {
            "message" => "^13 foo 13 bar$"
        }
    }
    grok {
        match => {
            "message" => "^14 foo 14 bar$"
        }
    }
    grok {
        match => {
            "message" => "^15 foo 15 bar$"
        }
    }
    grok {
        match => {
            "message" => "^16 foo 16 bar$"
        }
    }
    grok {
        match => {
            "message" => "^17 foo 17 bar$"
        }
    }
    grok {
        match => {
            "message" => "^18 foo 18 bar$"
        }
    }
    grok {
        match => {
            "message" => "^19 foo 19 bar$"
        }
    }
    grok {
        match => {
            "message" => "^20 foo 20 bar$"
        }
    }

    metrics {
        meter => "events"
        add_tag => "metric"
    }
}

output {
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "rate_1m: %{[events][rate_1m]}, rate_5m: %{[events][rate_5m]}"
            }
        }
    }
}
# SINGLE grok filter (array)
input {
    generator {
        type => foo
        message => "random message, la la la"
    }
}

filter {

    grok {
        match => {
            "message" => [
                "^1 foo 1 bar$",
                "^2 foo 2 bar$",
                "^3 foo 3 bar$",
                "^4 foo 4 bar$",
                "^5 foo 5 bar$",
                "^6 foo 6 bar$",
                "^7 foo 7 bar$",
                "^8 foo 8 bar$",
                "^9 foo 9 bar$",
                "^10 foo 10 bar$",
                "^11 foo 11 bar$",
                "^12 foo 12 bar$",
                "^13 foo 13 bar$",
                "^14 foo 14 bar$",
                "^15 foo 15 bar$",
                "^16 foo 16 bar$",
                "^17 foo 17 bar$",
                "^18 foo 18 bar$",
                "^19 foo 19 bar$",
                "^20 foo 20 bar$"
            ]
        }
    }

    metrics {
        meter => "events"
        add_tag => "metric"
    }
}

output {
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "rate_1m: %{[events][rate_1m]}, rate_5m: %{[events][rate_5m]}"
            }
        }
    }
}
  • Sample Data:

version | 5.6.3 | 2.3.4
-- | -- | --
single grok (array) | 8k | 70k
multiple grok | 23k | 40k

  • Steps to Reproduce:
    Run logstash with the provided configuration for 5 minutes and check rate_1m.

Most helpful comment

That's wonderful news!
I've tested against the released version (4.0.0) and it fixed the issue.
Thanks a million! ;) @andrewvc @original-brownbear

All 13 comments

Thanks for reporting this. After investigating this, the cause is the new timeout enforcement feature in grok. While this is quite useful, it is inherently expensive. I think we should add a flag to disable that if users want it.

I verified this, BTW, by commenting the watchdog code in/out in the vendor directory. Throughput on my box went from 14k/thread to 30k/thread.

A few attempted optimizations only yielded a 5k/s improvement.

This needs to be researched and improved in a more methodical way than my own ad-hoc experiment.

@andrewvc Thanks for looking at this.

What watchdog code did you comment out - was this in grok plugin or in core? Would setting timeout_millis => 0 have the same affect?

One thing that was really surprising in perf testing was that single grok with 20 patterns is much worse than 20 groks with one pattern. Is this a seperate issue or still related to the timeout enforcement?

@andrewvc see https://github.com/logstash-plugins/logstash-filter-grok/pull/125, could make it more than twice as fast for the posted config without changes to functionality.

If we want more, I'd go with your suggestion to simply bypass the timeout logic when the timeout is set to 0. That seems to be the best value per unit of effort :) Let me know if you want me to make a PR for that too.

@andrewvc thanks for your feedback.

@original-brownbear thanks for digging into this :).

I rerun the benchmark with your patch, and the results look great:

version | 5.6.3 | 2.3.4 | 5.6.3 (patched)
-- | -- | -- | --
single grok (array) | 8k | 70k | 54k
multiple grok | 23k | 40k | 32k

@marioluan nice, thanks a lot for benchmarking this on 5.6.3!

I admit I only ran it against master (which likely explains the larger gain I saw, master has a somewhat lower framework overhead than 5.6.3 :) => more relative change there)

@original-brownbear rerun the benchmark with your new patch (https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/1601d93a5493af4928193ed61ca1d5c0f315d62c).

version | 5.6.3 | 2.3.4 | 5.6.3 (patch https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/e38113be069f1e7da1931863deb4b3586ef1d629) | 5.6.3 (patch https://github.com/logstash-plugins/logstash-filter-grok/commit/1601d93a5493af4928193ed61ca1d5c0f315d62c)
-- | -- | -- | -- | --
single grok (array) | 8k | 70k | 54k | 60k
multiple grok | 23k | 40k | 32k | 32k

@marioluan ah nice, thanks :)
@andrewvc got some time to look at this again? :)

hey guys, do you have any expectation when this will be fixed/pushed/merged?

@original-brownbear
@andrewvc

@marioluan we have just published version 3.4.5 of the grok plugin, that should correctly handle this issue.

That's wonderful news!
I've tested against the released version (4.0.0) and it fixed the issue.
Thanks a million! ;) @andrewvc @original-brownbear

How to install this new version ?
I have tried the following statements but it does not work :-(

bin/logstash-plugin install --version 3.4.5 logstash-filter-grok

Validating logstash-filter-grok-3.4.5
Plugin logstash-filter-grok version 3.4.5 does not exist
ERROR: Installation aborted, verification failed for logstash-filter-grok 3.4.5

@jmreymond the version is 4.0.0 not 3.4.5. Sorry for the mixup above.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

marcinpm picture marcinpm  路  3Comments

simmel picture simmel  路  4Comments

dedemorton picture dedemorton  路  3Comments

amodakvnera picture amodakvnera  路  3Comments

ashangit picture ashangit  路  4Comments