Logstash: Logstash 5.0 refuse to start

Created on 28 Oct 2016  路  11Comments  路  Source: elastic/logstash

Hi,

My logstash v5 crashed yesterday and it refused to start again

Here's the log file

https://gist.github.com/JorisAndrade/b5174ff18feee5b3a7e36772d7ac8a10

bug v5.0.1

Most helpful comment

Ok I got it. I am able to reproduce the problem when the original message/log contains a tags field with a string value. The problem is any plugin downstream that explicitly call the Event#tag method, for example in geoip when there is a lookup error. Event#tag assumes the tags content to be an array so having a single String value result in this exception.

Reproduction:

echo "{\"tags\":\"foo\"}" | bin/logstash -e 'input{stdin{codec => json_lines{}}} filter{geoip{source => "foo"}} output{stdout{codec => rubydebug{}}}'

Sending Logstash logs to /Users/colin/dev/src/elasticsearch/logstash/logs which is now configured via log4j2.properties.
[2016-11-02T15:43:10,494][INFO ][logstash.filters.geoip   ] Using geoip database
[2016-11-02T15:43:10,540][INFO ][logstash.pipeline        ] Starting pipeline
[2016-11-02T15:43:10,567][INFO ][logstash.pipeline        ] Pipeline main started
[2016-11-02T15:43:10,661][INFO ][logstash.agent           ] Successfully started Logstash API endpoint
[2016-11-02T15:43:10,664][ERROR][logstash.pipeline        ] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash.
Exception in thread "[main]>worker3" java.lang.ClassCastException: org.logstash.bivalues.StringBiValue cannot be cast to java.util.List
    at org.logstash.Event.tag(Event.java:348)
[...]

Now I noticed there is another "similar" problem that will happen in the filter config add_tag, where the tag addition will result in the string contactenation as we can see below:

 echo "{\"tags\":\"foo\"}" | bin/logstash -e 'input{stdin{codec => json_lines{}}} filter{mutate{add_tag => "bar"}} output{stdout{codec => rubydebug{}}}'

Sending Logstash logs to /Users/colin/dev/src/elasticsearch/logstash/logs which is now configured via log4j2.properties.
[2016-11-02T15:42:19,628][INFO ][logstash.pipeline        ] Starting pipeline
[2016-11-02T15:42:19,638][INFO ][logstash.pipeline        ] Pipeline main started
[2016-11-02T15:42:19,713][INFO ][logstash.agent           ] Successfully started Logstash API endpoint
{
    "@timestamp" => 2016-11-02T19:42:19.733Z,
      "@version" => "1",
          "host" => "colin-mbp13r.local",
          "tags" => "foobar"
}

Both are easily preventable by first verifying if the tags field contains a String value and in which case convert it back into an Array before adding the new tag. I believe this is the least-surprise behaviour for situations where structured input data already contains a tags field with a String value.

All 11 comments

@JorisAndrade can you provide the config and sample event used? Also what is your OS and Java version?

quick followup: we just discussed the issue with @guyboertje and we understand where and how this is happening, I will skip the details just now but if @JorisAndrade could provide his config we could possibly come up with a quick fix to a plugin creating this problem which would hopefully solve this problem without having to wait for a Logstash bugfix release.

@JorisAndrade - we need your config.

I have a theory that your original data source has a field/key called "tags" - it might be JDBC, Redis/Kafka (JSON) or one of the inputs that get structured data e.g. Beats or CEF.

If this is happening, it would explain the fault.

I can't recreate this with a JSON file, file input and add_tag on 5.0

It could also be a plugin directly settings the tags field with a string value...? Having the config would really help diagnose. @JorisAndrade ?

Yeah sorry for the delay...

My config is pretty big but here's the input and output base config

https://gist.github.com/JorisAndrade/310fe44d0c4467573d45c642ce52dfd8

Here's an example of my solr_filter using an add_tag.

https://gist.github.com/JorisAndrade/502d7d2a8cc28aabe849ff4711a64575

I push data using https://github.com/python-beaver/python-beaver to a RabbitMQ 3.6.5. Then LogStash connects to this queue and consumes it.

I use the multiline filter plugin.

EDIT : I removed the geoip filter from my config and it seems to start. Could it be linked ?

Hm. I looked at the configs and don't see where this could originate exactly. You mentioned that removing the geoip filter seems to fix the problem, I don't see your geoip filter config in the config files you provided, could you provide a complete config which actually fails? Also could you provide a sample log line/message which triggers the exception?

Not related but I also noticed that your Ruby filter seems highly inefficient: if the tags file you load and parse is static, you shouldn't read and parse it on every event but do it only once in the init code.

Ok I got it. I am able to reproduce the problem when the original message/log contains a tags field with a string value. The problem is any plugin downstream that explicitly call the Event#tag method, for example in geoip when there is a lookup error. Event#tag assumes the tags content to be an array so having a single String value result in this exception.

Reproduction:

echo "{\"tags\":\"foo\"}" | bin/logstash -e 'input{stdin{codec => json_lines{}}} filter{geoip{source => "foo"}} output{stdout{codec => rubydebug{}}}'

Sending Logstash logs to /Users/colin/dev/src/elasticsearch/logstash/logs which is now configured via log4j2.properties.
[2016-11-02T15:43:10,494][INFO ][logstash.filters.geoip   ] Using geoip database
[2016-11-02T15:43:10,540][INFO ][logstash.pipeline        ] Starting pipeline
[2016-11-02T15:43:10,567][INFO ][logstash.pipeline        ] Pipeline main started
[2016-11-02T15:43:10,661][INFO ][logstash.agent           ] Successfully started Logstash API endpoint
[2016-11-02T15:43:10,664][ERROR][logstash.pipeline        ] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash.
Exception in thread "[main]>worker3" java.lang.ClassCastException: org.logstash.bivalues.StringBiValue cannot be cast to java.util.List
    at org.logstash.Event.tag(Event.java:348)
[...]

Now I noticed there is another "similar" problem that will happen in the filter config add_tag, where the tag addition will result in the string contactenation as we can see below:

 echo "{\"tags\":\"foo\"}" | bin/logstash -e 'input{stdin{codec => json_lines{}}} filter{mutate{add_tag => "bar"}} output{stdout{codec => rubydebug{}}}'

Sending Logstash logs to /Users/colin/dev/src/elasticsearch/logstash/logs which is now configured via log4j2.properties.
[2016-11-02T15:42:19,628][INFO ][logstash.pipeline        ] Starting pipeline
[2016-11-02T15:42:19,638][INFO ][logstash.pipeline        ] Pipeline main started
[2016-11-02T15:42:19,713][INFO ][logstash.agent           ] Successfully started Logstash API endpoint
{
    "@timestamp" => 2016-11-02T19:42:19.733Z,
      "@version" => "1",
          "host" => "colin-mbp13r.local",
          "tags" => "foobar"
}

Both are easily preventable by first verifying if the tags field contains a String value and in which case convert it back into an Array before adding the new tag. I believe this is the least-surprise behaviour for situations where structured input data already contains a tags field with a String value.

created #6177 for fix

Thank you for your help !

PS : Duly noted for the ruby part, thanks !

@JorisAndrade :thumbsup: ping me if you have problems making the Ruby filter work that way.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

simmel picture simmel  路  4Comments

dedemorton picture dedemorton  路  3Comments

suyograo picture suyograo  路  5Comments

molitoris picture molitoris  路  3Comments

amodakvnera picture amodakvnera  路  3Comments