Logstash: PQ open exception reported as config error

Created on 23 Jan 2017  路  13Comments  路  Source: elastic/logstash

when LS starts, if an exception is triggered in the PQ open() method, it will surface as an invalid config exception.

For example:

fetched an invalid config {:config=>"...", :reason=>"first seqNum=0 is different than minSeqNum=3879808"}
blocker bug persistent queues v5.5.0

All 13 comments

I think we should really solve this for 5.3.0 as it makes things hard to diagnose for everyone.

This bug is preventing me from using persisted queue, as far as I can tell. If I turn on persisted queue this error shows up in my log and my logstash pipeline is unable to process any events. Just upgraded today to 5.3.0. Going back to memory-based queue makes things work again.

Here is one of the exceptions I got:

Logstash failed to create queue {"exception"=>"Element seqNum 1 is expected to be 3", "backtrace"=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:136:in `open'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:40:in `with_queue'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:29:in `create_file_based'", "/usr/share/logstash/logstash-core/lib/logstash/queue_factory.rb:30:in `create'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:142:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:277:in `create_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:95:in `register_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:264:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:183:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}
[2017-04-04T10:47:19,586][ERROR][logstash.agent           ] Cannot load an invalid configuration {
:reason=>"Element seqNum 1 is expected to be 3", :backtrace=>["org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:136:in `open'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:40:in `with_queue'", "/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:29:in `create_file_based'", "/usr/share/logstash/logstash-core/lib/logstash/queue_factory.rb:30:in `create'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:142:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:277:in `create_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:95:in `register_pipeline'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:264:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:183:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}

I think that error message is coming from this source code line:

https://github.com/elastic/logstash/blob/a06dc211ff1f33442b9fa03f5892142aa80588ca/logstash-core/src/main/java/org/logstash/common/io/AbstractByteBufferPageIO.java#L145

@robinsmidsrod Thank you for the details!

This error is a symptom of an corrupted queue (one that was corrupted on a previous Logstash run), and I think it is caused by whatever is also causing the issues in #6829.

The specific problem in this issue is not a corrupt queue, but the problem that a corrupt queue is reported to the user as "an invalid configuration" -- It's not an invalid configuration; it is a bug.

@jordansissel Should I file a new issue for my problem, or should I just leave it here?

@robinsmidsrod you don't need to file a new issue. we found the cause of that issue and it will be fixed by #6901 which will be part of 5.4.0 and 5.3.1 (looking at possibly back porting to 5.2 and 5.1).

as @jordansissel mentioned this issue here is about the fact that these errors are reported as configuration errors which is not the case. You can follow issue #6829 and PR #6901

Thanks!

@robinsmidsrod To clarify, we fixed the bug that created the situation where you are now getting a "exception"=>"Element seqNum X is expected to be Y" but this fix will not help you recover from your now "corrupted" queue. If you need to absolutely recover that corrupted queue please open a new issue for that and we will see what we can do. Otherwise, upgrading to upcoming 5.3.1 or 5.4 will prevent for creating that corruption in the future.

@colinsurprenant Thanks a bunch for fixing the issue that was causing the corrupted queue in the first place! While I was waiting for a fix I've been using a memory-based queue, so I don't have that corrupted queue anymore. Hopefully I can switch back to the disk-based queue when the new version comes out.

me too

The good news is that the queue open exception reported as config error is now fixed in master thanks to the Agent refactor in #6632.

It is definitely also worth fixing in 5.x, for 5.4 since PQ is going GA.

PR #6968 for 5.x

fixed by #6968

Is this problem solved? Why I still have the same problem after update
5.3.0 - > 5.3.2

@hailin0 could you create a new issue with a complete description of the problem you are experiencing? If it is related to the Persistent Queue, note that is was officially GA'ed in 5.4.0.

Was this page helpful?
0 / 5 - 0 ratings