Logstash: 0 byte head checkpoint file

Created on 12 Sep 2017 · 8 comments · Source: elastic/logstash

We have had reports of a 0-byte head checkpoint file preventing Logstash from starting because of the invalid checkpoint. In both cases where this was reported, it seems to have happened in a crash scenario:

  • one where we suspect file handle depletion
  • one where the VM running Logstash was rebooted

At this point I think we should review the checkpoint handling code to see where this could happen and how to mitigate it.
We should also look at simulating/testing crash conditions to improve the checkpoint handling robustness.

Labels: bug, persistent queues

All 8 comments

I don't think we can prevent an empty checkpoint file from being created on a crash.
Filesystems generally don't have an atomic create + write operation, so the JDK doesn't offer us anything here.
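For comparison, the usual mitigation for this class of problem is write-to-temp-then-rename: POSIX has no atomic create + write, but `rename(2)` atomically replaces the destination, so a half-written file can only ever exist under a temporary name. This is a hedged sketch (function name and `.tmp` suffix are hypothetical, not Logstash's actual `FileCheckpointIO`):

```ruby
# Write a checkpoint durably by publishing it with an atomic rename.
# A crash mid-write can leave a stray "<path>.tmp" behind, but never a
# truncated file under the checkpoint's real name; readers just ignore
# leftover *.tmp files.
def atomic_checkpoint_write(path, bytes)
  tmp = "#{path}.tmp"
  File.open(tmp, "wb") do |f|
    f.write(bytes)
    f.fsync # data must be durable before the rename publishes it
  end
  File.rename(tmp, path) # atomic replace on POSIX filesystems
end
```

This confines the zero-byte-file window to the temporary name, at the cost of an extra file creation and fsync per checkpoint write.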

I see two fixes:

  • (trivial) Just handle empty checkpoint files by ignoring and deleting them on queue open
  • Use one index file instead of multiple checkpoint files and get atomic writing of checkpoint information that way. Here mmap would be great and would far outperform the current approach, since the file is probably just one sector in almost all use cases; if we only ever do atomic puts on the mmapped buffer for the checkpoint file, it's impossible to ever corrupt it.
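The first (trivial) option could look something like this sketch at queue-open time; the helper name and glob pattern are illustrative, not Logstash's actual code:

```ruby
# On queue open, find and delete zero-byte checkpoint files before
# attempting to parse them, so an empty file left by a crash does not
# block startup. Returns the paths that were removed.
def purge_empty_checkpoints(queue_dir)
  empties = Dir.glob(File.join(queue_dir, "checkpoint*")).select do |path|
    File.size(path).zero?
  end
  empties.each { |path| File.delete(path) }
  empties
end
```

The caveat, as discussed below, is that silently deleting a head checkpoint discards state the queue needs to reconstruct itself.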

I like the second option :)

Good suggestions @original-brownbear.

One of the main problems we have is that the 0-byte checkpoints have always been head checkpoints (obviously, since it's the only one written to repeatedly). This complicates matters a little, because the head checkpoint holds the firstUnackedPageNum pointer, which is important for reconstructing the queue state.

(I'll add here that it would be possible to add a queue recovery mode where we could try to reconstruct the queue state by looking at the page file(s) content if a checkpoint is missing, but I'd prefer to focus this issue on the cause and defer potential recovery solutions to another issue.)

Filesystems generally don't have any atomic create + write operation

True, and our FileCheckpointIO strategy is to open/write/close on every write operation. I believe this is, first, not very efficient and, second and most important, extremely prone to producing these zero-byte checkpoints, with the odds increasing as the number of write operations increases, for example when using a lower queue.checkpoint.writes setting.

The idea of the checkpoint file is to always hold the last known "safely persisted" state of its data page file. Obviously having an empty checkpoint file completely defeats this and should not happen.

What if we opened the head checkpoint only once and simply reset its position to 0 before each write? Would that be safer, or would it have the same potential problem between setting the position and writing? Or, similar to what you suggest, we could mmap the head checkpoint, always keep it open, and simply update it continuously. Neither strategy would require changing the design much, and both could increase performance and be safer in crash scenarios.
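The open-once-and-rewind idea could be sketched as follows; class and method names are hypothetical, and this is a plain-file sketch rather than the mmap variant:

```ruby
# Keep the head checkpoint file open for the queue's lifetime and
# rewrite it in place from position 0 on every checkpoint write,
# instead of open/write/close each time. Checkpoints are fixed-size,
# so rewriting from 0 fully replaces the previous contents.
class HeadCheckpointWriter
  def initialize(path)
    # Open once; File::CREAT so the very first write creates the file.
    @io = File.open(path, File::RDWR | File::CREAT | File::BINARY)
  end

  def write(bytes)
    @io.seek(0)
    @io.write(bytes)
    @io.fsync # push the new state to disk before trusting it
  end

  def close
    @io.close
  end
end
```

Note this removes the create-on-every-write path (a crash can no longer leave a freshly created zero-byte file), but a crash between `seek` and the completion of `write` could still leave a torn checkpoint, which is exactly the open question above; the CRC stored in the checkpoint would at least make such a torn write detectable.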

WDYT?

@colinsurprenant

What if we open the head checkpoint only once and simply change its position to 0 then write?

This :) => let's do it.

@colinsurprenant @original-brownbear a user reproduced this issue last night. They now have a 50GB queue... is there any way for them to recover it now?

@colinsurprenant I remember reading (somewhere in my Github feed, and quite a while ago :D) that you could at least get your data back (though potentially with duplicates) by deleting the broken zero-byte checkpoint file, is that true?

@nerophon which version of LS is it?
Can you ask the user to list the queue directory?
Can you also ask the user to run this command from the LS root dir? (Adjust the queue path if it differs from the default.)

```shell
$ vendor/jruby/bin/jruby -rpp -e 'Dir.glob("data/queue/main/checkpoint.*").sort_by { |x| x[/[0-9]+$/].to_i}.each { |checkpoint| data = File.read(checkpoint); version, page, firstUnackedPage, firstUnackedSeq, minSeq, elementCount, crc32 = data.unpack("nNNQ>Q>NN"); puts File.basename(checkpoint); p(version: version, page: page, firstUnackedPage: firstUnackedPage, firstUnackedSeq: firstUnackedSeq, minSeq: minSeq, elementCount: elementCount, crc32: crc32) }'
```
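For reference, the `unpack("nNNQ>Q>NN")` in that one-liner implies the fixed 34-byte, big-endian checkpoint layout annotated below. This round-trip sketch uses the same field names as the one-liner; the encode helper is hypothetical, added only to make the layout testable:

```ruby
# Layout implied by the unpack directives (all big-endian):
#   version          uint16  ("n")
#   page             uint32  ("N")
#   firstUnackedPage uint32  ("N")
#   firstUnackedSeq  uint64  ("Q>")
#   minSeq           uint64  ("Q>")
#   elementCount     uint32  ("N")
#   crc32            uint32  ("N")
CHECKPOINT_FORMAT = "nNNQ>Q>NN"

def encode_checkpoint(h)
  [h[:version], h[:page], h[:firstUnackedPage], h[:firstUnackedSeq],
   h[:minSeq], h[:elementCount], h[:crc32]].pack(CHECKPOINT_FORMAT)
end

def decode_checkpoint(data)
  version, page, firstUnackedPage, firstUnackedSeq, minSeq, elementCount, crc32 =
    data.unpack(CHECKPOINT_FORMAT)
  { version: version, page: page, firstUnackedPage: firstUnackedPage,
    firstUnackedSeq: firstUnackedSeq, minSeq: minSeq,
    elementCount: elementCount, crc32: crc32 }
end
```

A zero-byte file unpacks to all-nil fields, which is why such a checkpoint cannot be loaded.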

@nerophon also, are you sure the problem you are reporting is related to zero-byte checkpoint files and not to page files, per #7809?

fixed via https://github.com/elastic/logstash/pull/9303 :)
