Elasticsearch: Log filled with `Caused by: java.io.IOException: No space left on device` errors on data drive free space consumption

Created on 7 Sep 2016  路  11Comments  路  Source: elastic/elasticsearch

Elasticsearch version: 2.3.2

Plugins installed: []

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_72-b15)

OS version: Ubuntu 14.04.5 LTS (3.19.0-59-generic)

Description of the problem including expected versus actual behavior:

Just had our data drive run out of space on a 1-node elastic instance which caused the logging partition to be filled by a 28GB log file (~300 million lines), filled solely with this repeating section:

Caused by: java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
        at java.nio.channels.Channels.writeFully(Channels.java:101)
        at java.nio.channels.Channels.access$000(Channels.java:61)
        at java.nio.channels.Channels$1.write(Channels.java:174)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:278)
        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
        at org.apache.lucene.store.OutputStreamIndexOutput.writeByte(OutputStreamIndexOutput.java:47)
        at org.apache.lucene.store.RateLimitedIndexOutput.writeByte(RateLimitedIndexOutput.java:66)
        at org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:176)
        at org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
        at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:163)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:605)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
        ... 3 more
        Suppressed: java.io.IOException: No space left on device
                at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
                at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
                at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
                at sun.nio.ch.IOUtil.write(IOUtil.java:65)
                at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
                at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
                at java.nio.channels.Channels.writeFully(Channels.java:101)
                at java.nio.channels.Channels.access$000(Channels.java:61)
                at java.nio.channels.Channels$1.write(Channels.java:174)
                at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:278)
                at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
                at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:68)
                at org.apache.lucene.store.RateLimitedIndexOutput.close(RateLimitedIndexOutput.java:49)
                at org.apache.lucene.util.IOUtils.close(IOUtils.java:97)
                at org.apache.lucene.util.IOUtils.close(IOUtils.java:84)
                at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.close(CompressingStoredFieldsWriter.java:138)
                at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:178)
                ... 6 more
                Suppressed: java.io.IOException: No space left on d

We've had this instances data drive fill up before without this issue being raised.

Steps to reproduce:

  1. Let elasticsearch fill up a drive
  2. Cry
:CorInfrLogging feedback_needed

Most helpful comment

@clintongormley what exactly are you looking for in the form of suggestions? Might I offer the following solutions:

  • Don't spam logs when the data disk is full
  • Actually maybe just don't ever spam logs

All 11 comments

@b1naryth1ef What do you think we should do instead?

Generally I would expect any stable, reliable production database would trigger a few log lines indicating whats happening, and what state the database is entering.

I expect it to be able to come up enough for me to be able to post a delete command

Or is that too much to ask of a database?

How am I supposed to be able to remove the "bad" indices if a filled up the drive causes a DATABASE to refuse to start itself?

In what world is this sane behaviour?

Can you describe a production database which touts around a major version of 5.x that refuses to start with the following beauty of a log, and is not considered as a total joke?

[2017-02-13T18:24:30,487][INFO ][o.e.n.Node               ] [elk-1] initializing ...
[2017-02-13T18:24:30,573][ERROR][o.e.b.Bootstrap          ] Exception
java.lang.IllegalStateException: Failed to created node environment
        at org.elasticsearch.node.Node.<init>(Node.java:255) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]
Caused by: java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
        at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
        at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
        at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
        at org.apache.lucene.store.OutputStreamIndexOutput.getChecksum(OutputStreamIndexOutput.java:80) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b845
5f23ddc1364124 - jim - 2017-01-17 15:57:29]
        at org.apache.lucene.codecs.CodecUtil.writeCRC(CodecUtil.java:542) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-0
1-17 15:57:29]
        at org.apache.lucene.codecs.CodecUtil.writeFooter(CodecUtil.java:390) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 201
7-01-17 15:57:29]
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:140) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
        ... 11 more
        Suppressed: java.io.IOException: No space left on device
                at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
                at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
                at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
                at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
                at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
                at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
                at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
                at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
                at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
                at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
                at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:68) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8
455f23ddc1364124 - jim - 2017-01-17 15:57:29]
                at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:141) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]
                Suppressed: java.io.IOException: No space left on device
                        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
                        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
                        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
                        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
                        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
                        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
                        at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
                        at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
                        at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
                        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
                        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
                        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
                        at java.io.FilterOutputStream.close(FilterOutputStream.java:158) ~[?:1.8.0_121]
                        at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:70) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
                        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:141) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]

I'm done with this crap.
Thanks for the fish.

No useful suggestions presented - closing

@clintongormley what exactly are you looking for in the form of suggestions? Might I offer the following solutions:

  • Don't spam logs when the data disk is full
  • Actually maybe just don't ever spam logs

Wait what. In what world of production databases do people do this shit

I think the underlying issue is that, on an exception that clear is saying "stop trying to use disk space"..... You're still doing something that is attempting to use Disk Space. While I would assume that that is a pretty obvious evaluation, I would further more assume that the course of action is also pretty obvious: this is not an exception that might go away the next time the function is ran, something is critically wrong in the environment, and this occurrence should be treated as such.

I won't pretend to be an expert in enterprise-grade software, but I would assume that critically failures in needed parts of the runtime environment would warrant a crash. Can't exactly keep hopping when the ground has been taken from under you, so why bother?

The disk that logs are written to is not necessarily the same that data is written to (which is where the exception above comes from). Are you suggesting we should not try to log/output anything when there is a critical exception causing the system to exit? If you have a concrete suggestion for cleaner behavior, please open a new issue detailing the feature request.

You could just make the logs not spammy.

Kill the process, and just simply say that it crashed because of no disk space

The disk that logs are written to is not necessarily the same that data is written to (which is where the exception above comes from).

If the log writing was the source of the OOS Exception, b1nzy wouldn't have a 28GB log file filled with the exception spam.

I would do as NullPixel said, and I said in my second paragraph. If the program is encountering an issue that clearly means it's failing to operate properly, meaning, cannot be recovered while the process is live, you don't continue to operate.

I understand that enterprise-grade software needs to be reliable, but this is the result of a problem outside the scope of the software, that cannot be recovered from while the program is running. What else is there to do? Besides spamming the logs?

Got sick of this being left, opened #24299

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ppf2 picture ppf2  路  3Comments

makeyang picture makeyang  路  3Comments

martijnvg picture martijnvg  路  3Comments

ttaranov picture ttaranov  路  3Comments

clintongormley picture clintongormley  路  3Comments