Hey everyone,
I have an index called metadata, with 5 shards and 1 replica. After restarting ES, the index cannot be fully recovered. All primary shards are recovered properly, but two of the replica shards remained unassigned.
ES version: 1.3.2
When I executed this command:
curl -XGET "http://localhost:9200/_cat/shards"
metadata 2 p STARTED 7712779 4.2gb ip-1 Motormouth
metadata 2 r STARTED 7712779 4.2gb ip-2 Harold "Happy" Hogan
metadata 0 p STARTED 7714351 4.1gb ip-2 Harold "Happy" Hogan
metadata 0 r UNASSIGNED
metadata 3 p STARTED 7711363 4.6gb ip-1 Motormouth
metadata 3 r STARTED 7711363 4.6gb ip-2 Harold "Happy" Hogan
metadata 1 p STARTED 7712560 4.2gb ip-2 Harold "Happy" Hogan
metadata 1 r UNASSIGNED
metadata 4 p STARTED 7714620 2.7gb ip-1 Motormouth
metadata 4 r STARTED 7714620 2.7gb ip-2 Harold "Happy" Hogan
[2014-11-29 15:01:58,383][WARN ][index.engine.internal ] [Motormouth] [metadata][0] failed engine [corrupted preexisting index]
[2014-11-29 15:01:58,384][WARN ][indices.cluster ] [Motormouth] [metadata][0] failed to start shard
org.apache.lucene.index.CorruptIndexException: [metadata][0] Corrupted index [corrupted_3gXTXI3KQtm2e1WPsFntkg] caused by: CorruptIndexException[codec footer mismatch: actual footer=1308690703 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/var/lib/elasticsearch/elasticsearch/nodes/0/indices/metadata/0/index/_8x9g.fdt"))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:727)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-11-29 15:01:58,437][WARN ][index.engine.internal ] [Motormouth] [metadata][1] failed engine [corrupted preexisting index]
[2014-11-29 15:01:58,437][WARN ][indices.cluster ] [Motormouth] [metadata][1] failed to start shard
org.apache.lucene.index.CorruptIndexException: [metadata][1] Corrupted index [corrupted_P-smhoB-SEeM7kHiTsIEug] caused by: CorruptIndexException[codec footer mismatch: actual footer=-262453147 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/var/lib/elasticsearch/elasticsearch/nodes/0/indices/metadata/1/index/_avoi_es090_0.doc"))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:727)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Hope someone can help.
Hey, first of all: nothing got lost, so that's good. When we detect a corruption we mark the shard as corrupted; there should be a corrupted_????? file on disk that prevents your cluster from allocating that shard copy on that node. We do that to let you back up the corrupted data etc.; if we can allocate the shard somewhere else, we remove it. In your case you can remove the shard from the node in question and then ES will recover it from the primary. Still, I'd want to know what happened that made the shard go corrupt. Did you upgrade lately? If so, from what version?
You can rm the files on Motormouth for shards 0 & 1 of the metadata index and then run curl -XPOST 'localhost:9200/_cluster/reroute'
This should kick off the recovery. If you are unsure, you can post the commands here and I will have a look first.
Actually, the last thing I did was run a query that should return 20M documents. I wasn't expecting that to bring ES down. After that I restarted ES and it started giving those exceptions for these shards.
So you are suggesting deleting all data under the folders for shards 0 and 1, right?
Btw, does ES try to merge two shards when recovering?
Actually, I was thinking of reducing number_of_replicas to 0 and then increasing it back to 1. But I am not sure whether that would cause any data loss, since the primary shards are on different nodes?
Your other replicas are just fine; I don't think you need to do that. But you certainly can.
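If you did go the replica route, the settings calls would look roughly like this (a sketch only; the index name and host are taken from this thread and may need adjusting):

```sh
# Drop replicas for the metadata index, then add them back.
# The primaries are untouched, so no data should be lost,
# but check cluster health between the two calls.
curl -XPUT 'localhost:9200/metadata/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'
curl -XPUT 'localhost:9200/metadata/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```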
Which one do you suggest: reducing the number of replicas, or removing the shard files and letting the shards recover?
I'd go and do a mv /your/path/to/data/indices/metadata/0 /your/path/to/data/indices/metadata/backup_0
then run reroute and wait until the shard is active. Remove the backup, then continue with the second shard.
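Put together, that would look something like the following (a sketch; the data path is taken from the stack traces above and may differ on your machine):

```sh
# On Motormouth, move the corrupted copy of shard 0 out of the way.
mv /var/lib/elasticsearch/elasticsearch/nodes/0/indices/metadata/0 \
   /var/lib/elasticsearch/elasticsearch/nodes/0/indices/metadata/backup_0

# Ask the master to retry allocation.
curl -XPOST 'localhost:9200/_cluster/reroute'

# Wait until shard 0 shows up as STARTED again before touching shard 1.
curl -s 'localhost:9200/_cat/shards' | grep '^metadata'

# Once the shard is active again, remove the backup and repeat for shard 1.
rm -rf /var/lib/elasticsearch/elasticsearch/nodes/0/indices/metadata/backup_0
```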
ok I will try thanks
Hi @mehmetgunturkun
Actually, last thing I did, querying something which should return 20M documents
Do you mean you requested 20M documents in one search response? e.g. `{ "from": 0, "size": 20000000 }`?
If so, that could have caused an OOM exception. Please can you look in the logs on each node to see if you had an OOM?
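A quick way to check is something like this (a sketch; the default package log location is an assumption, adjust the path if your logs live elsewhere):

```sh
# Look for OutOfMemoryError entries in each node's Elasticsearch logs.
grep -i 'OutOfMemoryError' /var/log/elasticsearch/*.log*
```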
Yeah, it gave an Out of Memory exception; but I still don't understand why there is a footer mismatch?
Basically, if you get an OOM exception, all bets are off. At that stage the JVM is in an undefined state. That said, it shouldn't write a commit point that includes a file which hasn't been written correctly.
Was this index originally created with an older version of Elasticsearch? If so, which version? It would be helpful if you could upload your logs somewhere.
Actually, I was almost sure this index was created on 1.3.2, but I saw a file, "_avoi_es090_0.doc". Does that indicate version 0.90?
sample of log file is in the following link:
https://dl.dropboxusercontent.com/u/69632603/New%20folder/elasticsearch.log
does it indicate version, 0.9?
No, that is confusing: it only means we added this codec in 0.90. It's just a naming thing.
https://dl.dropboxusercontent.com/u/69632603/New%20folder/elasticsearch.log
You know what, this looks like a half-written index from a recovery. I think what happened here is that you hit the OOM while one of the shards was recovering; since the recovery didn't finish, the shard was left in a half-baked state and got marked corrupted. We fixed this in 1.4.0: files are renamed only after they have all been written, and the commit point is renamed last. I think that is what happened, and it explains the missing file as well as the truncated one.
Yeah, that is most probably what happened, because my application was still sending documents to the index during the recovery. Thanks a lot, guys.
@mehmetgunturkun I assume this got resolved... I am closing it; please reopen if you object.
@clintongormley First, thanks for the clear (and easy to find) help here. Today we ran into an issue with a corrupted shard as well. As far as we know though, the problems started when our master node and other (not master) node started having communication issues. We are still investigating what happened, but I was wondering if you'd be interested in our logs?
Hi @Bertg
We may well be. I'd open a new issue mentioning the version that you're using, plus all of the details including the logs. Note: if you're using an older version, there's a good chance that we've already fixed the bug, so you may want to trawl through the issues list first, to see if you find something that could explain the problem.
@clintongormley Actually, investigating it more, I think we figured out what happened. A very complex query got generated, overloaded the master, and the slaves somehow "got confused". It does seem that later versions might fix the issue we had. We'll do the update and try to re-run the offending query. If it happens again we'll open a ticket.
Had this issue with 1.4.4.
The fix mentioned in this issue did work: renaming the .../0 directory to .../0.backup. The 'unassigned' replica became primary, the data was accessible again, and a new replica was created. Case closed.
Hi,
A similar error happened to me on ES 1.3.4. It also looks like a broken recovery for one shard (among 5 for the same index). A lot of people suggest it can be resolved by renaming the shard directory: the 'unassigned' replica would become primary, so the data would be accessible again and a new replica created. Can someone confirm that?
Someone in #10066 suggested that this error would not happen in version 1.5.0. Does someone agree with that?
This happened to me with 1.4.5. It could be related to the upgrade from 1.4.4 to 1.4.5 - I am sorry to say I am unsure about that. I only had 1 corrupt shard, though - I'd guess I should have had more if it were related to the upgrade...?
In either case: _rm/mv_ + a call to _reroute_ worked like a charm!