Elasticsearch: Docs missing from a replica

Created on 8 Aug 2016 · 16 Comments · Source: elastic/elasticsearch

Elasticsearch version: 2.3.3
Plugins installed: cloud-aws
JVM version: 1.8.0_25
OS version: Ubuntu 12.04 LTS
Description of the problem including expected versus actual behavior:

I have a document that exists in one replica of a shard but not in the other replica. The initial symptom was that an update on a document id failed; further investigation showed that some nodes could find the id via search and some could not. For example (all identifiers changed to protect the innocent):

GET /myindex/_search?q=doc1234

returns this when the query hits the shard where the doc doesn't exist:

"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
  "_index" : "myindex",
  "_type" : "mydoctype",
  "_id" : "doc1234"
  "_score" : 1.0,
  "_source" : {
    "explain" : true
  }
} ]

But when it hits the shard where the doc does exist, I get the full document back.

When I add the 'explain' parameter I can figure out which shard instance it is, and it turns out that the primary has 3 fewer documents than the replica (out of 2.3M docs).
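For reference, a minimal sketch of pinning the same query to each copy (this assumes the _primary/_replica search preference values available in 2.x), so the two copies can be compared directly:

GET /myindex/_search?q=doc1234&explain=true&preference=_primary
GET /myindex/_search?q=doc1234&explain=true&preference=_replica

With explain enabled, each hit also reports the _shard and _node it was served from.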

The document in question above is in the replica, but not the primary. I restored a snapshot of the index to another cluster, and I guess because the snapshot only captures the primary shard, the document is missing there as well, in both the primary and the replica.

After further investigation I see that the document counts between primary and replica differ in 469 of my 8662 shards. I suspect there are more, but short of comparing doc ids between the primary and replica of each shard (not really possible given the size), that's all I can go by for now. So this is not an isolated problem but quite a bit more widespread.

Of those shards, some were created in 1.7 and brought over; others were created very recently in 2.3.3. There's no pattern based on index age. The document in question was created on August 4, 2016 in an older index, but other time-based indices created in the past few days suffer the same document mismatch.
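One way to surface copies with differing counts, as a sketch (the column names assume the 2.x _cat/shards API), is to list the per-copy document counts and compare the primary and replica rows for each shard:

GET /_cat/shards/myindex?v&h=index,shard,prirep,docs,state,node

Rows that share the same index and shard number but show different docs values point at the suspect shards.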

I have started a discussion here: https://discuss.elastic.co/t/docs-missing-from-a-replica/57382
This does smell like a much older discussion over here: https://discuss.elastic.co/t/how-to-fix-primary-replica-inconsistency/9016/18

Looking for ideas on how to troubleshoot this further.


All 16 comments

Small correction: I realized that a large portion of the shards reporting different document counts are actively being written to, so that's expected. The shards that are not actively being written to and have differing document counts are 8 in total. But the overall nature of the problem doesn't change.

As a workaround I can forcefully "fix" the doc by retrieving it and updating it with the "doc_as_upsert" attribute added and set to true. Seems the two nodes are happy to update or insert the doc as they see fit.
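For illustration, a rough sketch of that workaround against the 2.x update API, where "doc" holds the full document retrieved from the copy that still has it (the field shown here is a hypothetical stand-in):

POST /myindex/mydoctype/doc1234/_update
{
  "doc" : { "status" : "open" },
  "doc_as_upsert" : true
}

The update goes through the primary and is then replicated, so afterwards both copies hold the document.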

How are documents written/updated/deleted in your cluster? Do you use version types? Do you use delete-by-query?
After fixing the inconsistencies between primary/replica, have you seen this happening again?

Are the 8 indices with differing doc counts from old versions, or 2.3.3?

@ywelsch The vast majority of the documents are written using bulk indexing, for both inserts and updates. We rarely delete documents. Version types are not used, we use the default settings. We do not use delete-by-query.

@clintongormley In each case the "/index/_segments" call returns a mix of Lucene versions, 4.10.4 and 5.5.0, which I think means the indices were created prior to 2.3.3. In those cases I didn't upgrade in place, but took a snapshot from v1.7 and restored it to the new v2.3.3 cluster, in case that makes a difference.
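For reference, the per-segment Lucene version shows up in the segments API response, along the lines of this trimmed sketch (values illustrative):

GET /myindex/_segments

"segments" : {
  "_0" : {
    "num_docs" : 123456,
    "version" : "4.10.4",
    ...
  }
}

Segments reporting 4.10.x were written under 1.7 (Lucene 4.10.4), while 5.5.0 segments were written by 2.3.3.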

We are continuing to see this issue and it is becoming more serious.

We have two classes of documents that we are indexing through a bulk operation: one class does a simple index, and the other does an update operation with an upsert and a Groovy script.

We don't have any evidence of missing items with the first class of documents.

But we are missing a small (<1%?), but functionally noticeable percentage of documents in some of the shards. I just worked through a small set of example failures from yesterday. In each case, the document was in the replica, but not the primary. (We have three shards and one replica for these indices). (BTW this is a hack to get our updatable objects to work with ES)

This is the template of the bulk update action we are using:

{"update":{"_index":"<index name>","_type":"<doc type>","_id":"<id>"}}
{"upsert":<doc body>,"script":{"file":"update_incident","params":{"doc":<partial doc>}}}

This is the groovy script used to update the document.

// We always want the latest timestamp
if (doc.get('@timestamp') < ctx._source['@timestamp']) {
  doc.putAt('@timestamp', ctx._source['@timestamp'])
}
// Update the state
ctx._source.putAll(doc)
ctx._source.incident.count++

We have been doing this for over 18 months without issue (on 1.5, 1.6 and 1.7). It is only in the last month since the switch to Elasticsearch 2.x that we have run into consistency problems.

@rtkbkish are you still seeing this issue?

@colings86 Yes

@rtkbkish many, many things changed in this area of the code. There are known problems; many of them have been fixed and some are in the process of being fixed. This is a problem we take seriously, and we can work to figure out what exactly the issue is in your case. Some questions:

1) Which version are you on today?
2) Do you see the same with 5.3.0?
3) Do you see any networking issues / disconnects in the logs?
4) What type of bulk request are you doing - is this an update, a normal index, or indexing with auto-generated ids?

  1. We are currently on 2.3.3
  2. We haven't made the jump to 5.3. The migration from 1.x to 2.x was so time-intensive that the next migration keeps getting delayed.
  3. Networking issues/disconnects are very rare. Does not explain frequency of issue.
  4. We generate the IDs. It is a bulk update operation.

> Networking issues/disconnects are very rare. Does not explain frequency of issue.

Can you reiterate how often you see this? Also - do you see any shard failures?

> We generate the IDs. It is a bulk update operation.

What kind of update do you do?

@bleskes
Sorry for the delay. Numerous local distractions.

I looked through the logs over the past 4 months for our data nodes in this cluster (of which there are four).
This is the distribution of NodeDisconnectExceptions:

  67 2017-01-12
  14 2017-01-13
  20 2017-01-16
  10 2017-03-02
  53 2017-03-08
  45 2017-03-23
  60 2017-04-05
  59 2017-04-10
   7 2017-04-13
   3 2017-04-14
  55 2017-04-20
  12 2017-04-22

These were triggered by GCs on the data nodes. There is a very similar distribution of shard create failures.

We see the missing docs much more frequently than this. Several times per day.

All updates are done using the bulk API.

Originally, we used an upsert statement, but changed that to a create and update in an attempt to get around this issue. (It hasn't fixed things, so we should roll it back).

Current bulk statements:
{"create":{"_index":"","_type":"","_id":""}}
{}
{"update":{"_index":"","_type":"","_id":""}}
{"script":{"file":"update_incident","params":{"doc":{}}}}

Original Bulk statements:
{"update":{"_index":"","_type":"","_id":""}}
{"upsert":{},"script":{"file":"update_incident","params":{"doc":{}}}}

In either case we trigger a groovy script:

// We always want the latest timestamp
if (doc.get('@timestamp') < ctx._source['@timestamp']) {
  doc.putAt('@timestamp', ctx._source['@timestamp'])
}
// Update the state
ctx._source.putAll(doc)
ctx._source.incident.count++

Do you see active primary shard failures in your logs, and if so how often?

Since November 2016, I see two "primary failed while replica initializing" errors, both for the same index on Mar 22.

Otherwise there are clusters of "marking and sending shard failed due to [failed to create shard]" messages when we have GC incidents. These are tied to the NodeDisconnectExceptions in the previous comment.

I lean towards closing this, as so much has changed in this area since 2.x. Prior to 6.0, a replica could fall out of sync in the case of a primary failure. In 6.0, we are introducing a primary/replica re-sync after a replica is promoted to primary. If this occurs again after 6.0 is released, we can revisit, but at this point we are not going to make any changes in 2.x or 5.x to address it.
