I found that ES 6.7/6.8 and ES 7.6 behave differently for the docs.deleted counter.
Both look incorrect, or at least the logic is not easy to understand.
In 6.7/6.8: when performing DELETE <index>/_doc/<doc_id>, docs.deleted should increase by 1.
Instead, docs.deleted shows 0 after deleting the document. (It is cleared immediately.)
# create index and put docs
DELETE my_test
PUT my_test
{"settings":{"number_of_replicas":0,"number_of_shards":1}}
PUT my_test/_doc/1
{"title":"aaa"}
PUT my_test/_doc/2
{"title":"bbb"}
# delete
DELETE my_test/_doc/1
# check docs.deleted
GET _cat/indices?v&index=my_test&h=index,health,status,docs.deleted
# response
index   health status docs.deleted
my_test green  open              0
In 7.6: when performing DELETE <index>/_doc/<doc_id>, docs.deleted should increase by 1.
Instead, docs.deleted shows 2 after deleting a single document. (It is incremented incorrectly.)
# create index and put docs
DELETE my_test
PUT my_test
{"settings":{"number_of_replicas":0,"number_of_shards":1}}
PUT my_test/_doc/1
{"title":"aaa"}
PUT my_test/_doc/2
{"title":"bbb"}
# delete
DELETE my_test/_doc/1
# check docs.deleted
GET _cat/indices?v&index=my_test&h=index,health,status,docs.deleted
# response
index   health status docs.deleted
my_test green  open              2
In 7.6, a doc update increases docs.deleted by 1, which is the correct and expected behavior (the document is deleted and indexed again internally, even for a partial update).
But in 6.7/6.8, a doc update doesn't trigger a docs.deleted increment, which appears to be another bug...
Pinging @elastic/es-distributed (:Distributed/CRUD)
Pinging @elastic/es-core-features (:Core/Features/CAT APIs)
The docs.deleted stats report on the segments in the index, and your test case does not do any refreshing so does not create any segments. Adding some appropriate refreshes recovers the expected behaviour in 6.x:
DELETE /my_test
# {
# "acknowledged": true
# }
PUT /my_test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
# {
# "shards_acknowledged": true,
# "acknowledged": true,
# "index": "my_test"
# }
PUT /my_test/_doc/1
{
"title": "aaa"
}
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "1",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "created",
# "_version": 1,
# "_seq_no": 0
# }
PUT /my_test/_doc/2
{
"title": "bbb"
}
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "2",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "created",
# "_version": 1,
# "_seq_no": 1
# }
POST /my_test/_refresh
# {
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# }
# }
DELETE /my_test/_doc/1
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "1",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "deleted",
# "_version": 2,
# "_seq_no": 2
# }
POST /my_test/_refresh
# {
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# }
# }
GET /_cat/indices?v&index=my_test&h=index,health,status,docs.deleted
# index health status docs.deleted
# my_test green open 1
#
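The same sequence can be scripted so that the two refreshes are never left out. This is only a minimal sketch, not an official client: `repro_steps` and `run` are hypothetical helper names, and the `send` callable is an assumption standing in for whatever HTTP client you use to talk to the cluster.

```python
import json
from typing import Callable, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Step = Tuple[str, str, Optional[dict]]


def repro_steps(index: str = "my_test") -> List[Step]:
    """Return the request sequence from the trace above, including both refreshes."""
    return [
        ("DELETE", f"/{index}", None),
        ("PUT", f"/{index}", {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}),
        ("PUT", f"/{index}/_doc/1", {"title": "aaa"}),
        ("PUT", f"/{index}/_doc/2", {"title": "bbb"}),
        ("POST", f"/{index}/_refresh", None),   # make the indexed docs visible in segments
        ("DELETE", f"/{index}/_doc/1", None),
        ("POST", f"/{index}/_refresh", None),   # make the delete visible to the stats
        ("GET", f"/_cat/indices?v&index={index}&h=index,health,status,docs.deleted", None),
    ]


def run(steps: List[Step], send: Callable[[str, str, Optional[str]], str]) -> str:
    """Send each step in order via `send` (hypothetical) and return the last response body."""
    last = ""
    for method, path, body in steps:
        last = send(method, path, None if body is None else json.dumps(body))
    return last
```

Keeping the steps as plain data makes it easy to drop one of the refreshes and compare the resulting docs.deleted value against the trace above.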
In 7.x it's more complicated since we delete the document and then add a tombstone to record the deletion, and I think we count both of these as deleted docs. I think that makes sense: the tombstone is a genuine doc in the index that should be cleaned up later on, once it's no longer needed for peer recovery. Indeed, after thirty seconds (the peer recovery lease resync interval) and a flush I see that happen automatically:
DELETE /my_test
# at 2020-05-11T11:22:27.304Z
# {
# "acknowledged": true
# }
# at 2020-05-11T11:22:27.381Z
# (0.077211s elapsed)
PUT /my_test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
# at 2020-05-11T11:22:27.381Z
# {
# "shards_acknowledged": true,
# "acknowledged": true,
# "index": "my_test"
# }
# at 2020-05-11T11:22:27.632Z
# (0.250563s elapsed)
PUT /my_test/_doc/1
{
"title": "aaa"
}
# at 2020-05-11T11:22:27.632Z
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "1",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "created",
# "_version": 1,
# "_seq_no": 0
# }
# at 2020-05-11T11:22:27.699Z
# (0.066634s elapsed)
PUT /my_test/_doc/2
{
"title": "bbb"
}
# at 2020-05-11T11:22:27.699Z
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "2",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "created",
# "_version": 1,
# "_seq_no": 1
# }
# at 2020-05-11T11:22:27.718Z
# (0.018532s elapsed)
POST /my_test/_refresh
# at 2020-05-11T11:22:27.718Z
# {
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# }
# }
# at 2020-05-11T11:22:27.742Z
# (0.023456s elapsed)
DELETE /my_test/_doc/1
# at 2020-05-11T11:22:27.742Z
# {
# "_type": "_doc",
# "_primary_term": 1,
# "_id": "1",
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# },
# "_index": "my_test",
# "result": "deleted",
# "_version": 2,
# "_seq_no": 2
# }
# at 2020-05-11T11:22:27.755Z
# (0.013136s elapsed)
POST /my_test/_refresh
# at 2020-05-11T11:22:27.772Z
# {
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# }
# }
# at 2020-05-11T11:22:27.787Z
# (0.015387s elapsed)
GET /_cat/indices?v&index=my_test&h=index,health,status,docs.deleted
# at 2020-05-11T11:22:27.787Z
# index health status docs.deleted
# my_test green open 2
#
# at 2020-05-11T11:22:27.793Z
# (0.005123s elapsed)
### NOTE ≥30-second pause here
POST /my_test/_flush
# at 2020-05-11T11:22:58.937Z
# {
# "_shards": {
# "successful": 1,
# "total": 1,
# "failed": 0
# }
# }
# at 2020-05-11T11:22:59.037Z
# (0.1003s elapsed)
GET /_cat/indices?v&index=my_test&h=index,health,status,docs.deleted
# at 2020-05-11T11:22:59.037Z
# index health status docs.deleted
# my_test green open 1
#
# at 2020-05-11T11:22:59.041Z
# (0.003604s elapsed)
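To compare the two `_cat/indices` responses in this trace programmatically, the `v`-style output can be parsed into rows. The following is a rough sketch under stated assumptions: `parse_cat_table` is a hypothetical helper, and it assumes every column has a non-empty value with no embedded spaces, which holds for the columns requested here but not necessarily for every `_cat` column.

```python
from typing import Dict, List


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated `_cat` output whose first line is the `v` header."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


# The two responses from the trace above, with the leading "# " stripped.
before = parse_cat_table("index   health status docs.deleted\nmy_test green  open   2\n")
after = parse_cat_table("index   health status docs.deleted\nmy_test green  open   1\n")

# Number of deleted docs that disappeared between the two calls.
reclaimed = int(before[0]["docs.deleted"]) - int(after[0]["docs.deleted"])
print(reclaimed)  # prints 1
```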
I've marked this for team discussion in order to contemplate whether we can make this behaviour any less surprising without compromising the fidelity of the stats.
Thanks @kunisen and @DaveCTurner. We can make the deleted count more consistent by excluding tombstone documents. However, I think we should instead explain in the documentation that the doc count and deleted count might include some 'system' documents.
If we had a parameter that revealed a further breakdown of the sub-counts per "type"/category of document that make up the overall count, that would perhaps make it more obvious to users, similar to how index stats show total count vs deleted count (at the Lucene level).
This might also be useful for users confused about some APIs counting Elasticsearch documents vs Lucene documents, where nested documents produce a higher Lucene document count than the Elasticsearch document count.
If we had a detailed or breakdown type of parameter, we could then show the counts for each underlying group of documents, be that top-level/ES vs Lucene vs system/tombstone documents, perhaps depending on the context of the API as well (e.g. split by ES vs tombstone/system/other, OR split by ES vs Lucene, whichever makes the most sense for that API).
Though if we did this wrong it might make things less clear. It might still be useful, but perhaps as an undocumented expert setting.
echoing @geekpete
I did find a hint in the nodes stats API.
It appears that "indices.docs.deleted" actually comes from Lucene.

Maybe it would be good to add that to the API description page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
Also, if there is any difference between _cat/indices and the nodes stats API, it would be good to mention that too, along with a note about tombstones and _flush.
The simple example I use for checking why doc counts differ depending on the api:
Doc count differences by API
#
# Why does doc count differ depending on API?
#
# using a variant of the nested docs example from the documentation can help highlight the difference
# between top level ES doc counts and lower level Lucene doc counts.
DELETE my_index
PUT my_index
{
"mappings": {
"properties": {
"user": {
"type": "nested"
}
}
}
}
PUT my_index/_doc/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
PUT my_index/_doc/2
{
"group" : "fans",
"user" : [
{
"first" : "Bob",
"last" : "Smith"
},
{
"first" : "Harry",
"last" : "White"
},
{
"first" : "Terry",
"last" : "Arthur"
}
]
}
# flush to ensure docs are searchable/countable
POST my_index/_flush
# index count api shows top level ES docs
# https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-count.html
GET my_index/_count
# CAT Indices API shows the lower-level Lucene doc count; nested fields are stored as separate Lucene docs.
# This behaviour is documented: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html#cat-indices-api-desc
GET /_cat/indices/my_index?v
# CAT Count api shows top level ES docs:
# https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-count.html
GET /_cat/count/my_index?v
# Index stats shows Lucene doc count
GET /my_index/_stats?filter_path=indices.my_index.total.docs
For this example, maybe an additional tombstones count in there would be handy.
{
  "indices" : {
    "my_index" : {
      "total" : {
        "docs" : {
          "count" : 7,
          "deleted" : 0,
          "tombstones" : 0
        }
      }
    }
  }
}
and
health status index    uuid                   pri rep docs.count docs.deleted docs.tombstones store.size pri.store.size
green  open   my_index 31yBbQgxTx6Nu2ObIJMThw   1   0          7            0               0      9.8kb          9.8kb
either with some optional parameter or by default, etc.
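The count of 7 for only two Elasticsearch documents follows from how nested objects are stored: each object in a `nested` field becomes its own Lucene document alongside the root document. The arithmetic can be sketched as below. `lucene_doc_count` is a hypothetical helper written for illustration, it only handles a single level of nesting, and it ignores tombstones and any other internal documents.

```python
from typing import Any, Dict, List


def lucene_doc_count(doc: Dict[str, Any], nested_fields: List[str]) -> int:
    """Count one Lucene doc for the root plus one per object in each nested field."""
    count = 1  # the root (top-level) document
    for field in nested_fields:
        value = doc.get(field, [])
        if isinstance(value, dict):  # a single object counts like a one-element array
            value = [value]
        count += len(value)
    return count


# The two documents indexed into my_index in the example above.
doc1 = {"group": "fans", "user": [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]}
doc2 = {"group": "fans", "user": [
    {"first": "Bob", "last": "Smith"},
    {"first": "Harry", "last": "White"},
    {"first": "Terry", "last": "Arthur"},
]}

docs = [doc1, doc2]
es_count = len(docs)  # what the count APIs report
lucene_count = sum(lucene_doc_count(d, ["user"]) for d in docs)  # what the stats report
print(es_count, lucene_count)  # prints: 2 7
```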
Like Nhat I'm also in favour of documenting the internal nature of docs.deleted rather than changing it or adding a more detailed breakdown which may constrain future work in this area. Lucene's tracking of deleted docs should IMO be considered a deep implementation detail. Users ought to rely on Elasticsearch keeping them under control in the background rather than trying to actively manage them, especially since the only tools to do so are rather blunt things like a force-merge, and I think that adding more detailed stats will encourage the opposite behaviour.
Thanks @DaveCTurner for the pointers!
Pinging @elastic/es-docs (>docs)