Elasticsearch: Clarification in docs about max_size for rollover API

Created on 29 Jan 2019 · 14 Comments · Source: elastic/elasticsearch

In the docs for the rollover API, it is not clear that max_size refers to the primary store size. While it makes sense when you think about it a bit further (because one of the main uses of the rollover API is to limit the size of shards, and the total store size can vary greatly depending on the number of replicas), it would be useful to have this noted explicitly to avoid confusion.

:Core/Features/ILM+SLM :Core/Features/Indices APIs >docs Core/Features Docs

All 14 comments

Pinging @elastic/es-docs

Pinging @elastic/es-core-features

@dakrone - the text now reads:

max primary shard index storage size

There has now been some confusion whether this means

  • the sum of the sizes of all of the primary shards in the index
    or
  • the size of the single largest primary shard in the index

Could you please clarify?

I would like to see us remove the concept of shards from the description of rollover in both the rollover API and ILM documentation. This is confusing to end users because we often use 50GB for max_size in our default ILM policies (Logstash, Beats) across stack products, and 50GB is traditionally the max size _per shard_ we recommend in the field.

It is also confusing today because in some places in our documentation (esp. in the ILM documentation), we simply say max_size is the index size. But in other places, we use the description of the "max size of the primary shard".

Here are all the places I can find that describe what max_size is in our documentation today:

https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/_actions.html#ilm-rollover-action
https://www.elastic.co/guide/en/elasticsearch/reference/master/_full_policy.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/using-policies-rollover.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-lifecycle-management.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/set-up-lifecycle-policy.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/applying-policy-to-template.html

Ideally, I would like to see us use a consistent description of max_size across the documentation sets. I think it would be helpful to change the description of max_size to something like "total index size (excluding replicas)", or, if we really want to use the notion of shards, "total size of all primary shards in the index". :)

+1

+1
I really think this should be aligned, as it is very confusing and inconsistent. We just racked our brains over what exactly max_size means in the ILM policy's rollover action.

I would expect that max_size reflects the maximum size of all the primary shards within an index (summed up), so replica storage should not be counted toward max_size.
From our tests, I believe this is how it behaves today (however, I might be wrong).

@sderungs If I understand your post correctly, you are right: max_size is compared to the sum of the sizes of all primary shards, and replicas are not counted towards max_size. So if you have an index with number_of_shards: 3, number_of_replicas: 1, and each shard is 50G, the number compared to max_size would be 50G + 50G + 50G = 150G.
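
The arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration of the behavior described in the thread, not Elasticsearch code: the function names are hypothetical, and the `>=` comparison is an assumption based on the wording "reaches a certain size".

```python
# Illustrative numbers from the comment above: 3 primaries of 50G each, 1 replica.
GB = 1024 ** 3

primary_shard_sizes = [50 * GB, 50 * GB, 50 * GB]
number_of_replicas = 1


def size_compared_to_max_size(primary_sizes):
    """Sum of all primary shard sizes; replica copies are excluded."""
    return sum(primary_sizes)


def total_store_size(primary_sizes, replicas):
    """Primaries plus their replica copies (NOT what max_size is compared to)."""
    return sum(primary_sizes) * (1 + replicas)


def should_roll_over(primary_sizes, max_size_bytes):
    """Assumed rule: roll over once the primary total reaches max_size."""
    return size_compared_to_max_size(primary_sizes) >= max_size_bytes


print(size_compared_to_max_size(primary_shard_sizes) // GB)             # primary total in GB
print(total_store_size(primary_shard_sizes, number_of_replicas) // GB)  # with replicas, in GB
```

With these numbers, a max_size of 50G would already be exceeded even though no single shard is larger than 50G, which is the source of the confusion discussed here.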

@gwbrown Yes, that is exactly my understanding - glad to have it finally understood :)

In this case, I think at least "Table 30. conditions parameters" should be phrased more clearly or be backed by an example.

+1 for this.

Just spent more time than I'd like to admit trying to determine if "max primary shard size" meant the maximum of the primary shards or the sum of all.

Even just making the reference plural, while not the best case, could have done wonders for the understanding. "Estimated size of the primary shards." The plural shards would've gotten me in the right direction much more quickly.

[docs issue triage]

Leaving open. This is still relevant.

+1 for this. Maybe an example API call to determine index size by primary shards only could also be helpful.

I agree with @danielyahn: showing how to check the size when we make this change would be a good idea. The size of the primary shards in an index can be viewed using the _cat indices API; see the pri.store.size column.
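
As a rough sketch of that check, the Python below builds a _cat indices request that selects the pri.store.size column and parses a JSON response. The host, index name, and sample response are made up for illustration; the helper names are hypothetical, and sending the request (plus any authentication) is left to whatever HTTP client you use.

```python
import json
from urllib.parse import quote, urlencode


def cat_indices_url(base_url, index):
    """Build a _cat/indices URL returning primary and total store size in bytes."""
    params = urlencode({
        "h": "index,pri,rep,pri.store.size,store.size",  # columns to return
        "bytes": "b",                                     # report sizes in bytes
        "format": "json",                                 # JSON instead of a text table
    })
    return f"{base_url.rstrip('/')}/_cat/indices/{quote(index)}?{params}"


def primary_store_bytes(response_text):
    """Map each index name to its pri.store.size (primaries only, in bytes)."""
    rows = json.loads(response_text)
    return {row["index"]: int(row["pri.store.size"]) for row in rows}


# Hypothetical response for an index with 3 primaries of 50 GiB and 1 replica.
SAMPLE_RESPONSE = (
    '[{"index": "logs-000001", "pri": "3", "rep": "1", '
    '"pri.store.size": "161061273600", "store.size": "322122547200"}]'
)

print(cat_indices_url("http://localhost:9200", "logs-000001"))
print(primary_store_bytes(SAMPLE_RESPONSE))
```

The pri.store.size value is the one to compare against max_size; store.size also includes replicas.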

Another user here who has been confused by the max_size parameter on the rollover action. As it is now, it's hard to understand whether this is the maximum size per shard or the maximum for the sum of the index's primary shards.

From the PR (https://github.com/elastic/elasticsearch/pull/56561)

max_size :: Triggers roll over when the index reaches a certain size.
This is the total size of all primary shards in the index.
Replicas are not counted toward the maximum index size.
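
For context, a rollover request with a max_size condition might be assembled as in the sketch below. The alias name `logs_write` and the helper function are hypothetical; only the `conditions.max_size` field and the `_rollover` endpoint come from the rollover API itself.

```python
import json


def rollover_request(alias, max_size):
    """Return (method, path, body) for a rollover request with a max_size condition."""
    # max_size is compared to the total size of all primary shards;
    # replicas are not counted.
    body = {"conditions": {"max_size": max_size}}
    return "POST", f"/{alias}/_rollover", json.dumps(body)


method, path, body = rollover_request("logs_write", "50gb")
print(method, path)
print(body)
```

With three primary shards, a 50gb condition like this would be met when the primaries average roughly 16.7 GB each, not when each one reaches 50 GB.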

Yes - thank you! :+1:
