Elasticsearch: Compact index when closing

Created on 23 May 2019 · 10 comments · Source: elastic/elasticsearch

A spin-off from #33888.

Should we trim/clean translog and force-merge when closing an index? These actions can be done via the verifying-before-close step. Another option is to integrate these actions with ILM.

Should we also enforce a single commit when closing? This property does not always hold for follower indices and primaries with ongoing peer recoveries.

Relates #33888

:Distributed/Distributed

All 10 comments

Pinging @elastic/es-distributed

Pinging @elastic/es-core-features

Should we trim/clean translog and force-merge when closing an index? These actions can be done via the verifying-before-close step.

I definitely don't think we should do this by default. Imagine having a cluster that's experiencing heavy load, so badly that you need to close some indices to bring it back from the edge. If we were to force-merge when an index was closed, that introduces more load into the cluster (not to mention a potential wait time depending on the number of segments we force merge to) when it is in a precarious state.

We discussed this today as a team and agreed with @dakrone about not force-merging while closing an index because this would make closing an index far too heavyweight an operation. The discussion also touched again on the idea of being able to force-merge a read-only index (#41624):

  • one might want to force-merge a closed index by re-opening it, force-merging it, and closing it again, but today we can't guarantee that nothing else is indexed into it during that process.

  • ILM may also want to be able to block writes to an index before force-merging it.

However, we concluded that trimming the translog on close is a reasonable thing to do:

  • we'd like a closed index to consume as few resources as possible, and a translog can consume considerable disk space

  • trimming the translog is a fairly lightweight operation

  • the stats on a closed index make it look like there is no translog anyway, even if it is present and consuming disk space

  • if a closed shard copy is moved elsewhere then the resulting copy has no translog

  • keeping the translog around might occasionally help recover an out-of-sync replica with an operations-based recovery when the index is re-opened, but this is a pretty rare situation and not one we felt to be important

We noted that it may not be trivial to trim the translog at close, because there may be something still holding onto the generations that we want to trim (e.g. an ongoing peer recovery).
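To illustrate why something "holding onto the generations" complicates trimming, here is a toy model in Python. It is only a sketch under stated assumptions: the `ToyTranslog` class, its method names, and the idea of representing generations as plain integers are all hypothetical and are not Elasticsearch's actual translog implementation.

```python
from typing import Dict, List


class ToyTranslog:
    """Toy model only: generations are integers; holders pin the oldest one they need."""

    def __init__(self, generations: List[int]) -> None:
        self.generations = sorted(generations)
        # Hypothetical bookkeeping: holder name -> oldest generation it still reads.
        self._holders: Dict[str, int] = {}

    def acquire(self, holder: str, from_generation: int) -> None:
        """Register a reader (e.g. a peer recovery) that needs from_generation onwards."""
        self._holders[holder] = from_generation

    def release(self, holder: str) -> None:
        """Drop a reader's hold; unknown holders are ignored."""
        self._holders.pop(holder, None)

    def trim(self, min_required_generation: int) -> List[int]:
        """Remove generations older than the requested minimum AND every holder's minimum."""
        floor = min([min_required_generation, *self._holders.values()])
        removed = [g for g in self.generations if g < floor]
        self.generations = [g for g in self.generations if g >= floor]
        return removed


translog = ToyTranslog([1, 2, 3, 4, 5])
translog.acquire("peer-recovery", 2)   # an ongoing recovery still needs generation 2+
first_pass = translog.trim(5)          # only generation 1 can go while the hold exists
translog.release("peer-recovery")
second_pass = translog.trim(5)         # the rest can be trimmed once the hold is gone
```

In this model, a trim issued at close time while a recovery is in flight frees only part of the translog, so a later trim is needed after the hold is released.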

ILM may also want to be able to block writes to an index before force-merging it.

ILM currently does this automatically.

@DaveCTurner I think after the discussions this can drop the ILM tag, is that right? Since we would automatically trim the translog on close?

Yes, I think so. ILM's force-merge action sets index.blocks.write first (but not index.blocks.read_only).
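For readers who want to reproduce that ordering by hand, the following Python sketch only builds the sequence of REST calls and does not send them. It assumes the standard `_open`, `_settings`, `_forcemerge` and `_close` index endpoints. The helper name `plan_blocked_force_merge`, the `Request` tuple and the index name are hypothetical, and this is not how ILM itself is implemented.

```python
import json
from typing import List, NamedTuple, Optional


class Request(NamedTuple):
    method: str
    path: str
    body: Optional[str]


def plan_blocked_force_merge(
    index: str, max_num_segments: int = 1, closed: bool = False
) -> List[Request]:
    """Return the REST calls, in order, for a force-merge preceded by a write block.

    If `closed` is True the index is re-opened first and closed again afterwards.
    """
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be >= 1")
    steps: List[Request] = []
    if closed:
        steps.append(Request("POST", f"/{index}/_open", None))
    # Block writes (but not reads) before merging, as described in the thread.
    steps.append(
        Request("PUT", f"/{index}/_settings", json.dumps({"index.blocks.write": True}))
    )
    steps.append(
        Request("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None)
    )
    if closed:
        steps.append(Request("POST", f"/{index}/_close", None))
    return steps


plan = plan_blocked_force_merge("logs-2019.05", closed=True)  # hypothetical index name
```

Note that with `closed=True` there is still a window between the `_open` call and the write block taking effect, which is the gap raised earlier in the discussion.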

keeping the translog around might occasionally help recover an out-of-sync replica with an operations-based recovery when the index is re-opened, but this is a pretty rare situation and not one we felt to be important

Just for clarification, we don't use the translog anymore for primary-replica sync, right? Do you mean something else?

Today we read operations from the translog on the primary during peer recovery. For a while we had moved to reading them from Lucene but we reverted that in #38904.

@tlrx can this be closed now?

Translog files are now trimmed for closed indices (#43156) and translog stats are now correctly exposed (#43752, #43825).
