Elasticsearch: Index last update date

Created on 10 Sep 2015 · 22 Comments · Source: elastic/elasticsearch

I was looking for a way to get the approximate date of an index's last modification and created a topic on the discussion list: https://discuss.elastic.co/t/index-last-update-date/28838.

Can ES expose the last modification date of segment files?

All 22 comments

What is your use-case exactly? For instance a segment file could be very recent even if no documents were indexed if a shard was relocated.

Yes, I know relocation may be an issue. My use case is that I'd like to force an optimize once an index has been inactive for long enough. I could do this by adding an update timestamp to each document, but that brings significant overhead and requires reindexing. I could also do it with an approximate last-modification time of the index files. Another option is to expose the hashes of all segments so I can check whether the index has changed (keeping the old hashes outside of ES).

Any other thoughts?

I'm not sure I follow exactly what you want, but there is a section in the shard stats that exposes information about the latest Lucene commit point, which will also indicate whether something changed (i.e., ES flushed or a merge finished). See https://github.com/elastic/elasticsearch/pull/10687
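As a rough sketch of how the commit point could be used for change detection: the snippet below takes an already-fetched shard-level stats response and builds a fingerprint from each primary's commit id and generation. The response layout (`indices` → index name → `shards` → list of copies, each with `routing` and `commit`) is an assumption about what `GET /<index>/_stats?level=shards` returns, and the function names are hypothetical. Since the commit point only moves on a flush or a finished merge, writes that have not been flushed yet would not show up here.

```python
from typing import Dict, Tuple


def commit_fingerprint(stats: dict, index: str) -> Dict[str, Tuple[str, int]]:
    """Map shard number -> (commit id, generation) for the primaries of one index.

    `stats` is assumed to be the parsed JSON of a shard-level stats response.
    """
    fingerprint = {}
    shards = stats["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        for copy in copies:
            # Only look at primaries so replicas do not produce spurious differences.
            if not copy["routing"]["primary"]:
                continue
            commit = copy["commit"]
            fingerprint[shard_id] = (commit["id"], commit["generation"])
    return fingerprint


def has_changed(previous: dict, current: dict) -> bool:
    """True if any primary shard reports a different commit point."""
    return previous != current
```

A caller would store the fingerprint outside of ES and compare it with a fresh one on the next poll.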

Hmmm... sounds great. Didn't know about this. Thanks for information.

great. Closing then....

I'm wondering if this should be reconsidered. It would be very useful to have last updated in index metadata. Especially with rollover indices in play now, curator could make use of this stat optionally as an alternative to field_data when it's not available or not reliable due to backfilling.

An additional use for this would be for folks that want to only apply some action if the index has changed in a certain time range (like copy a backup to another cluster).

It would be very useful to have last updated in index metadata.

This requires a master operation which would be entirely too expensive on every indexing request that mutates the index.

An additional use for this would be for folks that want to only apply some action if the index has changed in a certain time range (like copy a backup to another cluster).

Snapshots are incremental which means they play nicely with taking periodic snapshots. If an index has not changed since the last snapshot, no new data will be transferred.

This requires a master operation which would be entirely too expensive on every indexing request that mutates the index.

This could be a value that is held per shard, and only reduced to a global max when the request is made, no?
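The per-shard idea can be illustrated with a tiny reduction. This is only a sketch of the suggestion, not how ES implements anything: each shard is assumed to report its own last-write value (or nothing if it has never been written to), and the index-level value is computed on demand as the maximum. The function name is hypothetical.

```python
from typing import Iterable, Optional


def index_last_write(shard_last_writes: Iterable[Optional[int]]) -> Optional[int]:
    """Reduce per-shard last-write values to one index-level value.

    Shards that have never seen a write report None and are ignored.
    Returns None if no shard has a value.
    """
    known = [t for t in shard_last_writes if t is not None]
    return max(known) if known else None
```

The reduction runs only when someone asks for the value, so nothing has to be coordinated on the indexing path.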

We already track whether a shard is active or not (shards become inactive after 5m of no writes). For that we track the nano time of the last write in the engine. I think we can expose those in the stats in the form of last time since write? Note that this is not 100% the same as it is reset during relocation of shards, but it might be good enough and simple to implement.

On 20 Feb 2017, at 14:57, Clinton Gormley notifications@github.com wrote:

This requires a master operation which would be entirely too expensive on every indexing request that mutates the index.

This could be a value that is held per shard, and only reduced to a global max when the request is made, no?


For that we track the nano time of the last write in the engine.

For this we do not track absolute time, only relative time.

For this we do not track absolute time, only relative time.

Yes that's what I mean with "I think we can expose those in the stats in the form of time since last write" (although I garbled that sentence with an edit).
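To make the relative-versus-absolute distinction concrete, here is a minimal sketch of a tracker that records writes on a monotonic clock and can therefore only report the time elapsed since the last write, not a wall-clock date. The class and method names are hypothetical and this is not the engine's code.

```python
import time


class LastWriteTracker:
    """Tracks the last write using a relative (monotonic) clock."""

    def __init__(self, clock=time.monotonic_ns):
        self._clock = clock
        self._last_write_ns = None  # no write seen yet

    def on_write(self):
        # Monotonic readings are only meaningful relative to each other.
        self._last_write_ns = self._clock()

    def nanos_since_last_write(self):
        """Elapsed nanoseconds since the last write, or None if there was none."""
        if self._last_write_ns is None:
            return None
        return self._clock() - self._last_write_ns
```

A stats API built on such a tracker could expose "time since last write", and a client could subtract that from its own wall clock to get an approximate absolute time.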

The use case that brought this on is this - "Has the index been updated in the last x minutes?, if so copy it".

The use case that brought this on is this - "Has the index been updated in the last x minutes?, if so copy it".

Do you mean run a snapshot?

A cron trigger of running a snapshot every x minutes achieves the same.

@jasontedor there are other use cases, e.g., if index A, B, or C has changed since $last_built, then rebuild the composite index X

there are other use cases

@clintongormley Yes, I completely agree. 😄

It's important that we understand what those are so that we know whether or not they can be solved already today (like snapshotting), whether or not we need to provide a solution that is robust in the face of relocations, and whether or not we need to provide a solution that uses an absolute clock versus a relative clock.

The use case is a "near-realtime sync" between 2 indices in different clusters. I guess this will eventually be solved with changelog. For now, we could run snapshot/restore every single minute, but that seems inefficient. It would be better if we could say "Something has changed, run the snapshot/restore process".

@PhaedrusTheGreek For that use case, a solution that is not robust in the face of relocations will not work. If an indexing request hits, then the shard is relocated, and no future indexing requests arrive, the shard will not be seen as changed.
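The failure mode can be shown with a toy simulation. It assumes, as described earlier in the thread, that the last-write value lives with the shard copy and is reset when the shard relocates. All names are hypothetical, and integer ticks stand in for time.

```python
class ShardCopy:
    """Hypothetical shard copy that only remembers writes it handled itself."""

    def __init__(self):
        self.last_write_tick = None  # reset for every new copy

    def index(self, tick):
        self.last_write_tick = tick


def changed_since(copy, tick):
    """What a poller would conclude from the copy's own tracking state."""
    return copy.last_write_tick is not None and copy.last_write_tick > tick


def simulate():
    last_sync = 0
    source = ShardCopy()
    source.index(tick=5)           # an indexing request hits after the last sync
    seen_before = changed_since(source, last_sync)

    relocated = ShardCopy()        # relocation: data moves, tracking state does not
    seen_after = changed_since(relocated, last_sync)
    return seen_before, seen_after
```

If the poller only checks after the relocation, the write at tick 5 goes unnoticed and the sync is skipped.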

@jasontedor

Another use case:

"If the index has been updated since last time, run the client query again; otherwise return the results from cache"
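A minimal sketch of that caching pattern, assuming some "change token" for the index is available (for example a commit fingerprint or a last-write value; how to obtain it is exactly what this issue is about). `get_token` and `run_query` are hypothetical callables supplied by the caller.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ChangeAwareCache:
    """Re-runs a query only when the index's change token differs."""

    def __init__(self, get_token: Callable[[], Hashable],
                 run_query: Callable[[str], Any]):
        self._get_token = get_token
        self._run_query = run_query
        self._entries: Dict[str, Tuple[Hashable, Any]] = {}

    def query(self, key: str) -> Any:
        token = self._get_token()
        cached = self._entries.get(key)
        if cached is not None and cached[0] == token:
            return cached[1]  # index unchanged: serve from cache
        result = self._run_query(key)
        self._entries[key] = (token, result)
        return result
```

The cache is only as reliable as the token: a token that misses changes (for instance after a relocation) would cause stale results to be served.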

Is this data available now via the commit user_data? I see a timestamp in shard stats called max_unsafe_auto_id_timestamp, but I'm not sure if that helps to indicate the last write time of the shard.

No, that is not the last write time of the shard. It's roughly the timestamp of the last retry; it's deeply internal and used for optimizations that we do inside the engine for append-only indices.

Is there any solution for getting the last update time of an index? The _cat/indices API gives us the creation date but not the last update time. Waiting for any reply on this.

@biswajit-landmarkgroup There is not a built-in solution for this. If your data has a timestamp field, and that timestamp field is roughly equal to the time that the data is ingested, then you could run a max aggregation on that timestamp field over the index to find the approximate time of the last write to the index.
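As a sketch of that workaround, the helpers below build a search body with a `max` aggregation and parse the result. The field name `@timestamp` and the aggregation name `last_write` are assumptions for illustration. The parsing assumes the field is a date field, for which the aggregation's `value` is epoch milliseconds, or null when the index has no documents.

```python
from datetime import datetime, timezone
from typing import Optional


def last_write_query(timestamp_field: str = "@timestamp") -> dict:
    """Search body asking only for the maximum of the timestamp field."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {"last_write": {"max": {"field": timestamp_field}}},
    }


def parse_last_write(response: dict) -> Optional[datetime]:
    """Convert the aggregation result to a UTC datetime, or None if empty."""
    value = response["aggregations"]["last_write"]["value"]
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000.0, tz=timezone.utc)
```

The body would be sent to `POST /<index>/_search`. The result is only as accurate as the assumption that the timestamp field tracks ingestion time, so backfilled data would skew it.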
