Elasticsearch: Add support for date math on aliases name

Created on 7 Sep 2016  ·  24 Comments  ·  Source: elastic/elasticsearch

Elasticsearch supports date math in index names, for example:

POST  <es_security_00365-{now%2Fd%2B1d}-0>  

This creates an index for the next day, es_security_00365-2016.09.06-0,
but date math does not work when we use it to create an alias:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "es_security_00365-2016.09.08-0", "alias" : "<es_security_00365-{now+1d}>" } }
    ]
} 

You will get

{
   "es_security_00365-2016.09.08-0": {
      "aliases": {
         "<es_security_00365-{now+1d}>": {}
      }
   }
}

If Elasticsearch supported date math in alias names, you would get

{
   "es_security_00365-2016.09.08-0": {
      "aliases": {
         "es_security_00365-2016.09.08": {}
      }
   }
}

Relates to https://github.com/elastic/elasticsearch/issues/5359

:Core/Features/Indices APIs  ·  >enhancement  ·  Core/Features  ·  help wanted

All 24 comments

I'm not sure about doing this. First, I don't want to complicate adding aliases. Second, we could introduce unexpected behaviour: for example, running this command at midnight could leave the alias and the index with different days.

With the new rollover-and-shrink pattern for indexing, the user decides when to check whether the index should be rolled over, and gets the new index name as a response. This index name can then be manipulated client side to be used for the alias.
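The client-side step described here could look like the following minimal Python sketch. It assumes the index name returned by the rollover ends in a numeric counter (as in es_security_00365-2016.09.08-0) and that the alias is simply the name without that counter. The helper names alias_from_index and add_alias_body are hypothetical, not part of any Elasticsearch client.

```python
import re

# Matches a trailing "-<digits>" counter, e.g. the "-0" in "...-2016.09.08-0".
_COUNTER_SUFFIX = re.compile(r"-\d+$")


def alias_from_index(index_name: str) -> str:
    """Derive an alias name by dropping the trailing counter (hypothetical helper)."""
    alias, replaced = _COUNTER_SUFFIX.subn("", index_name)
    if not replaced:
        raise ValueError(f"no numeric counter suffix in {index_name!r}")
    return alias


def add_alias_body(index_name: str) -> dict:
    """Build a request body for POST /_aliases that adds the derived alias."""
    return {
        "actions": [
            {"add": {"index": index_name, "alias": alias_from_index(index_name)}}
        ]
    }
```

Because the alias is derived from the index name the server actually returned, the two cannot disagree about the day, whatever the clock says when the alias request is sent.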

Another possibility is adding a cluster-state changes API, so that users can be alerted when (eg) an index is created and take the appropriate action.

These options seem a lot more flexible than adding date math to alias names.

Discussed in Fix it Friday - for consistency's sake we should add support for date math to aliases. Need to ensure that a single now value is used for the whole request.
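To illustrate the "single now value" requirement, here is a rough Python sketch of a resolver that takes the clock reading as an argument, so every name in one request is resolved against the same instant. It is not Elasticsearch's implementation: it handles only a small subset of date math (now, /d rounding and whole-day offsets such as +1d), always formats as yyyy.MM.dd, and the function name resolve_name is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Supported subset: {now} followed by any sequence of "/d" or "+Nd" / "-Nd".
_EXPR = re.compile(r"\{now((?:/d|[+-]\d+d)*)\}")
_OP = re.compile(r"/d|[+-]\d+d")


def resolve_name(name: str, now: datetime) -> str:
    """Resolve <static-{now...}-static> against the given 'now' (hypothetical helper)."""
    if not (name.startswith("<") and name.endswith(">")):
        return name  # plain names pass through untouched

    def _sub(match: re.Match) -> str:
        moment = now
        for op in _OP.findall(match.group(1)):
            if op == "/d":
                # Round down to the start of the day.
                moment = moment.replace(hour=0, minute=0, second=0, microsecond=0)
            else:
                # op looks like "+1d" or "-2d"; int() accepts the sign.
                moment += timedelta(days=int(op[:-1]))
        return moment.strftime("%Y.%m.%d")

    resolved = _EXPR.sub(_sub, name[1:-1])
    if "{" in resolved:
        raise ValueError(f"unsupported date math expression in {name!r}")
    return resolved
```

Capturing now once per request and passing it to every resolve_name call means the index and alias get the same day even if the request is processed across midnight, which addresses the concern raised earlier in the thread.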

@clintongormley
Should aliases in templates support this feature?
I checked the code and tried to fix this for index creation in the following way:

  1. resolve the index name with a context each time
  2. then resolve the aliases with the same context, which holds the now value

but with this approach it is hard to support aliases in templates

Aliases in templates should support this feature.

We could really use this functionality. Currently we have to have a check to create an index mapping and then insert into it, because we can't use templates.

We really need this. It would be great to have this ability, as it would clean up client code quite a bit: we have to do a bunch of work just to create an index (we can't ensure that indexes are created in a sequential timeline).

cc @Mpdreamz @jasontedor @nik9000 Would be great if this could be discussed this week 🍻!

Would be great if this could be discussed this week 🍻!

Sorry, what is it that you think needs to be discussed here?

@jasontedor I was referencing the "what to work on" discussions following Elastic{ON} :). I really wish this functionality was implemented; the Rollover functionality doesn't solve or fit this use case.

I was referencing the "what to work on" discussions following Elastic{ON} :).

Do not fear, we have plenty to work on after Elastic{ON}. 😄

When used in an index template, which {now} would be used? Server's OS clock or the @timestamp of the document that is being indexed?
We have a situation where we would like to create aliases per day on indices that might be per hour (very different amounts in different environments). Problem is, the timestamps aren't guaranteed to be "today".

(A substring match solution, something like #5359, would probably be more robust but that issue was closed in favor of this one.)

@TommySedin it would be the server's clock. If you want to use the document's timestamp, then you can change the index name using an ingest pipeline. That said, I don't think that trying to ensure that all documents belong to exactly the correct index is worth worrying about. You can use a query to figure out whether a particular index is OK to delete or not, and the same goes for which documents to include in your results.
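As a sketch of the ingest-pipeline option mentioned here, the following Python function builds a pipeline definition that uses the date_index_name processor to route each document by its own timestamp. The function name, the default field name, the description text and the yyyy.MM.dd format are assumptions for illustration; the resulting dict would be sent as the body of PUT /_ingest/pipeline/<pipeline-id> with a pipeline id of your choosing.

```python
def daily_index_pipeline(prefix: str, timestamp_field: str = "@timestamp") -> dict:
    """Build an ingest pipeline body that sets the target index per document.

    Hypothetical helper: the processor derives the index name from the
    document's timestamp field rather than from the server's clock.
    """
    return {
        "description": "Route each document to a daily index based on its own timestamp",
        "processors": [
            {
                "date_index_name": {
                    "field": timestamp_field,       # where the document's date lives
                    "index_name_prefix": prefix,    # e.g. "transactions-"
                    "date_rounding": "d",           # one index per day
                    "date_formats": ["ISO8601"],    # how to parse the field
                    "index_name_format": "yyyy.MM.dd",
                }
            }
        ],
    }
```

This only decides which index a document lands in; it does not create date-based aliases, so it is a partial workaround for the request in this issue.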

I'd even just like to be able to use the date from the specified index name :).

@clintongormley In our largest environment, we have about 30 TB of live data in ~7 billion documents (transaction events). The frontend GUI that the 1st-line uses to follow the transactions runs searches on specific date-based indices to limit the sheer number of shards that have to be queried. Having all these documents end up in their respective indices is pretty important.

Preferably we'd use the Rollover API to handle the different rollover periods, in combination with Index Templates to create indices and aliases that somehow would make it all transparent to the GUI frontend.

I'm pretty sure we'll be able to build something using Ingest and having the frontend use wildcards, but it's neither as clean, nor as dynamic a solution.

This feature request is an interesting idea but since its opening we have not seen enough feedback that it is a feature we should pursue. We prefer to close this issue as a clear indication that we are not going to work on this at this time. We are always open to reconsidering this in the future based on compelling feedback; despite this issue being closed please feel free to leave feedback on the proposal (including +1s).

After some more internal chat w/ teammates, we are going to leave this issue open for now.

Pinging @elastic/es-core-infra

Thank you!

I have an actual use case in which this might shed some more light on why this might be an increasingly important feature.

I have a machine instance where a number of beats are running (metric, audit, file, & heartbeat), all passing through Logstash. Per the Elasticsearch documentation, I am using the index-per-day pattern coupled with the beat name and version, which results in 4 different indices.

"foo-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}" results in indices like:

foo-metricbeat-6.2.4-2018.04.23
foo-auditbeat-6.2.4-2018.04.23
foo-filebeat-6.2.4-2018.04.23
foo-heartbeat-6.2.4-2018.04.23

While I could merge them all into a single index (not recommended per the documentation), only some of this data needs to be saved long term like foo-audit while other data like foo-heart & foo-metric can be deleted completely after X amount of time.

What I'd like to be able to do is something like:

{
 "actions": [
    { "add" : { "index" : ["<foo-metricbeat-6.2.4-{now/d}>", "<foo-auditbeat-6.2.4-{now/d}>", "<foo-filebeat-6.2.4-{now/d}>", "<foo-heartbeat-6.2.4-{now/d}>"], "alias" : "<machine-instance-{now/d}>" } }
 ]
}

This would allow me the flexibility to create aliases for a limited X amount of time, while also keeping and culling the other indices where/when needed.

On the face of it, this seems much easier for users to manage than setting up the Rollover Pattern mentioned by Clinton, which uses cron jobs to check the status before kicking off a rather robust pipeline. That blog post mentions Curator, which is supposed to help with that pipeline but didn't make it past Elasticsearch 5.5.

Perhaps I'm wrong in my thinking but it seems like the date math is an easier, logical, first step before graduating to the Rollover pattern.
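Until alias names accept date math, the request above can be approximated by resolving the names client-side. The sketch below is one possible way to do that in Python, assuming the foo-<beat>-<version>-<date> naming from this comment; the function name machine_alias_body is hypothetical. It uses the aliases API's indices array to attach one alias to several indices in a single action.

```python
from datetime import date

BEATS = ("metricbeat", "auditbeat", "filebeat", "heartbeat")


def machine_alias_body(day: date, version: str = "6.2.4") -> dict:
    """Build a POST /_aliases body for one day's beat indices (hypothetical helper)."""
    stamp = day.strftime("%Y.%m.%d")  # same layout as the daily index suffix
    indices = [f"foo-{beat}-{version}-{stamp}" for beat in BEATS]
    return {
        "actions": [
            {"add": {"indices": indices, "alias": f"machine-instance-{stamp}"}}
        ]
    }
```

Since both the index names and the alias are built from the same day value, they cannot drift apart, but something outside Elasticsearch (a scheduled job, for instance) still has to send the request each day.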

This feature sounds interesting, as we would like to handle automatic index creation when a new document insert needs it.
At insertion time, Elasticsearch looks up the _template to assign the schema, shard split configuration, etc. We would like to set a date-driven alias name for inserts and searches, and therefore also for the rollover name, which depends on the alias name too.

Is there any update on the implementation of this feature?

I too have a use for this: a data source is injecting indexname-{YYYY-MM-DD} indices into Elasticsearch, and I'd like to apply ILM policies to these, including rollover, as the quantity of data being ingested is too large for a single index.

I wonder whether there is still a strong need for this now that we support ILM and rollover, which automate management of aliases on time-based indices.

I wonder whether there is still a strong need for this now that we support ILM and rollover, which automate management of aliases on time-based indices.

Given that ILM is only part of the on-prem (Community Edition) and elastic.co fully managed solutions, technology restrictions in a project could prevent using a version compatible with ILM. That said, it would be helpful to have alternatives depending on the needs.

We have a use case in which we need to set up an alias using date math.
Currently we're creating an index by hitting a proxy (which will modify the index name by adding a prefix and a suffix). Once ES returns the actual index name in the response, we set up an alias to the newly created index.

With date math support, ES would receive these requests:

# now: Jan 2021
# creates test_events-2020-12-01-0
PUT /<events-{now/M-1M{yyyy-MM}}>

# set up a test_events-2020-12-01 alias
POST /test_events-2020-12-01-0/_alias/<events-{now/M-1M{yyyy-MM}}>

# which yields
{"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [<events-{now/M-1M{yyyy-MM}}>]: must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [<events-{now/M-1M{yyyy-MM}}-01>]: must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"},"status":400}