Elasticsearch: Should we warn users when they look for data that is older than the retention period?

Created on 10 Jun 2020 · 9 Comments · Source: elastic/elasticsearch

It's quite common for different people to be responsible for configuring data ingestion and for actually analyzing the data. How can analysts tell whether they cannot find events older than X months because that is the retention period of the data, or simply because no events older than X months match the current filter?

Some cases make this even trappier. For instance, think of a user searching for sequences of events of category X then Y. If X and Y have different retention periods, it's easy to be misled into thinking that there is old data when actually there is only old data for one of the categories.
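To make the X-then-Y case concrete, here is a minimal sketch. It assumes each category has a single, known retention period; the function name and the 90/30 day figures are made up for illustration and are not an Elasticsearch API. A sequence query can only be complete over the window in which every category involved is still retained, i.e. the shortest retention period.

```python
from datetime import datetime, timedelta, timezone


def complete_window_start(now, retention_by_category):
    """Earliest instant for which every category is still retained.

    Hypothetical helper: assumes one fixed retention period per category.
    """
    shortest = min(retention_by_category.values())
    return now - shortest


# Illustrative figures only: X is kept for 90 days, Y for 30 days.
now = datetime(2020, 6, 10, tzinfo=timezone.utc)
retention = {"X": timedelta(days=90), "Y": timedelta(days=30)}
window_start = complete_window_start(now, retention)
```

Under these assumptions, a sequence search reaching back before `window_start` still sees X events but no Y events, so an empty result there says nothing about whether matching sequences ever occurred.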

Some questions to get the discussion started:

  • Should we warn users on all date(_nanos) fields or only the timestamp field of data streams?
  • What should happen when searching across data streams that have different retention periods?
  • Should we ignore indices that are filtered out by the can_match phase? (with the caveat that the can_match phase may filter indices based on their @timestamp values)
:Search/Search >enhancement Search team-discuss

All 9 comments

Pinging @elastic/es-search (:Search/Search)

@jpountz how are you defining "retention period" here? ILM policy delete phase?

@dakrone Yes indeed.

Hmm.. what about a policy like this:

{
  "policy": {
    "phases" : {
      "hot" : {
        "min_age" : "0ms",
        "actions" : {
          "rollover" : {
            "max_docs" : 10000000,
            "max_size": "50gb"
          }
        }
      },
      "delete" : {
        "min_age" : "1d",
        "actions" : {
          "delete" : { }
        }
      }
    }
  }
}

We would have to be careful not to warn users that they shouldn't look for data older than one day, because deletion is based on the rollover time, so the index could be a month old even though its delete retention is one day.
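A small worked example of that arithmetic, as a sketch only: it assumes the delete phase's min_age is counted from the rollover time, as described above, and the dates are invented for illustration. The function names are hypothetical, not part of ILM.

```python
from datetime import datetime, timedelta, timezone


def deletion_time(rollover_time, delete_min_age):
    """When the index becomes eligible for deletion (min_age counted from rollover)."""
    return rollover_time + delete_min_age


def max_document_age_at_deletion(index_created, rollover_time, delete_min_age):
    """Age of the oldest possible document at the moment the index is deleted."""
    return deletion_time(rollover_time, delete_min_age) - index_created


# Illustrative dates: a low-volume index that takes 30 days to hit a rollover condition.
created = datetime(2020, 6, 1, tzinfo=timezone.utc)
rolled_over = datetime(2020, 7, 1, tzinfo=timezone.utc)
oldest = max_document_age_at_deletion(created, rolled_over, timedelta(days=1))
```

Here `oldest` is 31 days even though the policy's delete min_age is "1d", which is why the configured value alone is not a reliable bound on how old searchable data can be.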

@dakrone I think you're raising a good question, but it's not obvious to me that we should not warn, as the fact that data exists is a bit accidental. I'm thinking of the case of someone who experiments with a query with the goal of turning it into an alerting rule at some point. If there is data just because we're "lucky", wouldn't it be better to warn users so that they don't accidentally create rules that might not see all the data that they expect to see?

We had some discussions about this yesterday and the following questions were raised:

  • What if a user wants to query the entire range of data, should we warn them in such a case?
  • What if a data stream stores outdated documents (documents that are older than the index creation date)?

@tomcallahan brought up the idea that maybe this shouldn't be about warning users, but instead we should enable Elasticsearch to return information about the retention period for a given index pattern. This would allow Kibana to tailor its UI for this retention period and e.g. give signs that filtering data from the "Last 90 days" isn't right if the data has a retention period of 30 days.
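As a rough sketch of what a client could do with such information: the code below assumes the retention period has already been obtained somehow. The thread does not define an API or response format for this, so `retention` is just a value passed in and the function name is hypothetical.

```python
from datetime import timedelta


def range_exceeds_retention(requested_range, retention):
    """True if a relative time filter reaches further back than the retention period.

    Hypothetical client-side check; `retention` is assumed to come from
    whatever mechanism ends up exposing it for the index pattern.
    """
    return requested_range > retention


# The example from the comment above: "Last 90 days" against 30 days of retention.
last_90_days_flagged = range_exceeds_retention(timedelta(days=90), timedelta(days=30))
last_7_days_flagged = range_exceeds_retention(timedelta(days=7), timedelta(days=30))
```

A UI could use a check like this to hint that part of the selected range may already have been deleted. Because deletion is based on the rollover time, data older than the retention period may still exist, so a hint seems more appropriate than hiding the range.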

To move this forward we agreed to gather more feedback from Solutions to see whether this is something they already considered.

++ for exposing this and letting Kibana decide how to show it

@jpountz since this has two area labels, which team should take ownership of this, the search team or the core/features team?

Thanks for the ping. Since the idea of adding this information to _field_caps seems to be getting traction, I'm assigning the search team.
