Micrometer: Elastic metrics: Elasticsearch 7.3 has changed the meaning of _source enabled: false

Created on 3 Oct 2019 · 8 comments · Source: micrometer-metrics/micrometer

From Elastic's documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.3/mapping-source-field.html

The metrics use case is distinct from other time-based or logging use cases in that there are many small documents which consist only of numbers, dates, or keywords. There are no updates, no highlighting requests, and the data ages quickly so there is no need to reindex. Search requests typically use simple queries to filter the dataset by date or tags, and the results are returned as aggregations.
In this case, disabling the _source field will save space and reduce I/O.

This means that it is not possible to see the fields in the index, and the Kibana view will not show the metrics.

I would make a pull request to change to enabled: true for Elasticsearch 7.3 and up

All 8 comments

Thank you, Lars, for offering to make a PR.
This Stack Overflow answer also concludes that setting enabled: true would solve the issue of not seeing metrics in Kibana. I think this would be a quick fix until there is native support for Elastic APM, as requested in https://github.com/micrometer-metrics/micrometer/issues/793

The quoted section of the documentation explains why disabling the _source field is a good idea for metrics, but you're proposing enabling it.

Just so we're on the same page, the fields are there (you can check them in the Kibana index pattern, for instance) and you can query/aggregate them, but the raw values are not shown for each document on the Kibana Discover page. I can see how this makes ad-hoc querying of the metrics difficult unless you already know the metric names/tags. It would be nice if Kibana could still auto-complete the field names/values for this purpose.

The proposal to enable the _source field would remedy this pain, but is it worth everyone paying the cost of including the _source field for metrics data? I'm not sure how much that cost is, to be honest. I wonder if there is some other way we can enable easier querying of the data without the overhead of enabling the _source field.
@xeraa sorry to ping you suddenly, but I'd love to get any thoughts from you on the above, if you have time to help.

Other Elasticsearch/Kibana users also feel free to give your input on this, as the proposed change would affect the default behavior for all users of the Micrometer Elasticsearch registry.


That being said, there is currently a workaround: specify your own template that uses "_source": { "enabled": true }. Even if we decide that keeping the current default is best, another option would be to make a configuration property for whether to enable the _source field, so users don't have to copy the template just to change that one setting.
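A minimal Python sketch of both ideas, assuming the Elasticsearch 7.x legacy index template format (`PUT _template/<name>`). The `source_enabled` parameter stands in for the proposed configuration property. The function name, the `metrics*` index pattern and the single `name` keyword mapping are illustrative assumptions, not a copy of Micrometer's built-in template.

```python
import json


def build_metrics_template(source_enabled: bool, index_pattern: str = "metrics*") -> dict:
    """Body for a legacy index template (PUT _template/<name>) on Elasticsearch 7.x.

    source_enabled plays the role of the proposed configuration property.
    The index pattern and the mapping below are illustrative assumptions,
    not a copy of Micrometer's built-in template.
    """
    return {
        "index_patterns": [index_pattern],
        "mappings": {
            # False: smaller indices, but Discover shows no field values.
            # True: documents are browsable in Discover, at some disk cost.
            "_source": {"enabled": source_enabled},
            "properties": {
                "name": {"type": "keyword"},  # hypothetical field for the metric name
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_metrics_template(source_enabled=True), indent=2))
```

Only the one flag differs between the two variants, which is what would make a single boolean property enough instead of asking users to copy the whole template.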

Also, I want to point out that recently, the following dashboards/visualizations of Micrometer metrics in Kibana have been published by some users.
https://github.com/acroquest/micrometer-kibana-dashboard

What's the impact?

To make the impact of this explicit (it might be obvious already, but just to put everyone on the same page):

If you have the following mapping and 3 sample docs:

PUT metrics
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}

POST metrics/_doc
{
  "some-long": 200,
  "some-float": 0.9,
  "timestamp": "2019-12-11T12:10:10"
}
POST metrics/_doc
{
  "some-long": 400,
  "some-float": 1.2,
  "timestamp": "2019-12-11T12:10:20"
}
POST metrics/_doc
{
  "some-long": 240,
  "some-float": 1.0,
  "timestamp": "2019-12-11T12:10:30"
}

A search like GET metrics/_search will only give you an output like this:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "metrics",
        "_type" : "_doc",
        "_id" : "LA8N9m4BscjDut7uOBBz",
        "_score" : 1.0
      },
      ...

And Discover will show the same:

[Screenshot: Kibana Discover listing the documents without their field values]

But Visualizations (and thus Dashboards) will work just like normal:

[Screenshot: a Kibana visualization of the same metrics rendering normally]
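Visualizations keep working because aggregations read per-field doc values rather than `_source`. The sketch below is a Python illustration, not anything from Micrometer or Kibana: it builds an aggregation-only request body for the sample index and computes locally what a `stats` aggregation over the three sample documents should report. The helper names are hypothetical.

```python
import json
from statistics import mean

# The three sample documents indexed above. With "_source": {"enabled": false}
# a search returns no field values for them, but numeric fields are still
# indexed with doc values, and that is what aggregations read.
SAMPLE_DOCS = [
    {"some-long": 200, "some-float": 0.9, "timestamp": "2019-12-11T12:10:10"},
    {"some-long": 400, "some-float": 1.2, "timestamp": "2019-12-11T12:10:20"},
    {"some-long": 240, "some-float": 1.0, "timestamp": "2019-12-11T12:10:30"},
]


def stats_aggregation_body(field: str) -> dict:
    """Request body for GET metrics/_search that asks only for aggregations."""
    return {
        "size": 0,  # no hits needed; they would carry no _source anyway
        "aggs": {f"{field}-stats": {"stats": {"field": field}}},
    }


def expected_stats(field: str) -> dict:
    """What a stats aggregation over SAMPLE_DOCS should report, computed locally."""
    values = [doc[field] for doc in SAMPLE_DOCS]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "sum": sum(values),
    }


if __name__ == "__main__":
    print(json.dumps(stats_aggregation_body("some-long"), indent=2))
    print(expected_stats("some-long"))
```

Elasticsearch reports these statistics as doubles (e.g. 200.0), and for `some-float`, which dynamic mapping stores as `float`, small rounding differences against the local values are to be expected.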

Is this the right tradeoff? I don't know :)
It might depend on the use case, so making it configurable could make sense (potentially pointing to the right index template in the docs)? My personal feeling is that you want to keep this set to true by default and more experienced users with larger datasets can change it to false if that is what they want.

How much disk will this save?

As always, it depends: the number of fields, their mappings, how well the data compresses, and so on all play a role.

Do you have a representative dataset? Then we could easily try it out.
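With a representative dataset, the reliable measurement is to index it twice, once with `_source` enabled and once without, and compare the `store.size` of the two indices (for example via `GET _cat/indices?v`). As a much cruder first look, the hypothetical helper below compresses the serialized documents with DEFLATE using Python's `zlib`. Elasticsearch compresses stored fields in blocks of documents (LZ4 by default, DEFLATE with `best_compression`), so treat the result as a rough hint only, not a prediction.

```python
import json
import zlib


def source_size_estimate(docs):
    """Crude proxy for the cost of _source: (raw bytes, DEFLATE-compressed bytes).

    Serializes every document as compact JSON, one per line, and compresses
    the whole batch at once, loosely mimicking block-level compression.
    """
    raw = "\n".join(json.dumps(doc, separators=(",", ":")) for doc in docs).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


if __name__ == "__main__":
    # Synthetic documents shaped like the sample docs above (not real metrics).
    docs = [
        {
            "some-long": 200 + i % 50,
            "some-float": round(1.0 + (i % 7) / 10, 1),
            "timestamp": f"2019-12-11T12:{i // 6 % 60:02d}:{i % 6 * 10:02d}",
        }
        for i in range(1000)
    ]
    raw, compressed = source_size_estimate(docs)
    print(f"raw={raw} bytes, compressed={compressed} bytes, ratio={compressed / raw:.2f}")
```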

Which Elasticsearch versions support this?

I would make a pull request to change to enabled: true for Elasticsearch 7.3 and up

This feature has been around for a long time. I'm not sure what version range of Elasticsearch you support, but you could probably enable this for all versions (for example see the docs for 5.0).

PS: Rollups

Maybe the solution could also include rollups? Basically it takes the raw data and pre-aggregates it. So you could have 10s intervals for today, but after 48h you only keep 5min intervals around; IMO that would save a lot more disk space (but add different tradeoffs).
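To make the rollup idea concrete, here is a plain-Python sketch of the pre-aggregation step itself (not Elasticsearch's rollup API): 10s samples are folded into 5-minute buckets that keep min, max, sum and count. The function name and bucket layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def downsample(samples, bucket_seconds=300):
    """Fold (datetime, value) samples into fixed-width time buckets.

    Returns {bucket_start_epoch_seconds: {"min", "max", "sum", "count"}}.
    Keeping sum and count instead of only the average means buckets can be
    merged again later (e.g. 5 min -> 1 h) without losing accuracy.
    """
    buckets = defaultdict(
        lambda: {"min": float("inf"), "max": float("-inf"), "sum": 0.0, "count": 0}
    )
    for ts, value in samples:
        # Align each sample to the start of its bucket.
        start = int(ts.timestamp()) // bucket_seconds * bucket_seconds
        bucket = buckets[start]
        bucket["min"] = min(bucket["min"], value)
        bucket["max"] = max(bucket["max"], value)
        bucket["sum"] += value
        bucket["count"] += 1
    return dict(buckets)


if __name__ == "__main__":
    # One hour of synthetic 10s samples -> 12 five-minute buckets of 30 samples each.
    base = datetime(2019, 12, 11, 12, 0, 0, tzinfo=timezone.utc)
    samples = [(base + timedelta(seconds=10 * i), float(i % 30)) for i in range(360)]
    for start, agg in sorted(downsample(samples).items()):
        print(start, agg["count"], agg["sum"] / agg["count"])
```

In this toy example 360 raw points shrink to 12 buckets, which is the kind of saving meant above; the tradeoff is that individual 10s values are gone for older data.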

PPS: Compression

As mentioned in the docs, you could also look into compression if you're not doing that already.
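Compression is controlled by the `index.codec` index setting: `best_compression` switches stored fields (which include `_source`) from LZ4 to DEFLATE. It is a static setting, so for rolling metrics indices the natural place for it is the index template. A small Python sketch, with a hypothetical helper name, that adds it to an existing template body:

```python
import copy
import json


def with_best_compression(template_body):
    """Return a copy of an index template body with index.codec = best_compression.

    The input is left untouched; any existing settings are preserved.
    """
    body = copy.deepcopy(template_body)
    body.setdefault("settings", {}).setdefault("index", {})["codec"] = "best_compression"
    return body


if __name__ == "__main__":
    # Hypothetical minimal template that keeps _source enabled and compresses it instead.
    template = {"index_patterns": ["metrics*"], "mappings": {"_source": {"enabled": True}}}
    print(json.dumps(with_best_compression(template), indent=2))
```

Keeping `_source` enabled while raising the compression level is the middle ground the `_source` docs themselves suggest when disk space is the concern.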

another option would be to make a configuration property for whether to enable the _source field,

I like options

BTW some other ideas for further reducing storage requirements:

  • Pick the right numeric datatype, for example integer or short instead of the default long; or instead of double either float or even scaled_float.
  • Set "index": false on numeric values, so you can still aggregate on a value, but not search or filter on it. This is also doable through an index template.
  • Strings should probably be mapped only as keyword (not text), if you're not already doing that.
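The three suggestions above could look like this in a template's `properties`. The field names and the `numeric_field` helper are hypothetical, chosen only to illustrate the mapping options. `scaled_float` requires a `scaling_factor`. With `"index": false` the values keep their doc values, so aggregations still work, but in 7.x queries on those fields are rejected.

```python
import json
from typing import Optional


def numeric_field(es_type: str, scaling_factor: Optional[float] = None) -> dict:
    """Mapping for a metric value that is aggregated but never searched or filtered."""
    # "index": false drops the inverted index; doc values (used by aggregations) remain.
    mapping = {"type": es_type, "index": False}
    if es_type == "scaled_float":
        if scaling_factor is None:
            raise ValueError("scaled_float requires a scaling_factor")
        # Stored internally as a long: the value multiplied by scaling_factor, then rounded.
        mapping["scaling_factor"] = scaling_factor
    return mapping


# Hypothetical field names, chosen only to illustrate the suggestions above.
PROPERTIES = {
    "name": {"type": "keyword"},                  # names/tags: keyword, not text
    "active": numeric_field("short"),             # small integer gauge
    "count": numeric_field("integer"),            # instead of the default long
    "mean": numeric_field("float"),               # instead of double
    "ratio": numeric_field("scaled_float", 100),  # two decimal places are enough
}

if __name__ == "__main__":
    print(json.dumps({"mappings": {"properties": PROPERTIES}}, indent=2))
```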

The description states that this is the behavior in Elasticsearch 7.3 and up, but if you check the docs, this is the behavior in every Elasticsearch version. I don't see what was introduced in terms of _source in Elasticsearch 7.3.
