Kibana: [Infra UI] Config Settings

Created on 3 Jul 2018  ·  9 Comments  ·  Source: elastic/kibana

Task Breakdown

  • [x] add adapter and api to read sources from config (PR #20789)
  • [x] move map() query type inside of source() (PR #21016)
  • [x] use source configuration for log fetching (PR #21306)
  • [x] add api field to diagnose matching indices (PR #21016)
  • [x] add api field to communicate detected data characteristics (Issue #21422)

Defaults

The lack of any configuration options specific to the Infra UI should be treated as if the following configuration was present:

xpack.infra:
  sources:
    default:
      metricAlias: 'metricbeat-*'
      logAlias: 'filebeat-*'
      fields:
        message: 'message'
        host: 'beat.hostname'
        pod: 'kubernetes.pod.name'
        container: 'docker.container.name'
        timestamp: '@timestamp'
        tiebreaker: '_doc'
  query:
    partitionSize: 75
    partitionFactor: 1.2
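
As a rough illustration of "treated as if the following configuration was present", the defaults could be merged with whatever the user supplies. This is only a sketch: the type names and the `withDefaults` helper are hypothetical and not taken from the Kibana code base; only the keys and default values come from the configuration above.

```typescript
// Hypothetical types mirroring the YAML keys above; not the actual Kibana types.
interface SourceFields {
  message: string;
  host: string;
  pod: string;
  container: string;
  timestamp: string;
  tiebreaker: string;
}

interface SourceConfig {
  metricAlias: string;
  logAlias: string;
  fields: SourceFields;
}

interface QueryConfig {
  partitionSize: number;
  partitionFactor: number;
}

interface InfraConfig {
  sources: { default: SourceConfig };
  query: QueryConfig;
}

// What a user may write in kibana.yml: every setting is optional.
type PartialSourceConfig = Partial<Omit<SourceConfig, 'fields'>> & {
  fields?: Partial<SourceFields>;
};

interface PartialInfraConfig {
  sources?: { default?: PartialSourceConfig };
  query?: Partial<QueryConfig>;
}

const DEFAULT_SOURCE: SourceConfig = {
  metricAlias: 'metricbeat-*',
  logAlias: 'filebeat-*',
  fields: {
    message: 'message',
    host: 'beat.hostname',
    pod: 'kubernetes.pod.name',
    container: 'docker.container.name',
    timestamp: '@timestamp',
    tiebreaker: '_doc',
  },
};

const DEFAULT_QUERY: QueryConfig = { partitionSize: 75, partitionFactor: 1.2 };

// Fill in every missing setting from the defaults. Only the "default"
// source is read, since additional sources are ignored for now.
function withDefaults(user: PartialInfraConfig = {}): InfraConfig {
  const userDefault: PartialSourceConfig = user.sources?.default ?? {};
  return {
    sources: {
      default: {
        ...DEFAULT_SOURCE,
        ...userDefault,
        fields: { ...DEFAULT_SOURCE.fields, ...userDefault.fields },
      },
    },
    query: { ...DEFAULT_QUERY, ...user.query },
  };
}
```

Calling `withDefaults()` with no argument yields exactly the configuration listed above, while a partial configuration overrides only the keys it names.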

Details

xpack.infra.sources.default

This is the default source. Any additional sources defined here will be ignored until the UI offers facilities to switch between sources (#20662).

xpack.infra.sources.default.fields.tiebreaker

If we assume the data to be only filebeat data, the offset field might also be appropriate. Otherwise, _doc would be a relatively reliable default (even more so once elastic/elasticsearch#25674 is fixed).
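
To illustrate what the tiebreaker is for: when several log entries share the same timestamp, a second sort key keeps their order stable. The sketch below builds an Elasticsearch-style sort clause from the two configured field names. The `buildLogSort` helper is hypothetical, and it is an assumption that the Infra UI assembles its sort this way.

```typescript
type SortOrder = 'asc' | 'desc';

// Hypothetical helper: sort by the configured timestamp field first and
// fall back to the configured tiebreaker field for equal timestamps.
function buildLogSort(
  fields: { timestamp: string; tiebreaker: string },
  order: SortOrder = 'asc'
): Array<Record<string, SortOrder>> {
  return [{ [fields.timestamp]: order }, { [fields.tiebreaker]: order }];
}

// With the default field names this sorts by '@timestamp', then '_doc'.
const defaultSort = buildLogSort({ timestamp: '@timestamp', tiebreaker: '_doc' });
```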

xpack.infra.query.partitionSize

The size of the partitions for the nodes aggregation.

xpack.infra.query.partitionFactor

To get a complete set of 75 results per partition, it is necessary to request 20% more (hence the factor of 1.2) because of how nodes are distributed across shards.
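
The arithmetic behind the two query settings can be sketched as follows. The helper names are hypothetical. It is also an assumption, not stated in this issue, that the partition count is derived from the total number of nodes.

```typescript
// Number of buckets to request per partition so that a partition of
// `partitionSize` nodes comes back complete despite shard distribution.
function requestedBucketCount(partitionSize: number, partitionFactor: number): number {
  return Math.ceil(partitionSize * partitionFactor);
}

// Assumption: the number of partitions is derived from the total node count.
function partitionCount(totalNodes: number, partitionSize: number): number {
  return Math.max(1, Math.ceil(totalNodes / partitionSize));
}

// With the defaults (75 and 1.2), 90 buckets are requested per partition.
const bucketsPerPartition = requestedBucketCount(75, 1.2);
```

Under these assumptions, 200 nodes with the default partition size of 75 would be split into 3 partitions.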

Future Tasks

  • Add settings for dashboard drilldown (see elastic/kibana#21884 for details)

All 9 comments

With the desire in mind to allow for multiple sets of source configurations, how about something like

xpack.infra:
  sources:
    default:  # this would be the name of the group
      metricAlias: 'xpack-infra-default-metrics'
      logAlias: 'xpack-infra-default-logs'
      fields:
        message: 'message'
        hostname: 'beat.hostname'
        pod: 'kubernetes.pod.name'
        container: 'docker.container.name'
        timestamp: '@timestamp'
        tiebreaker: '_doc'  # or 'offset'?
  query:
    partitionSize: 75
    partitionFactor: 1.2

Nit: Could we rename host: to hostname: in the above? I think these are two different things.

@weltenwort I think we should go with your proposal

ok, I'll edit the issue description to represent the current state

Another aspect I'm deliberating right now is how the configuration is communicated between client and server.

Most queries require the server to know the configuration of a specific data source. There are (at least) two possible ways in which that configuration can be made available to the server:

  1. All required configuration is submitted by the client to the server as part of the query arguments. That has several implications:

    • The set of query arguments can become quite large.
    • The browser has to know all required configuration.
    • The browser can submit (dynamic) configuration not known by the server, i.e. from a source other than the server (e.g. local storage). That means that the configuration can diverge between different long-lived tabs of the app.
  2. A unique identifier of the configuration set is submitted by the client to the server as part of the query arguments. Implications:

    • The set of query arguments related to the data source stays small.
    • The browser does not have to know all configuration required to perform the query.
    • The browser can only choose between the configuration options known to the server, e.g. from the server's config file. As such these can be created and updated centrally.

I will go with variant 2 for now and use 1 as a fallback if I encounter too many obstacles.
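
A minimal sketch of variant 2, assuming the server holds the sources from its config file in a lookup table and the client only sends a source ID. The `resolveSource` function and its types are hypothetical and do not reflect the actual API.

```typescript
// Hypothetical, trimmed-down source configuration as known to the server.
interface ServerSourceConfig {
  metricAlias: string;
  logAlias: string;
}

// The client sends only `sourceId`; the server looks up the rest.
function resolveSource(
  sources: Readonly<Record<string, ServerSourceConfig>>,
  sourceId: string
): ServerSourceConfig {
  // Guard against inherited keys such as "toString".
  const known = Object.prototype.hasOwnProperty.call(sources, sourceId);
  const source = known ? sources[sourceId] : undefined;
  if (source === undefined) {
    throw new Error(`Unknown source "${sourceId}"`);
  }
  return source;
}

// Example lookup table, as it might be loaded from the server's config file.
const configuredSources: Record<string, ServerSourceConfig> = {
  default: { metricAlias: 'metricbeat-*', logAlias: 'filebeat-*' },
};
```

Because the lookup happens on the server, a request naming an unknown ID fails instead of silently running with client-supplied settings.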

We also need to take into account #21884

Because there have been questions from several sides about the reasons for this config structure, I'll try to lay that out below.

Rationale

One strength of the Elastic Stack is the flexibility it exhibits to allow users to integrate it into their own infrastructure and adapt it to their needs. To stay true to that spirit, the Infra UI will try to provide several configuration settings. For the first phase we opted for static configuration in the kibana.yml, because it is easy to implement and powerful enough for many use cases.

When interviewing users from the target group, we were consistently told that they often need to partition the logs and metrics into separate groups that correspond to sections of their infrastructure and/or teams in their organisation. That is why the configuration is structured such that the settings that relate to the consumption of data are grouped into "sources".

For each "source" the configuration must specify which indices the logs and metrics are read from. This is done by specifying a read alias for each type of data (logs and metrics). Using aliases instead of plain index patterns enables the easy implementation of a simple configuration UI that can add indices to that source as well as easy self-enrollment of data sources (e.g. metric-/filebeat via their index template).

To increase the chance that users can deploy this to handle their existing data, a few salient fields like the timestamp field and fields identifying various different entities like hosts and containers can be configured.

To provide an easy "getting started" experience, all the settings are optional. If the user does not specify any source, a "default" source as described in the section "Defaults" is assumed.

Should we switch to using the fields host.name and container.id as specified in the ECS instead of the beat-specific fields? That would be more consistent with how other apps like APM work, which becomes especially important once we implement linking between them.

I would probably stick to beat.hostname for the 6.x release and change it over to host.name in 7.0. The reason is that the UI would then also work with Beats data older than 6.4.
