Currently we cache a normalized view of the Elasticsearch mapping because large mappings are expensive to parse. This causes a lot of other problems; for example, when a user adds a field we don't see it unless they manually refresh the mapping cache, which is a non-obvious task. Basically, any time the mapping changes it becomes painful for the user: https://github.com/elastic/kibana/issues/2236, https://github.com/elastic/kibana/issues/6362
And it's not just mapping changes; we still have issues with parsing the large mappings in the first place, and the more indices the user has, the longer it takes. That's why we do things like restricting the parsing to 5 indices by default. https://github.com/elastic/kibana/issues/1540, https://github.com/elastic/kibana/issues/2928
The crux of the issue goes back to the pre-beta1 days, when Kibana 4 didn't have a server to offload this work to, so we do it in the browser. There are three things that would help:
The first two we can do immediately; the last one would be an amazing optimization that would make everyone's life a lot easier and remove a lot of load and code from the Kibana backend.
I'll just leave this here: https://github.com/elastic/kibana/pull/5575
There's some extra cruft in there, but that PR already has most of 1 and 2.
+1
Pre-baking Kibana instances becomes ugly with these caches. Being able to do it on the fly, even as an option for smaller instances, would be fantastic. +1
+1
+1
It would make life a lot easier for Prelert if Kibana just used mappings directly from Elasticsearch rather than having its own mappings.
+1
+1
Do you have any idea when something like Solution 2 could land in the final product?
+1
Refreshing mappings automatically or exposing an API to refresh index patterns would be nice too.
+1
Do you have an ETA for a solution like an automatic refresh or an API endpoint?
+1 API for refresh
+1
Please, can someone give us some help?
+1 for API Refresh - also preserving the "popularity"
Our current approach is going to be to call
GET _plugin/kibana/api/index_patterns/_fields_for_wildcard?pattern=...
PUT _plugin/kibana/api/saved_objects/index-pattern/
directly, in imitation of the network requests we see when clicking the refresh icon.
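Roughly, something like this (just a sketch; <kibana-host> and my-pattern-* are placeholders, and it assumes the index pattern's saved object ID equals its title, which is not guaranteed):

# 1) fetch the current field list for the pattern
curl -s 'https://<kibana-host>/_plugin/kibana/api/index_patterns/_fields_for_wildcard?pattern=my-pattern-*'

# 2) write that field list back into the index-pattern saved object
#    (the kbn-xsrf header is required; "fields" must be the JSON-encoded array returned by step 1)
curl -X PUT 'https://<kibana-host>/_plugin/kibana/api/saved_objects/index-pattern/my-pattern-*' \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  -d '{"attributes": {"title": "my-pattern-*", "fields": "<JSON-encoded field array from step 1>"}}'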
+1
Please do this as soon as possible!!!!
Ever since this change was added, it has made using ELK for log management the worst solution available. As ops at a medium-sized business, I almost daily have to remind developers that when they add new fields to their custom logs, they have to go click the magic button. It's the most annoying thing ever.
Same here, why hasn't it been added yet? We are struggling with the refresh button.
+1
Need this 😊
+1
Same here! But I am a little bit confused: the Kibana v6.5 docs already document that you can push the reload button to refresh the index pattern, but it doesn't work for me. Anyone with "a non-obvious [sic]" way to refresh it?
@fabrei note that Kibana will check ALL the indexes that match the pattern for all the fields and their types... so if you have older indexes with a different type, you may get a collision that still blocks you from using that field. The workaround is either to remove the old indexes, reindex them, or create a new Kibana index pattern that does not touch those old indexes (e.g. logstash-2018.12.* to exclude previous months' indexes); see the sketch below for one way to spot such conflicts.
But I do agree that this is a bad feature that can limit Kibana usage a lot when we need to change something.
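One way to check whether you are actually hitting such a collision (just a sketch, assuming Elasticsearch on localhost:9200 and a logstash-* pattern; adjust both for your cluster) is to ask _field_caps which fields report more than one mapped type across the matching indices:

# print the names of fields that are mapped with more than one type across the matching indices
curl -s 'localhost:9200/logstash-*/_field_caps?fields=*' \
  | jq -r '.fields | to_entries[] | select((.value | keys | length) > 1) | .key'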
Thanks! I updated the mapping, reindexed, and afterwards deleted the old index.
+1
+1
+1
We're upgrading our users from Kibana 3 to Kibana 6, and Kibana hangs when it tries to load an index with ~22K fields, even after the mappings have been cached - https://github.com/elastic/kibana/issues/32153
That's a big difference in how the new Kibana handles large indexes. Moving forward on this would be great, thanks.
Pinging @elastic/kibana-app-arch
Are there any suggested workarounds for this error?
+1
Need an index pattern refresh API.
+1
+1. Just realized it's been 5 years since the first issue was raised.
> +1 for API Refresh - also preserving the "popularity"
> Our current approach is going to be to call
> GET _plugin/kibana/api/index_patterns/_fields_for_wildcard?pattern=...
> PUT _plugin/kibana/api/saved_objects/index-pattern/
> directly, in imitation of the network requests we see when clicking the refresh icon.
That would be fine if it worked. I wrote these two lines of bash script to do exactly the same requests as the browser sends to the Kibana backend (replace <index-id> with your index pattern's ID):
refresh_payload=$(curl -X GET 'localhost:5601/api/index_patterns/_fields_for_wildcard?pattern=packets*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score' | jq '.fields[] | . + {count: 0} | . + {scripted: false}' | jq -s '. | tostring | {"attributes": {"title": "packets*", "timeFieldName": "timestamp", "fields": . }}')
curl -X PUT 'localhost:5601/api/saved_objects/index-pattern/<index-id>' -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$refresh_payload"
I added a new field to my index template and updated all existing indices as well. After the curl requests, I get a response saying that the index pattern was updated, but the pattern was not updated in my Kibana dashboard. I checked whether the added field is in the response of the first curl; it is. It is also not possible to filter by the added field or create a chart with it. So I think the code for refreshing a pattern makes another request which is not tracked by my developer tools. But after a day passed, the pattern was successfully refreshed and I had access to the added field.
If you take a look at the refresh button in the Kibana settings (with a developer tool), you see that the button calls the function refreshFields(). I took a look into the code and found that you need an IndexPattern object; that object has this specific method. In my case it would be nice to call refreshFields() manually from the plugin I wrote. At the moment I am experimenting with how to instantiate an IndexPattern object. Does anyone already have an idea?
UPDATE: Thanks to @fabrei, who in https://github.com/idaholab/Malcolm/issues/100 suggested something to make the script I pasted more robust. I've updated the link and the code here to reflect that (it now uses a _find to get the index ID based on the index pattern name instead of just assuming they're the same):
Here's a Python script I wrote to refresh my index pattern fields in my project:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function

import argparse
import json
import requests
import os
import sys

GET_STATUS_API = 'api/status'
GET_INDEX_PATTERN_INFO_URI = 'api/saved_objects/_find'
GET_FIELDS_URI = 'api/index_patterns/_fields_for_wildcard'
PUT_INDEX_PATTERN_URI = 'api/saved_objects/index-pattern'

###################################################################################################
debug = False
PY3 = (sys.version_info.major >= 3)
scriptName = os.path.basename(__file__)
scriptPath = os.path.dirname(os.path.realpath(__file__))
origPath = os.getcwd()

###################################################################################################
if not PY3:
    if hasattr(__builtins__, 'raw_input'): input = raw_input

try:
    FileNotFoundError
except NameError:
    FileNotFoundError = IOError

###################################################################################################
# print to stderr
def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

###################################################################################################
# convenient boolean argument parsing
def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

###################################################################################################
# main
def main():
    global debug

    parser = argparse.ArgumentParser(description=scriptName, add_help=False, usage='{} <arguments>'.format(scriptName))
    parser.add_argument('-v', '--verbose', dest='debug', type=str2bool, nargs='?', const=True, default=False, help="Verbose output")
    parser.add_argument('-i', '--index', dest='index', metavar='<str>', type=str, default='sessions2-*', help='Index Pattern Name')
    parser.add_argument('-k', '--kibana', dest='url', metavar='<protocol://host:port>', type=str, default='http://localhost:5601/kibana', help='Kibana URL')
    parser.add_argument('-n', '--dry-run', dest='dryrun', type=str2bool, nargs='?', const=True, default=False, help="Dry run (no PUT)")
    try:
        parser.error = parser.exit
        args = parser.parse_args()
    except SystemExit:
        parser.print_help()
        exit(2)

    debug = args.debug
    if debug:
        eprint(os.path.join(scriptPath, scriptName))
        eprint("Arguments: {}".format(sys.argv[1:]))
        eprint("Arguments: {}".format(args))
    else:
        sys.tracebacklimit = 0

    # get version number so kibana doesn't think we're doing a XSRF when we do the PUT
    statusInfoResponse = requests.get('{}/{}'.format(args.url, GET_STATUS_API))
    statusInfoResponse.raise_for_status()
    statusInfo = statusInfoResponse.json()
    kibanaVersion = statusInfo['version']['number']
    if debug:
        eprint('Kibana version is {}'.format(kibanaVersion))

    # find the ID of the index name (probably will be the same as the name)
    getIndexInfoResponse = requests.get(
        '{}/{}'.format(args.url, GET_INDEX_PATTERN_INFO_URI),
        params={
            'type': 'index-pattern',
            'fields': 'id',
            'search': f'"{args.index}"'
        }
    )
    getIndexInfoResponse.raise_for_status()
    getIndexInfo = getIndexInfoResponse.json()
    indexId = getIndexInfo['saved_objects'][0]['id'] if (len(getIndexInfo['saved_objects']) > 0) else None
    if debug:
        eprint('Index ID for {} is {}'.format(args.index, indexId))

    if indexId is not None:
        # get the fields list
        getFieldsResponse = requests.get('{}/{}'.format(args.url, GET_FIELDS_URI),
                                         params={ 'pattern': args.index,
                                                  'meta_fields': ["_source","_id","_type","_index","_score"] })
        getFieldsResponse.raise_for_status()
        getFieldsList = getFieldsResponse.json()['fields']
        if debug:
            eprint('{} would have {} fields'.format(args.index, len(getFieldsList)))

        # set the index pattern with our complete list of fields
        if not args.dryrun:
            putIndexInfo = {}
            putIndexInfo['attributes'] = {}
            putIndexInfo['attributes']['title'] = args.index
            putIndexInfo['attributes']['fields'] = json.dumps(getFieldsList)

            putResponse = requests.put('{}/{}/{}'.format(args.url, PUT_INDEX_PATTERN_URI, indexId),
                                       headers={ 'Content-Type': 'application/json',
                                                 'kbn-xsrf': 'true',
                                                 'kbn-version': kibanaVersion, },
                                       data=json.dumps(putIndexInfo))
            putResponse.raise_for_status()

        # if we got this far, it probably worked!
        if args.dryrun:
            print("success (dry run only, no write performed)")
        else:
            print("success")

    else:
        print("failure (could not find Index ID for {})".format(args.index))

if __name__ == '__main__':
    main()
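If you save this as, say, index-refresh.py (the filename is arbitrary), a dry run looks something like the following; -n skips the PUT so nothing is written, and -v prints what it would have done:

# dry run against a hypothetical logstash-* pattern on a locally running Kibana with no base path
./index-refresh.py -k http://localhost:5601 -i 'logstash-*' -n -v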
Thanks for sharing :) I don't know what I did wrong yesterday, but today my script works as well. I took a look at the management view for index patterns in my dashboard, and the number of indexed fields had not been updated yesterday; maybe the reload did not go totally right. Now that I know which requests I have to make, I will integrate them into my plugin. But if someone knows how the already existing function refreshFields() can be used in a plugin and shares this information, I would be really happy about it =)
+1
This would be very useful!
A co-worker recently wrote a Golang app which does this for us:
+1
+1
+1
+1
No longer needed, as the field list is no longer cached - https://github.com/elastic/kibana/pull/82223 - this will be released in 7.11.