Kibana: Saved object migrations to allow pre-migration state collection

Created on 12 Apr 2019 · 10 Comments · Source: elastic/kibana

Problem:

SavedObject migrations are synchronous functions, which makes it impossible to write migrations that need to look up state from the old .kibana index being migrated.

Describe a specific use case for the feature:

I want to convert graph saved objects to use index pattern ids instead of names, but there is no way to look up existing index patterns, and a document migration can only be executed synchronously.

Once completed, this will unblock #34989. cc @flash1293

Proposed solution (updated after discussion below):

Allow migrations to provide an asynchronous pre-migration state collection function. The result of this function will be passed as an argument to each of the synchronous migration functions.
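A minimal sketch of what this contract could look like, assuming hypothetical names throughout (`MigrationWithState`, `collectState`, `applyMigration`, and `runMigration` are illustrative and not part of Kibana's migration API). The state is collected once, asynchronously, and every document is then migrated synchronously with that state as an extra argument.

```typescript
// Hypothetical shapes for illustration; not Kibana's actual migration API.
interface SavedObjectDoc {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

interface MigrationWithState<State> {
  // Runs once, asynchronously, before any document is touched.
  collectState: () => Promise<State>;
  // Stays synchronous, like today's migrations, but also receives the state.
  migrate: (doc: SavedObjectDoc, state: State) => SavedObjectDoc;
}

// Synchronous part: apply the migration to every document with a fixed state.
function applyMigration<State>(
  migration: MigrationWithState<State>,
  docs: SavedObjectDoc[],
  state: State
): SavedObjectDoc[] {
  return docs.map((doc) => migration.migrate(doc, state));
}

// Asynchronous part: collect the state exactly once, then migrate synchronously.
async function runMigration<State>(
  migration: MigrationWithState<State>,
  docs: SavedObjectDoc[]
): Promise<SavedObjectDoc[]> {
  const state = await migration.collectState();
  return applyMigration(migration, docs, state);
}
```

Keeping `migrate` synchronous means the per-document cost stays the same as today; only the single `collectState` call can touch Elasticsearch.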

Saved Objects Core

All 10 comments

Pinging @elastic/kibana-operations

We should be really careful in enabling something like this. The problem with asynchronous migrations is that Kibana is completely unavailable while migrations run, so if even one plugin performs a slow asynchronous migration that doesn't scale, it could cause significant downtime for Kibana. This could also make it pretty trivial to clobber the underlying ES cluster by slamming it with queries.

_If_ we support something like this, which I'm not convinced we should even bother with, we might want to introduce a pre-flight async migration hook that executes once per plugin per version, with its result passed to all of the synchronous migration functions. Something like graph could then query for its workspaces in bulk and return a map of names to ids that its own migrations could rely on.
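The name-to-id map mentioned here could be sketched as follows. This assumes the pre-flight hook has already fetched the index patterns in one bulk query; the attribute names `indexPattern` and `indexPatternId` and both function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical document shapes; the attribute names are assumptions.
interface IndexPatternRecord {
  id: string;
  title: string;
}

interface WorkspaceAttributes {
  indexPattern?: string; // old: index pattern referenced by name
  indexPatternId?: string; // new: index pattern referenced by id
}

interface WorkspaceDoc {
  id: string;
  attributes: WorkspaceAttributes;
}

// Build the lookup once from the bulk result of the pre-flight hook.
function buildNameToIdMap(patterns: IndexPatternRecord[]): Map<string, string> {
  const byName = new Map<string, string>();
  for (const pattern of patterns) {
    // If two patterns share a name, keep the first one seen.
    if (!byName.has(pattern.title)) {
      byName.set(pattern.title, pattern.id);
    }
  }
  return byName;
}

// Synchronous migration: swap the name for the id when the name is known.
function migrateWorkspace(doc: WorkspaceDoc, nameToId: Map<string, string>): WorkspaceDoc {
  const patternName = doc.attributes.indexPattern;
  const patternId = patternName === undefined ? undefined : nameToId.get(patternName);
  if (patternId === undefined) {
    return doc; // unknown name: leave the document unchanged
  }
  const attributes: WorkspaceAttributes = { ...doc.attributes, indexPatternId: patternId };
  delete attributes.indexPattern;
  return { ...doc, attributes };
}
```

Leaving documents with unknown names untouched is one possible choice; a real migration would have to decide whether to fail, log, or skip in that case.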

I absolutely agree with an async setup or preflight request to capture the data to be used by the migrations instead of providing async migrations.

That is a good idea; it solves the scenario and will run more efficiently. 👍

Updated the description based on the discussion.

Discussed the idea with @rudolf and there are a few things to consider.

  • If the async setup function has access to the saved object client, we could end up with a circular dependency, because the saved object client performs migrations on the fly, which might require another async setup function, and so on. This case would have to be handled somehow.
  • If we avoid that and only expose a way to send a raw ES query in the async setup function, we have to deal with the possibility of getting old, un-migrated results back.
  • We also run migrations when an object is imported, which would require us to re-run the async setup function there. That might get costly in some scenarios; maybe some caching of the results can help.
  • In the migrations-on-import scenario, the async setup probably also needs access to the other objects in the same batch (e.g. if some properties are moved between dashboard and visualization saved objects and the visualization migration has to look at the old dashboard object).
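The caching idea for imports could be sketched like this. It is a plain time-based cache around a hypothetical setup function; `cacheSetup`, the `ttlMs` parameter, and the injectable clock are assumptions made for illustration, and nothing here addresses the circular-dependency or stale-data concerns above.

```typescript
// Wrap an async setup function so repeated imports reuse one result
// until it is older than ttlMs. All names here are hypothetical.
function cacheSetup<State>(
  collect: () => Promise<State>,
  ttlMs: number,
  now: () => number = Date.now
): () => Promise<State> {
  let cached: { promise: Promise<State>; expiresAt: number } | undefined;

  return () => {
    const currentTime = now();
    let entry = cached;
    if (entry === undefined || currentTime >= entry.expiresAt) {
      const promise = collect();
      entry = { promise, expiresAt: currentTime + ttlMs };
      cached = entry;
      // Do not keep a failed lookup around; the next import should retry.
      promise.catch(() => {
        if (cached !== undefined && cached.promise === promise) {
          cached = undefined;
        }
      });
    }
    return entry.promise;
  };
}
```

Caching the promise rather than the resolved value means that imports arriving while the first lookup is still in flight share that lookup instead of starting their own.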

Because we don't currently have any migrations for the config object, can it safely be loaded before migrations are run? This could unblock our work on https://github.com/elastic/kibana/pull/53972

Discussed this with @pgayvallet and @wylieconlon and we came up with three potential solutions to #53972:

  1. Centralize all index-pattern loading to the data plugin and transform any index-pattern when it's loaded. Having all plugins consume index-patterns through the data plugin has the benefit that in the future we could make small index-pattern tweaks without requiring a migration.
  2. Add the ability to register in-memory-only migrations which have access to CoreStart APIs and can collect state asynchronously. These migrations only run when an object is loaded, which lowers the performance impact by not stalling all of Kibana during load time. Changes won't be persisted to ES unless the plugin loading the Saved Object decides to save it explicitly.
  3. Add the async collector as suggested in this issue. Performance problems and exceptions in the async collector could delay or prevent all of Kibana from starting up.

The Platform team favours (1) since it will improve the overall architecture of Kibana and has the lowest risk. This also buys us time to make the saved object migration framework more resilient (https://github.com/elastic/kibana/issues/52202), which we're targeting for 8.0.0.
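A rough sketch of what option (1) might look like: a single loader applies a list of transforms every time an index pattern is read, without writing anything back. The function names and the `legacyTitle` attribute are hypothetical and do not reflect the data plugin's real API.

```typescript
// Hypothetical types; not the data plugin's real API.
type PatternAttributes = Record<string, unknown>;
type LoadTransform = (attributes: PatternAttributes) => PatternAttributes;

// Apply every registered transform, in order, to a copy of the stored attributes.
function applyLoadTransforms(raw: PatternAttributes, transforms: LoadTransform[]): PatternAttributes {
  return transforms.reduce((current, transform) => transform(current), { ...raw });
}

// Single entry point through which all plugins would load index patterns.
async function loadIndexPattern(
  id: string,
  fetchRaw: (id: string) => Promise<PatternAttributes>,
  transforms: LoadTransform[]
): Promise<PatternAttributes> {
  const raw = await fetchRaw(id);
  return applyLoadTransforms(raw, transforms);
}

// Example transform with made-up attribute names: rename `legacyTitle` to `title`.
const renameLegacyTitle: LoadTransform = (attributes) => {
  if (!('legacyTitle' in attributes) || 'title' in attributes) {
    return attributes;
  }
  const next: PatternAttributes = { ...attributes, title: attributes['legacyTitle'] };
  delete next['legacyTitle'];
  return next;
};
```

Because the transforms run on a copy at load time, the stored document is never changed, so a small tweak of this kind would not need a saved object migration.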

Pinging @elastic/kibana-platform (Team:Platform)

I'm closing this as I'm not aware of any plugins requiring this functionality. Let me know if we should re-open it.

@flash1293 I believe we've decided that the best way to implement the graph migration is to do it in the graph plugin itself during the start lifecycle (that is, outside of saved object migrations).

@wylieconlon I'm not sure if we still want to implement #53972 but as far as I'm aware there's nothing preventing us from adopting the proposed solution to "Centralize all index-pattern loading to the data plugin and transform any index-pattern when it's loaded."
