Incubator-superset: [SIP-38] Visualization plugin refactoring

Created on 21 Feb 2020  Â·  10Comments  Â·  Source: apache/incubator-superset

[SIP-38] Visualization plugin refactoring

Motivation

One of the most commonly reoccurring questions in the Superset community, on Slack and elsewhere, is that of how to add a new data visualization. The answer, in short, has been “it’s hard.” While that may be true, the goal of this SIP is to lay out both tactical refactor needs for the current implementation to mature, as well as proposing a handful of roadmap features to make plugin development significantly easier. These changes will make upcoming modifications of existing plugins (see SIP-34) drastically simpler, and steer toward opening an ecosystem of Superset visualization plugins.

Much planning and work has already been done to address the difficulty of adding/editing plugins, including a new query API endpoint, but there are many blocking issues and code migrations remaining to complete this process. Special thanks to @kristw, @williaster, @xtinec, and @conglei for their significant contributions to the frontend and API work thus far. These issues, and proposed solutions for them, are enumerated below. Additional suggestions are welcome.

Proposed Changes

General Goals:

  • As much code and configuration as possible for individual visualization plugins should be moved out of incubator-superset and into the individual plugin’s repos (in a perfect world, a new plugin wouldn’t require touching two repos and opening two PRs).
  • Reduce frustration in working on plugin repos, allowing people to more easily see changes as they make them
**Issue:** Control panel configurations for visualizations are centralized in a difficult to maintain [controls.jsx](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/explore/controls.jsx) file. All controls are located in `incubator-superset` , necessitating writing code in two PRs for two repos. **Proposal:** Control configurations (particularly the ones that are unique to any given plugin) should be migrated into the correct individual [control panel config files](https://github.com/apache/incubator-superset/tree/master/superset-frontend/src/explore/controlPanels) . An example of this can be found in [This PR](https://github.com/apache/incubator-superset/pull/8222/files). These individual configs should then be migrated to the individual plugins, and references removed from [setupPlugins.ts](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/setup/setupPlugins.ts).
**Issue:** Plugins (particularly when using the legacy api (`/explore_json`) require an entry in [viz.py](https://github.com/apache/incubator-superset/blob/master/superset/viz.py). In addition to requiring code changes to two repos, the logic in [viz.py](https://github.com/apache/incubator-superset/blob/master/superset/viz.py) has proven to be fragile and cumbersome to maintain. **Proposal:** Use of `viz.py` should be deprecated in favor of the viz-agnostic `api/v1/query` endpoint. In an effort to decouple this, `viz.py` logic (data transformations) should be broken out into individual modules and/or reusable methods, which should be invoked by the new endpoint. This will additionally require that controls should be consolidated wherever possible, e.g. use a single control for `metric`, `metrics`, `metric_2`, `secondary_metric`, etc.
**Issue:** New and existing plugins cannot yet fully utilize the new `api/v1/query`endpoint due to the following issues: - Superset does not yet respect a plugin’s `useLegacy` flag to call the correct endpoint when required - The API has no means to accept data transformation options needed for post-processing (e.g. Pandas) to reach feature parity with the legacy API. - The API does not have unit tests - The API is not documented **Proposal:** - Modify [exploreUtils.js](https://github.com/conglei/incubator-superset/blob/5ea282e45fc5d7a8f7fbe93641d6caf2825abd25/superset/assets/src/explore/exploreUtils.js) such that the `getURIDirectory` method calls the right endpoint depending on the `useLegacy` flag - Add configuration options to the API call to invoke backend post-processing operations, returning transformed data - Write unit tests and documentation - Deprecating the `explore_json` endpoint
**Issue:** Each plugin must be registered manually in `incubator-superset`’s [MainPreset.js](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/visualizations/presets/MainPreset.js) file. Additionally, customizing the plugins loaded for a deployment (i.e. disabling some) is done via [setupPluginsExtra.js](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/setup/setupPluginsExtra.js), meaning the plugins are still loaded as dependencies. And this method only supports plugins removal, but does not let you add new plugins that are not listed in `package.json` from `master`. **Proposal:** Attempting to load plugins via [ES2020’s dynamic imports](https://github.com/tc39/proposal-dynamic-import). The exact implementation of this is a bit TBD, but the idea would be to move the responsibility for registering/loading plugins away from [MainPreset.js](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/visualizations/presets/MainPreset.js). Instead, the plugin paths/packages (and their associated `keys`) could be bootstrapped as an overridable configuration file, and Superset could lazy-load the plugins accordingly. (note: dynamic imports are not [supported natively](https://caniuse.com/#feat=es6-module-dynamic-import) by IE, but Babel provides [potential recourse](https://babeljs.io/docs/en/babel-plugin-syntax-dynamic-import) for that).
**Issue:** Development work on plugins requires manually running a `npm link` operation to load the local plugin, and thus see updates/edits in Superset - this is troublesome in that it is both fragile, and difficult for many developers to discover, as it’s not a common pattern). **Proposal:** Automate the process! Create a “plugin dev mode” NPM script that automatically links (or unlinks) viz plugin packages. See a working example of this concept in [this PR](https://github.com/apache/incubator-superset/pull/8638). This would involve refactoring the `NVD3` plugins to not rely on `/lib` path, `preset-chart-xy` to not rely on `/esm` path - all plugins should follow the same build and source directory pattern.

Additional (follow-up) refactoring tasks

  • Follow CSS-in-JS patterns (see SIP-37) in viz components, sharing common theme styles/variables with incubator-superset. Theme variables may need to be moved to superset-ui to be consumed by both superset-ui-plugins and incubator-superset.
  • Audit and address issues with, and completeness of, i18n of plugin text.
  • Converting all viz components to TypeScript (see SIP-36)

New or Changed Public Interfaces

The query endpoint at /api/v1/query needs significant enhancement, as laid out in the proposals above (post-processing options, tests, docs).

New dependencies

N/A

Migration Plan and Compatibility

N/A

Rejected Alternatives

  • Reintroducing viz plugins into incubator-superset
    Having the plugins be in their own repos is troublesome from a workflow perspective (due to the multiple PRs required, NPM Link work needed, and separate build processes required). The proposals laid out above seek to minimize this difficulty. While it is certainly possible (and indeed likely easier) to move the plugins back into Superset itself (like Redash and Metabase do), solving these more difficult problems seems more likely to open the door to a true plugin ecosystem for Superset.
  • Moving data transformations to plugins (JS), deprecating Pandas
    The idea has been floated that perhaps data transformation (at least in some cases) might be more the responsibility of the viz plugin itself than the backend, and maybe if we moved that logic, we could deprecate Pandas. To test the theory, some basic benchmarking attempts were made on large rollup and pivot tasks, to compare the performance of Pandas against Zebras, Datalib, Ramda, and Lodash. This approach, at least as a global migration, was decided against for these reasons:

    • Sending an entire dataset over the wire, if the frontend just needs a rollup, is a waste of resources

    • If post-processing is done on the backend, the result can be cached for use by multiple charts (or multiple clients and reloads)

    • Neither Zebras nor Datalib provides an out-of-the-box pivot function on par with Pandas, and the pivotWith "recipe" from the Ramda cookbook looked to be significantly slower than Pandas (approx 10x).

    • All these libraries provide grouping, sorting, map/reduce functionality, so you can pivot the data manually. But then, so does Lodash, which matched (or slightly beat) the other JS libraries' performance. This was still about 2x slower than Pandas.

    • TL;DR: If you want to avoid writing Python for a new viz or calling it through the new API, and want to do a little data munging on the frontend, just use lodash or vanilla JS for best results.

#SIP preset-io

All 10 comments

Looks like a solid plan, thanks for writing this up. A few questions (for you or the community):

  • What was the original rationale in moving plugins out into their own repo(s)?
  • How are you thinking about exposing Pandas data transformation functionality to the plugins via the /api/v1/query endpoint. Will there be a whitelisted set of transformation methods that can be called? Will it support chaining these transformations?
  • Should we rename the /api/v1/query endpoint to something less ambiguous?
  • For i18n, I realize this is bigger issue than just visualizations, but have you looked into libraries like globalize or other CLDR-based solutions?
  • Will migration to the new plugin architecture simply be handled by the useLegacy frontend flag? What should the deprecation plan for the old architecture/endpoints be?

What was the original rationale in moving plugins out into their own repo(s)?

  • The collection of visualizations in Superset had grown organically over
    time, with varying quality and lack of ownership for some charts. All
    charts were shipped with every release of Superset. There was no easy way
    to get rid of them.
  • The chart code with no owner (after the PR was merged and the chart
    author abandoned them) became dead code that nobody owns in the main repo,
    and the list kept growing. Maintaining the "visualizations" folder at that
    time was a monumental tasks.
  • Developers have custom needs and keep asking to add more types of charts.
    Some are very specific and debatable whether that is useful for anyone
    else.
  • Very difficult to test the charts.

Breaking them into independent packages and lighter weight repositories allow:

  • more stable incubator-superset and less code to maintain. If a chart
    will only be used by single organization, then the entire community should
    not have to maintain it.
  • chances to deprecate uncommon charts by dropping the plugin from the main
    bundle, but still make them available as packages for developers to install
    them into their own deployment.
  • enforced decoupling. remove hacks for special handlings of certain charts.
  • less arguments about adding certain charts or not.
  • visual inspection of every PR with storybook and easier maintenance.
  • developer productivity: reduced ci build time, smaller codebase, able to
    apply more strict linting rules.

One key part of the embeddable project is to move towards chart plugins system, which we can register only necessary charts for superset or register custom ones as wish. This will give more flexibility to the developers to customize their superset instances (making it more lightweight, include-only-needed, or include custom vis type) as well as improve maintainability of the superset codebase (instead of hosting every visualization in the main repo). In order to do that, we aim to split the chart types (i.e. most of the things in src/visualizations) into one or more plugins (npm packages), independent from superset. Then, we will implement a registry mechanism for importing plugins.

from https://github.com/apache/incubator-superset/issues/5680

Thanks for the writeup, @rusackas! This raises a lot of great points and cleared many of my doubts.

I’m still new to Superset, so apologies if I missed some historical context. Just to be clear, what I meant in #8638 was to temporarily move plugins and their registries back to incubator-superset until things are more settled——plugin API matured, core plugin quality under control, and the big UI overhaul (SIP-34) finally starting to take shape.

My biggest concern was it might take a lot of efforts to get there and having to sync code between multiple repos slows the whole process down. Even if we can solve the linking issue with a custom npm command, people still need to make (and review) two (maybe three) PRs whenever the registry or control panel API needs to change. Overall, the overhead the 3-repo structure imposes _right now_ seemed overweight the potential benefits it may bring in the future—considering there might be other solutions to achieve the same goal.

In general, I agree it makes a lot of sense to have optional visualizations as separate packages and in their own repo. Future Superset admins may even install a plugin from the web UI so it's only natural to do this. But for core visualizations, I had doubts. Superset as a software needs to ship with some visualizations anyway and having the source code in one place makes it easier for developers to start writing their own plugins or upgrading an existing one——simpler dev env setup, easier navigation, better git history... many of which not easily replaceable by npm scripts. Most OSS with a plugin system took this approach, in additional to Redash and Metabase mentioned before, think WordPress, Gatsby, and Strapi.


@kristw , to mitigate some of the original concerns, maybe we can take following actions (if we haven't):

  • Officially freeze the visualization folder, no PRs will be accepted or reviewed until we clean things up and plugin API stabilized
  • Require all new visualizations to have unit test that do not depend on Superset backend API
  • Place additional more strict .eslintrc to the folders of migrated components.

Alternatively, we can also have a frontend "monorepo" within incubator-superset, publishing superset-ui and "official" plugins as separate packages but track them under the same git repo with the backend code. This way embeddable charts also get to stay on the roadmap.

@rusackas I have re-read this item again and it is still not quite clear how the proposal (dynamic import) will address the issue.

image

1) how will dynamic import help with adding new plugins that are not listed in package.json from master?

2) not registering all the plugins

bootstrapped as an overridable configuration file, and Superset could lazy-load the plugins accordingly.

  • Generating MainPreset.js from the configuration file could be another solution as well.
  • For clarification, if the bundle size is concerned, Superset is already lazy-loading the plugins via dynamic imports.

@kristw Thanks for your thoughts on this

Generating MainPreset.js from the configuration file could be another solution as well.

This is essentially what I'm trying to take a step toward. My understanding is that to do that, you'd have to load in an array of plugins/paths from the config (or some state driven by a future UI, even), and then you could then dynamically load those packages and/or paths. This is what that other WiP PR is taking a stab at.

For clarification, if the bundle size is concerned, Superset is already lazy-loading the plugins via dynamic imports.

Bundle size isn't the main concern here, so much as developer convenience.

how will dynamic import help with adding new plugins that are not listed in package.json from master?

My thought (though admittedly half baked, since we're not at this stage yet) would be that people could have any number of plugins added to their local/deployed filesystem, and dynamic loading would let them add the point to the module or just the file path of that plugin, once loaded from the aforementioned config/state. That would all be done dynamically in MainPreset.

Thank you for your prompt reply and clarification.

Generating MainPreset.js from the configuration file could be another solution as well.
This is essentially what I'm trying to take a step toward. My understanding is that to do that, you'd have to load in an array of plugins/paths from the config (or some state driven by a future UI, even), and then you could then dynamically load those packages and/or paths. This is what that other WiP PR is taking a stab at.

What do you think if the MainPreset.js generation happen as part of webpack build or some build script, so the app remains unaffected and can treat MainPreset.js as a regular file?

What do you think if the MainPreset.js generation happen as part of webpack build or some build script, so the app remains unaffected and can treat MainPreset.js as a regular file?

I think this is well worth a Spike, and feel like you or @ktmud are probably well qualified to advise on the webpack aspects of it in particular when we get to it.

I love this, specially the part about consolidating the data model in api/v1/query so that we don't have metric vs metrics. Having a common data model for the different plugins will make it much easier to switch visualizations without losing context, which is very powerful feature IMHO.

I think dynamically generating a JS file might not be very practical if we want users (Sueperset admins) to manage plugins in a future UI. I'd imagine all plugins are loaded dynamically by just checking whether a file exists in some folder. There shouldn't be the need to pre-register a chart type. You just load it as you need it.

Lots of work has been done and summarized / documented here:
https://preset.io/blog/2020-07-02-hello-world/

Was this page helpful?
0 / 5 - 0 ratings

Related issues

eliab picture eliab  Â·  3Comments

ghost picture ghost  Â·  3Comments

gbrian picture gbrian  Â·  3Comments

josephtyler picture josephtyler  Â·  3Comments

dinhhuydh picture dinhhuydh  Â·  3Comments