Keystone: Bulk data exports

Created on 13 Sep 2018  路  11Comments  路  Source: keystonejs/keystone

The ability to dump data from a Keystone, via the Admin UI (probably the current view). At a technical level there are two broad approches:

A) Pull the data into Node and produce a file for download

This gives us full control of the format (Excel file, CSV, JSON, etc) so is nicest for the user. We can pick columns, ordering and reference hydrated (app generated/virtual) values.

However, without some significant work (generating from a worker, batching items) this approach _does not scale well_. The prior art (KS4) uses this approach and i's known to have caused outages on large sites.

B) Leverage DB platform functionality

Alternatively, we may want to leverage functionality of the underlying DB directly. Specifically...

  • mongodump can be given conditions to filter the documents dumped from a collection. Issues: single collection only, not hydrated, format is difficult to consume, dumps all fields.
  • the pgsql copy command can write queries out directly to a CSV. It may also be possible to dump JSON in this way. Less issues than mongodump; copy can query across multiple tables/views, control columns, order, format values, etc. Also it's _super_ fast. Locally I can dump 80k+ records (20Mb) to a CSV in _under 200 ms_.
  • Etc.. for other DB platforms.

These tool have a _much_ higher capacity to scale but at the cost of some flexibility (esp. Mongo). This is also something we'd need to work into the adapter framework.

feature request question / idea verified

Most helpful comment

Adding hooks to the Admin UI opens a lot of doors and is becoming a higher priority

All 11 comments

Current preference is to focus on "A" but architect it as a background process (#307) that batch reads/formats docs into a file. The user who requested the file could then be notified by email.

The approches outlined in "B" could do well as custom mutations (#309) sitting behind custom elements in the admin UI (#308).

Should we think of excluding some sensitive (privacy related) fields during export?
The exclusions could be global or based on role

Yeah, it would run through all the ACL stuff, so users couldn't export data they couldn't read.

The "export value" _should_ probably be something specifically supplied by field type though (and so should form part of the field type interface). There's really no reason anyone should be exporting Password hashes, for example.

Any update on this?

I feel like this one could be a good candidate for a 3rd-party package

for this we would need extension points for adding UI elements in admin-ui`

I feel like this one could be a good candidate for a 3rd-party package

@MadeByMike, Yeah possibly, it really needs DB adapter support though. Keystone's job here is probably to define the API and let DB adapters implement it or not, any third party export tools can pick up from there.

@gautamsi is also right.

Adding hooks to the Admin UI opens a lot of doors and is becoming a higher priority

It looks like you haven't had a response in over 3 months. Sorry about that! We've flagged this issue for special attention. It wil be manually reviewed by maintainers, not automatically closed. If you have any additional information please leave us a comment. It really helps! Thank you for you contributions. :)

@molomby are we clear enough now to say that bulk data exports is something that should be handled at the database layer and handled by a separate tool: https://github.com/keystonejs/keystone/discussions/299

With that and the adminUI extension points as well as custom queries it should be possible for people to make a basic bulk export feature in the AdminUI.

I would suggest that is enough to close this issue as it unlikely this is going to become a core feature for Keystone.

@MadeByMike writes..

are we clear enough now to say that bulk data exports is something that should be handled at the database layer and handled by a separate tool?

This is a bit of a quagmire because there are a few different dimensions we can solve for:

High-level -vs- low-level -- Is this something that should operate regardless of the DB platform used, eg. via GraphQL, getting all the benefits of access control, read hooks, etc. or should this be implemented directly against the DB for efficiency sake.

Scaleable -vs- naive -- The KS4 implementation of this was let you apply some filters in the admin UI then click a button to download all filtered items (as a csv I think). It was quite handy but also very naively built -- it ran as a long running web request and hurled a single (potentially huge) query at the DB. The lack of any hard limits made it quite easy to DOS your own system with a single click. Fundamentally, bulk actions like this should be async and performed by a background thread or worker process; eg. you'd queue an export and the system would email you or something when it was complete. This is obviously a lot more complicated.

Generic -vs- ad hoc -- For a scaleable solve (as above) the solution needs to work closely with the production infrastructure for things like worker processes and mail gateways. But we don't have any documented patterns or guidance on how this stuff should run so building a _decent generic solution_ (either in core _or_ as a 3rd party extension) isn't really possible yet. The only generic solution practically possible right now would be something simple that doesn't scale.

So what do we do?

I think the only viable solution in the short term is a naive, high-level version; like the KS4 one (but maybe with some hard limits enforced). It wouldn't need any of the complex infrastructure support and seems like a very partial solution to me (it's liable to lead devs in the wrong direction). But it _would_ be undeniably useful on small projects.

So to reframe your question @MadeByMike, would we want this ^^ to be part of core? I'm not sure myself. Maybe not?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

JedWatson picture JedWatson  路  17Comments

bpavot picture bpavot  路  11Comments

derkweijers picture derkweijers  路  19Comments

gautamsi picture gautamsi  路  14Comments

jossmac picture jossmac  路  10Comments