Prisma1: Decoupling storage

Created on 7 Oct 2017 · 6 comments · Source: prisma/prisma1

This proposal describes the benefits and challenges of decoupling storage from Graphcool projects.

Overview

Currently, there is a one-to-one relationship between a Graphcool project and the data storage for that project: each Graphcool project uses its own database. These two components could, however, be decoupled, allowing a developer to connect different Graphcool projects to the same underlying data storage. This overlaps to some extent with the API Gateway concept, but might be easier to implement and easier to use.
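
As a sketch of what decoupling could look like in service configuration (the `database` key and all of its values are hypothetical, invented purely for illustration; nothing like this exists in graphcool.yml today):

```yaml
# graphcool.yml (hypothetical; there is no `database` key today)
types: ./types.graphql

# Instead of an implicit per-project database, the project references a
# named storage resource. A second project (e.g. a dev stage) could
# reference the same resource to share the underlying data.
database:
  name: my-app-storage     # decoupled storage resource, managed separately
  connector: postgres      # illustrative: the storage type becomes its own concern
```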

Two projects with (initially) the same GraphQL schema

This scenario would be most useful for creating development and test stages for your project. It would eliminate the need to seed databases or migrate data. When deploying a new schema in development, the same rules that apply today would be used to determine whether a change is breaking.

A project with a 'subset' schema

This is a scenario that is supposed to be covered by the API Gateway as well, but this is an alternative approach. You could create a project with a subset of the schema (leaving out fields or entire types, for example) to serve as the public, user-facing endpoint, while the project with the full schema remains the protected admin project.
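
As a minimal sketch (type and field names invented for illustration), the two projects could deploy different type definitions against the same storage:

```graphql
# types.graphql — full schema, deployed to the protected admin project
type Customer {
  id: ID! @isUnique
  name: String!
  email: String!
  internalNotes: String   # admin-only field
}
```

```graphql
# types.graphql — subset schema, deployed to the public-facing project and
# backed by the same storage; `email` and `internalNotes` are left out
type Customer {
  id: ID! @isUnique
  name: String!
}
```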

Transitioning into different storage types

This would also ease the transition to using different databases later on.

Challenges (WIP)

  • [ ] Solution for determining which schema is the 'master' (for removing fields, for example)
  • [ ] ...
Labels: kind/feature · area/database · rf0-needs-spec

All 6 comments

@kbrandwijk I like the use cases here, especially the idea of a sub-service with a partial schema just for public consumers. This is very neat, and I would definitely use something like that, especially on the project where I am trying to get the team to adopt Graphcool, which will have both a private and a public consumer.

However, I guess what I was talking about on Slack is slightly different.

The multi-stage development/deployment workflow I am used to is the kind implemented by services such as platform.sh, for example.

You start with a single "branch/env", which has your code and your data (this can be a mix of database, asset files, etc.). When you branch off (to create a stage env, a dev env, or whatever), the data is cloned from the env you're branching off from. After that, you can still bring the data down from the parent env to its children (to refresh your branched env, to update it, or for whatever other reason).

This helps with testing, development, and release processes. If I push a new schema up to my dev environment, I want a live-like database already behind it, ready for my testing. If I messed about with it before, I want to be able to refresh it from the parent (e.g. live) environment, so I can be sure my schema changes will be deployed against a live-like database.
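
To make that concrete, here is a rough sketch of the workflow; every command here is hypothetical, invented just to illustrate the shape of it (no such CLI exists today):

```sh
# Hypothetical CLI, modelled on the platform.sh-style workflow described above.
gc env branch live dev           # create a dev env; schema and data cloned from live
# ...deploy schema changes to dev, experiment, mess up the data...
gc env refresh dev --from live   # re-clone the data from the parent env
```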

Do you see this as a helpful addition to GC? Or should one rely on a fixture database that has to be imported into each instance (dev, stage, qa, etc.) of the same service?

@vincenzo I see what you describe as an essential feature of Graphcool, and I believe that having this work correctly, easily, and reliably out of the box will be a major value proposition.

One thing to keep in mind is that many organisations have a requirement for data scrubbing when bringing live data into a staging environment.

I believe we also need to support seeding the databases used in test and development with a known fixture dataset. These are two different use cases, and both should be supported as first-class concepts, or at least be really easy to achieve.

Thanks @sorenbs, I appreciate the reply, and I totally agree. In my experience, data scrubbing can be supported easily. An elegant way to do that would be to support post-sync hooks, where you can fire routines for database sanitisation and data transformation.

@vincenzo Proper data scrubbing does not happen in a post hook. When data is not allowed to leave the system, it's not allowed to leave the system, and in my experience the enterprise data security team will try to shoot you on sight if you first dump the production database into a different environment, and then (hopefully) scrub it. ;)
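
In other words, the clone-and-scrub has to happen inside the production boundary, and only the scrubbed result ever leaves. A rough sketch with standard PostgreSQL tooling (all database, table, and column names are placeholders):

```sh
# Everything below runs inside the production security boundary.
createdb scrub_copy
pg_dump prod_db | psql scrub_copy     # the raw clone never leaves the boundary

# Scrub the clone in place before anything is exported.
psql scrub_copy -c "UPDATE \"User\" SET email = 'user' || id || '@example.test', name = 'redacted';"

pg_dump scrub_copy > scrubbed.sql     # only the scrubbed dump crosses into staging
```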

LOL, @kbrandwijk - I suppose I never worked in such a strict environment. But I see what you mean ;)

This issue has been moved to graphcool/graphcool-framework.
