Prisma1: Database migrations

Created on 13 Nov 2017 · 44 comments · Source: prisma/prisma1

Motivation

So far database migrations have been almost entirely implicit (except for the temporary directives @rename and @migrationValue). Especially when working in multi-stage/multi-branch environments, a deterministic (and imperative) migration solution is necessary.

Goals

  • Preserve "magical" ease of use of declarative migration workflow
  • Make database migrations deterministic and work well in automated environments (CI/CD)
  • Resistant to merge conflicts
  • Each migration step should be applicable within <100ms
  • Enable easy programmatic migration scripting

Proposal

The basic idea is that whenever the data model (types.graphql) is changed, the CLI automatically creates a migration file with the missing migration steps when running prisma deploy.

Migration files are YAML based and contain a list of all migration steps and an optional before/after script. The filename starts with a UTC timestamp (format: YYYYMMDDHHMMSS) and optionally contains a short "description" (e.g. YYYYMMDDHHMMSS_create_products.yml).

The missing migration steps can always be calculated locally (without any "knowledge" of a deployed service): they are the delta between the current types.graphql file and the data model described by all existing migration steps combined. If the data model described by types.graphql and the data model described by all migrations line up, the migrations are "in sync".
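For illustration, that check could look roughly like the sketch below. All helpers (parseDataModel, parseYaml, emptyModel, applyStep, diffModels) are hypothetical names for this sketch, not part of any existing CLI:

// Sketch only: determine whether migrations/ and types.graphql are "in sync"
// by replaying all migration steps and diffing the result against types.graphql.
const fs = require('fs');
const path = require('path');

function missingSteps(projectDir) {
  // Data model described by types.graphql
  const target = parseDataModel(
    fs.readFileSync(path.join(projectDir, 'types.graphql'), 'utf8')
  );

  // Data model described by all existing migration steps, replayed in timestamp order
  const dir = path.join(projectDir, 'migrations');
  let current = emptyModel();
  for (const file of fs.readdirSync(dir).sort()) {
    const migration = parseYaml(fs.readFileSync(path.join(dir, file), 'utf8'));
    for (const step of migration.steps) {
      current = applyStep(current, step);
    }
  }

  // An empty delta means the migrations are "in sync"; otherwise these are the
  // steps the CLI would write into a new migration file on `prisma deploy`.
  return diffModels(current, target);
}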

Minimal example

Assuming you're starting out with an empty service definition (0 models/types) and add a new User model with an id and name field, this is what your project should look like:

# types.graphql

type User @model {
  id: ID! @unique
  name: String!
}
# migrations/20080906120000.yml

steps:
  - operation: type.create
    definition: 'type User @model'

  - operation: field.create
    type: User
    definition: 'id: ID! @unique'

  - operation: field.create
    type: User
    definition: 'name: String!'

Terminology

  • Migration: A YAML file containing a list of migration steps and optionally before/after scripts.
  • Migration step: Instruction on how to change the data model for a deployed service.
  • Migration script: Script that can be run before and/or after a migration.
  • Migrations are in sync: SDL based data model and all migrations line up.

List of all possible migration steps

steps:

    # Types
  - operation: type.create
    definition: 'type User @model'

  - operation: type.update
    id: User
    definition: 'type NewUser @model implements Human'

  - operation: type.delete
    id: NewUser

  - operation: type.create
    definition: 'type User @model'


    # Fields
  - operation: field.create
    type: User
    definition: 'firstName: String! @defaultValue(value: "John")'

  - operation: field.update
    id: User.firstName
    definition: 'name: String! @defaultValue(value: "John Doe")'

  - operation: field.delete
    id: User.name


    # Relations
  - operation: relation.create
    definition:
    - 'posts: [Post!]! @relation(name: "UserPosts")'
    - 'user: User! @relation(name: "UserPosts")'

  - operation: relation.update
    id: UserPosts
    definition:
    - 'posts: [Post!]! @relation(name: "UsersOnPosts")'
    - 'users: [User!]! @relation(name: "UsersOnPosts")'

  - operation: relation.delete
    id: UsersOnPosts


    # Multi-field unique constraints
  - operation: field.create
    type: User
    definition: 'firstName: String! @unique(group: "UserName")'

  - operation: field.create
    type: User
    definition: 'lastName: String! @unique(group: "UserName")'


    # Interfaces
  - operation: interface.create
    definition: |
      interface Human {
        name: String!
      }

  - operation: interface.update
    id: Human
    definition: |
      interface DoubleHuman {
        firstName: String!
        lastName: String!
      }

  - operation: interface.delete
    id: DoubleHuman


    # Enums
  - operation: enum.create
    definition: |
      enum RGBColor {
        Blue
        Green
        Red
      }

  - operation: enum.update
    id: RGBColor
    definition: |
      enum Color {
        Blue
        Green
        Red
        Yellow
      }

  - operation: enum.delete
    id: Color

  - operation: api.lock

  - operation: api.unlock

Migration scripts

You can run a script before and/or after a migration. This allows you to migrate not only the structure of your data but your data itself (e.g. iterate over all User nodes and calculate a value for a newly created field).

A migration script can be written in any language and is simply invoked through a bash command. When a migration script fails (i.e. returns non-zero exit code) the migration will be rolled back. Migration scripts are executed locally (or in your CI environment) - not on the Prisma server.

# migrations/20080906120000.yml

scripts:
  before: node before-migration.js
  # after: node after-migration.js

steps:
# ...
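For illustration, a before-migration.js of the kind referenced above could look roughly like this. How scripts learn the service endpoint and credentials is still listed as an open question below; this sketch simply assumes they arrive as environment variables, and the displayName field is a made-up example:

// Sketch only: backfill a newly created field for all existing User nodes.
// PRISMA_ENDPOINT / PRISMA_TOKEN env vars are an assumption, not part of the proposal.
// Requires a Node version with a global fetch (or swap in any HTTP client).
async function main() {
  const query = `mutation {
    updateManyUsers(where: { displayName: null }, data: { displayName: "unknown" }) { count }
  }`;

  const res = await fetch(process.env.PRISMA_ENDPOINT, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.PRISMA_TOKEN}`,
    },
    body: JSON.stringify({ query }),
  });

  const { errors } = await res.json();
  if (errors) {
    console.error(errors);
    process.exit(1); // non-zero exit code -> the CLI rolls the migration back
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});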

Ambiguous cases

Some data model changes result in ambiguous migration steps where the developer needs to choose the desired behaviour from an interactive CLI prompt.

In the following cases it is ambiguous whether an element should be renamed or deleted and recreated:

  1. Renaming fields/types
  2. Renaming enum values
  3. Renaming relation names/relation fields
  4. Renaming interfaces

Here is an example of changing a type and its fields:

type Person {
  name: String
  description: String
}

# ...migrate to...

type User {
  firstName: String
  lastName: String
  comment: String
  age: Float
}

type Member {
  name: String
  title: String
  other: String
}


Deployment workflow

_WIP_

Implementation should incorporate changes described in Improved infrastructure for service deployment.

This is what the deployment API could look like:

type Schema {
  mutation: Mutation
}

input MigrationStep {
  operation: String!
  definition: String
  id: String
  type: String
}

input Migration {
  id: String! # UTC timestamp
  hash: String! # based on id + file content
  steps: [MigrationStep!]!
}

type MigratePayload {
  success: Boolean!
  messages: [String!]!
}

type Mutation {
  migrate(migrations: [Migration!]!): MigratePayload!
}
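For illustration, the CLI could submit pending migrations to such an API with a plain GraphQL HTTP request. The endpoint, auth header and hashing scheme below are assumptions, not part of the proposal:

// Sketch only: pushing pending migrations to the proposed migrate mutation.
// Requires a Node version with a global fetch.
const crypto = require('crypto');

async function pushMigrations(endpoint, token, migrations) {
  const query = `
    mutation Migrate($migrations: [Migration!]!) {
      migrate(migrations: $migrations) {
        success
        messages
      }
    }`;

  const payload = migrations.map((m) => ({
    id: m.id, // UTC timestamp taken from the file name
    hash: crypto
      .createHash('sha256')
      .update(m.id + m.fileContent)
      .digest('hex'), // "based on id + file content"
    steps: m.steps, // [{ operation, definition, id, type }]
  }));

  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ query, variables: { migrations: payload } }),
  });
  return (await res.json()).data.migrate;
}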

Design decisions

  • Remove @migrationValue directive/functionality given that it wasn't really a good solution for most cases and rather caused confusion for developers.

Open questions

  • [ ] How does the term/concept "changeset" tie into migrations?
  • [ ] Always create a new migration file or prompt developer to choose between new migration file & last migration file?
  • [ ] Should deploy command prompt developer to enter migration reason (used in migration file name)?
  • [ ] Possibly split up gc deploy command?
  • [ ] How to change enum values?
  • [ ] How to change relation names?
  • [ ] How does database seeding and imports fit into migrations?
  • [ ] How does rollback work?
  • [ ] Expose API key via env var for migration scripts?
  • [ ] Migration steps to migrate between relations and embedded types (see #1160)
  • [ ] Do we need to distinguish between migrating existing db vs creating new? (db:migrate vs db:schema:load in rails)
  • [ ] How to fit database indexes into migrations?
  • [ ] What is the process for writing migration scripts? How do auto-generated scripts get access to the service?

    • [ ] Is service endpoint and credentials injected into scripts as env vars?

Labels: kind/feature, area/cli, area/migrations, rfc1-draft


All 44 comments

Ambiguous deltas

Update: this is now accounted for in the proposal 🎉

old comment

The missing migration steps can always be calculated locally (without any "knowledge" of a deployed service) which is the delta of the current types.graphql file and all existing migration steps combined.

Deltas are sometimes ambiguous - how can the developer distinguish between different 'migration paths' in such a case? Example:

Base Version

type User @model {
  id: ID! @isUnique
  title: String!
}

Latest version

# latest version
type User @model {
  id: ID! @isUnique
  name: String!
}

Possible migrations

Solely based on the two versions of types.graphql, at least two migration paths match.

One is to rename title to name:

  - operation: field.update
    id: User.title
    definition: 'name: String!'

another is to remove title and add name:

  - operation: field.delete
    id: User.title

  - operation: field.create
    type: User
    definition: 'name: String!'

Note that applying the latter migration when the former was intended leads to undesired data loss.

Thanks, some thoughts from my side:

Migration Scripts:

  • Not having any inputs on the migration scripts will cause frustration and will be a wellspring of creativity for spectacular hacks. On one hand I can see why it is tempting to not have input, for the ease of implementation and reduction of complexity on our end; on the other hand this will require developers to either write a script for _every migration_ that needs a script, or come up with said hacks. I can imagine that there are a lot of use cases out there that require at least basic knowledge of the migration to implement some sort of reusable migration scripts or custom system-specific migration programs/services that interface with our migration actions and instrument our workflow. My point is basically that we should put more thought into the ease of instrumentation through external code - you somewhat mention it, but not having inputs hamstrings this effort. A basic solution is to pipe JSON into stdin or something equally simple, as sketched after this list.
  • Where are scripts run? It is not clear in this spec what the workflow for scripts actually looks like. Do I push the scripts with the deploy / migrate command, and the server executes them? Do I do that locally? The former is a good idea, the latter problematic, to say the least. In my opinion, as Graphcool sits above the database, it is perfectly reasonable to only allow migration scripts to run on the server itself that acts as a controlling entity to serialize migration attempts and prevent multiple developers from doing something unintentional and wreaking havoc in their database. This still allows for a simple approach locally and a more controlled one on live systems.
  • Thinking more about it, the scripts need to interface with Graphcool, because I assume that data migrations are done via the API. Should the scripts figure out themselves where to go against? Here it is obvious to me that some kind of input for the API/database "connection" is required. Having developers juggle their URLs in the code sounds like a recipe for frustration and ties into my thoughts above for having input on scripts.
  • Migration scripts should not be able to trigger another, nested deploy through the API.
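A minimal sketch of the stdin idea from the first point above, assuming the CLI pipes the parsed migration (its steps plus basic service info) into the script as JSON; the payload shape is made up:

// Sketch only: a migration script reading its context from stdin as JSON.
let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => (raw += chunk));
process.stdin.on('end', () => {
  const { steps, endpoint } = JSON.parse(raw);

  // React to the actual migration instead of hard-coding knowledge about it
  const createsRequiredField = steps.some(
    (s) => s.operation === 'field.create' && s.definition.includes('!')
  );
  if (createsRequiredField) {
    // ... backfill data via the API at `endpoint` before the change is applied
  }
  process.exit(0);
});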

Migration .yml files:
This is more like thinking out loud how things must work and tries to fill the gaps of the spec (as you didn't talk about the client-server interaction yet).

Okay so files are apparently ordered by timestamp. One goal for migrations is Resistant to merge conflicts. I assume the merge conflicts in question are git merge conflicts. I also assume that migration files are checked into version control, as the other case would be too complicated for multiple devs to handle with wiring up all the correct script files and whatnot. There will simply be next to no merge conflicts if two developers happily develop on different branches and then merge. Why? Git will just take all those separate files and throw them into the migration folder, as there are no (or rarely) name conflicts due to timestamps naming. The only potential for merge conflicts here is of course the types.graphql file, which is certainly an indicator for developers that they need to take a hard look at their migrations as well, but even this is not a given.

Now imagine a scenario where two branches from two devs diverged from master. One branch is short-lived and one is a longer, larger feature. The short-lived branch gets merged and the migrations are run. The larger branch is merged and now master has interleaved applied and unapplied migrations:

  • A.yml <- applied, baseline
  • B.yml <- applied, baseline
  • C.yml <- not run, from large branch
  • D.yml <- applied, from small branch
  • E.yml <- not run, from large branch

I assume state keeping is done on the server which knows which migrations have been executed already (e.g. a traditional migrations table), or else this scenario can't work, as there is no way to detect an issue with C. It also can't be computed locally (ie. as there is no state available), so the CLI is out here and has to rely on errors from the server during migrate or deploy, which is a good thing. Will the server just error out immediately as soon as it finds a migration that is not run and before an already applied one, or will it try to resolve the conflict (more complex, more sophisticated, higher critical bug potential)? What is the flow if there is such an error? Does the developer create a new migration and delete the old one, or redate the conflicting migrations to a newer date? The tooling and the (error) messages certainly have to be on point to reduce friction as much as possible. Most of these things I mentioned are more or less solved by rails migrations already and should give us a good starting point of what developers expect.

One random thought: In rails it was common to delete or squash all migrations at some point into one base migration if the migration folder grew too large. The schema.rb file (that was more or less used as the base migration) was _sort of_ our types.graphql. We should definitely consider this scenario, as serious projects will amass migrations fairly quickly.

One last thing here: @marktani has a good point with the migration ambiguity. I wonder how this is complicated by the flow I described above with parallel branches / parallel changes to the types, I have to think more about these edge cases.

"Apologies for being picky" points:

  • Having an "Each migration step should be applicable within <100ms" design goal is all good until you hit data column migrations that can occupy the database for multiple minutes and more. I don't see why this is relevant or on the list in the first place if there is at least one obvious case where operation time is unbounded to such an extent that making guarantees is futile.
  • I don't think that the migration spec should sometimes use strings and sometimes list of strings for the same field (e.g. definition) - it should adhere to the same type, even if it is just YAML.
  • What are the transactional guarantees here? You mention rollback in the open questions, so... no transactional guarantees?

@dom

Some comments :-D

Input to migration scripts

In Rails each migration script is unrestricted ruby code - it could send an email if it wanted to. I assume this is what you mean by input?
What are the primary use cases for this - I assume loading or manipulating existing data?

Script execution

The current idea is that the cli will invoke the scripts.
During development, a single developer is assumed to work exclusively on a branch. We recommend spinning up a database for each feature branch.
On shared databases - typically staging and production, the cli should be executed from a CI environment that can control exclusive access to running migrations. I see a future where Graphcool provides the CI environment for Graphcool database migrations, but for now people are encouraged to use their existing CI setup.
Additionally I think that Graphcool should prevent multiple migrations from running at the same time. When this involves arbitrary scripts before and after the actual migration we need a way for the cli to inform the server about a migration beginning or ending. We can either make this implicit (cli sends messages to server) or explicit (the scripts will have to run mutations to mark beginning and end).

Script - connection to Graphcool

I agree that this has to be tied to input from the cli. Maybe we should have a concept of a special connection for migrations

Nested deploys

I agree that only one deploy should be allowed at a time.

Learnings from Rails

Migrations from two branches with interleaving timestamps: Rails simply runs all migrations in order, skipping any that have already been executed. We should investigate if this has any severe downsides in our context. Otherwise, that would probably be the solution that is easiest to understand.

Merge Conflicts: Rails includes the timestamp of the latest migration in db/schema.rb in order to ensure that the developer is made aware that another branch has introduced changes and should verify that there are no issues.

Relationship between migrations and db/schema.rb: In Rails, individual migrations are created manually (either by writing code or giving input to the cli). After running a migration, Rails automatically inspects the resulting database and recreates the db/schema.rb file. This file is auto-generated, and manual changes will not be preserved. So even though migrations are the source of the current state of the database, db/schema.rb is the authoritative source for the database schema. This is a bit weird, but comes out of a practical realisation:

There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema.

In Graphcool this works differently. Our equivalent of db/schema.rb is types.graphql which developers are expected to modify directly. This file is never automatically generated. Instead migrations are automatically generated by diffing against the current state.

Also worth noting that db/schema.rb is database independent (works with MySQL, Postgres etc), but if engine-specific features are used, Rails supports switching to a SQL-based format instead.

Transactional guarantees: I don't think we can rely on database transactions for migrations. For one, this isn't supported by MySQL, and secondly there is no good way to ensure that we use a single connection for both the migration steps as well as possible migration scripts. Instead we should add primitives to temporarily disable writes and potentially reads.

Developer actions

A list of actions a developer will perform when working with Graphcool. Please point out any actions not on this list yet so we can make it exhaustive

  • Init empty database
  • Create migration from diff
  • Recreate migration from diff
  • Rollback and recreate migration from diff
  • Apply migration
  • integrate migrations from master
  • Init new database from existing types.graphql
  • Get types.graphql from existing database

Init empty database

This happens when you use Graphcool the first time or when you start working on a new project

  1. gc init: Create types.graphql and first migration file to init empty database
  2. gc deploy: run first migration (or whatever we do to create empty db)

Create migration from diff

This happens on a daily basis as you make changes to your db structure

  1. Make changes in types.graphql
  2. gc deploy: generate current database schema; diff against new database schema from types.graphql; iff they are different, create migration file
  3. gc deploy: run all pending migrations against server

Recreate migration from diff

When you realise that your latest migration wasn't quite right before you actually perform the migration.
Don't do this if migration has been executed against a shared database (ie. staging or prod) or the branch has been merged

  1. Make corrective changes in types.graphql
  2. gc deploy --recreate: generate current database schema, excluding last migration; diff against new database schema generated from types.graphql; iff they are different, overwrite last migration file
  3. gc deploy: run all pending migrations against server

Rollback and recreate migration from diff

When you realise that your latest migration wasn't quite right and you have already performed migration against your local database.
Don't do this if migration has been executed against a shared database (ie. staging or prod) or the branch has been merged

  1. Make corrective changes in types.graphql
  2. gc deploy --recreate: generate current database schema excluding last migration; diff against new database schema generated from types.graphql; iff they are different, overwrite last migration file
  3. gc deploy --redo: roll back latest migration and then run all pending migrations against server

Run migration

When you integrate changes from the master branch that includes new migrations, or when running migrations against a shared database (typically staging and prod). Migrations have already been created and tested by developers in the feature branch where they were introduced.

  1. gc deploy: run all pending migrations against server

Init new database from existing types.graphql

This happens whenever you need to spin up a new test database and when you start working on a new feature branch as well as when a new developer joins the project.

  1. gc deploy: Should we replay all old migrations or simply create a new giant migration that goes directly to the finished schema? Rails does the latter for performance and robustness and I think we should do the same.

Get types.graphql from existing database

If you have lost access to your migration files and types.graphql you need a way to get it back from the server

  1. gc pull-types: the server generates types from the existing database structure; the CLI stores them in a local file

Note

Above, gc deploy always performs the following check to decide whether it should a) generate a new migration file or b) perform migrations on the database: generate the current database schema by iterating over all migration files; diff against the new database schema generated from types.graphql; are there any changes?
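In code, the decision could look roughly like this, building on the same kind of hypothetical diffing helpers as the sketch in the proposal above:

// Sketch only: how `gc deploy` might branch between a) and b). All helpers are hypothetical.
async function deploy(projectDir) {
  const fromMigrations = schemaFromMigrationFiles(projectDir); // replay all migration files
  const fromTypes = schemaFromTypesGraphql(projectDir);        // parse types.graphql
  const delta = diffModels(fromMigrations, fromTypes);

  if (delta.length > 0) {
    // a) types.graphql is ahead of the migration files: write the missing steps to a new file
    writeMigrationFile(projectDir, delta);
  } else {
    // b) files and types.graphql are in sync: run any migrations the server hasn't applied yet
    await applyPendingMigrations(projectDir);
  }
}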

Awesome, if this can help a bit: I have a script that manages the migration scripts. Like Rails there are different migration files prefixed with the date, and then all migrations are stored in the database as a new type (that can be included in the system api), and from that it just applies every migration that is not yet in the database. If this can help: https://gist.github.com/antho1404/576642a12f07f09f853a42ecf67a45f3

Also in Rails a few times I had to do a migration of the structure, then run a script, and then migrate the structure again (when adding a unique non-null field, for example). Of course it's still possible to generate 3 different migrations (create the field, migrate the data, mark it as unique and not null) but I think it could be nice to have this in a single file and so allow adding scripts between operations and not only before or after the migration.

I also think that types.graphql should only be generated; if a project has migrations it will be really messy to have both migrations and a types.graphql file - which one will be applied if both are modified with different data? Maybe have a script that can generate all the initial migrations based on the file and then ignore it.

Thanks @antho1404! This is very helpful.

One thing I want to clear up right away: The types.graphql file is never generated. It is and will continue to be the primary way you change your Graphcool database schema. If, for example you want to add a field to an existing model, the workflow looks like this:

  1. Add the field to the model in your types.graphql file
  2. Generate the migration (this happens automatically by diffing new types.graphql with previous state)
  3. Run migration

The generated migrations should not be changed manually as this will result in your types.graphql being behind causing the cli to create a new correcting migration that will revert your manual change to the migration file.

Does that make sense?

We should consider if migrations is the right way for us to do schema change deployments.

Redgate and Microsoft SQL Server Data Tools use a state-based diffing approach compared to Rails-style migrations. This is described in Critiquing two different approaches to delivering databases: Migrations vs state

This might be a more natural fit to Graphcool

Hum my bad I didn't read carefully

whenever the data model (types.graphql) is changed, the CLI automatically creates a migration file

I feel it's kind of weird to do it that way. Maybe because I worked too much with rails, but if I want to update something on my database (types, data, relations, whatever...), I don't want to have to go into the big file with everything and lose time figuring out where to put my modification (maybe it will be at the beginning and the end of the file). I just want to write one file with only the updates I want to apply, exactly like the migration files you describe, and then just run a command to dump the new migration into the types.graphql file and then deploy the modification.

I guess maybe the architecture doesn't suit this idea and maybe I'm just really biased with rails migrations. Maybe I just have to try to think differently :)

It's really interesting to learn from developers with different experiences. To me, the fact that I can have one coherent view of my data model in the types.graphql file is the magic behind the Graphcool migration system.

Having to manually write migration files would be a big step backwards for me. I'd be very curious to hear from other people how they feel about this :-)

I think the stuff that "bothers" me a bit is to call it migrations; maybe calling it "diff" or something like that would be less confusing for me. Another idea: if the idea is really to have the state of the database and just update the state, maybe it could be an idea to keep all the full different states (so different versions of the types.graphql). The confusing part is to have some migrations and some states; I feel it's less confusing to have only one. Again, these are just some thoughts.

This sounds really good. I especially like that it's all static and doesn't introspect the DB.

Some thoughts:

  • Presumably it would be possible to edit the generated migrations provided the end result of running all the migrations matches types.graphql? For example to change one of the "Ambiguous change" decisions made in the CLI.
  • Would the before/after scripts be stored in the migration file or referenced or auto loaded based on a naming convention?
  • In development you could end up with a whole load of small migrations while you tinker with the schema. In this case could you just delete all the uncommitted migrations and have the CLI regenerate a single migration?

@tamlyn

Presumably it would be possible to edit the generated migrations provided the end result of running all the migrations matches types.graphql? For example to change one of the "Ambiguous change" decisions made in the CLI.

Yes

Would the before/after scripts be stored in the migration file or referenced or auto loaded based on a naming convention?

Migration scripts have to be explicitly referenced from within the migration file. Migration scripts will be invoked by the CLI.

In development you could end up with a whole load of small migrations while you tinker with the schema. In this case could you just delete all the uncommitted migrations and have the CLI regenerate a single migration?

This is a great point. As long as no migration scripts are involved, this will be possible

Migrations

Sorry for chiming in so late. I thought a bit about the CLI-centered approach for migrations and I don't think it's the right approach. I agree that migrations should be stored in the Git repo of a Graphcool project. However I don't think that this means that migrations need to be generated by the CLI.

Disadvantages of the CLI centered approach:

  • slow: At some point the CLI has to read hundreds of migration files to calculate the current state based on the migrations to see if migrations and SDL are in sync. Even if we introduce some squash logic for migrations it would be one file with hundreds of migrations.
  • compatibility: the CLI and Server become a lot more coupled. It will be very hard to define a stable interface since both would control one part of the overall migration process. Only certain combinations of CLI and server will work.
  • evolvability: We will probably introduce changes to the migrations at some point. The CLI would have to support all legacy versions.

Ideas for a server centered approach:

The server stores migrations in an immutable log. The server is responsible for generating migration proposals which can then be edited by the developer. Once the editing is done the migrations can be pushed to the server.

The process of deploying would look like this:

  • When running deploy the CLI sends the desired types.graphql to the server. The server then diffs with the current state and sends back a list of migrations that would result in the desired state change (those migrations are a proposal).
  • The CLI saves those migrations to disk and the developer then can edit those to e.g. add his migration scripts.
  • The CLI can then push those migrations to the server. At this time the migrations are appended to the immutable migration log. It is not possible to modify this migration anymore.
  • The deploy background worker then starts to apply those changes to the project.

When deploying to a new env:

  • The CLI contacts the server to learn about the most recent migration it knows about.
  • The CLI simply pushes all the migrations that the server does not know about yet.

General unanswered questions:

  • What does the UX look like in case of a merge conflict? Imagine Developer A and B start their respective branches from the same state on master. Developer A renames the name field on type Person to firstName. Developer B renames it to nickname instead. Whoever merges to master first would win. How does the other developer resolve that conflict? Maybe we generally reject conflicts?

Not directly related to this proposal but more on the implementation side, @cjoudrey shared a couple of thoughts with me. He was suggesting to look into online schema changes similar to these projects to continuously allow read/write access to the database while performing the migrations:

Step Granularity

The proposed steps are fairly coarse. Given the following schema:

Initial Schema

type User {
  id: ID! @unique
  name: String
}

If a developer wants to make it a unique field, they would generate the following migration:

Migration A:

  - operation: field.update
    id: User.name
    definition: 'name: String @unique'

This includes the intended change (User.name should be unique) as well as the current state of the field (called name, of type String, not a list, and optional)

Merge conflicts

If a second branch created from the same initial schema introduces a change to make the field required, that migration will look like this:

Migration B:

  - operation: field.update
    id: User.name
    definition: 'name: String!'

Again, the desired change as well as the current state of the field is contained in the migration.

In this situation two branches introduce changes that could in theory be compatible. Before we dive into the possible conflict resolution strategies I want to clarify how Graphcool and Git branches play together:

Graphcool and branches

In a team with multiple developers it is common to have the following environments:

  • local development for each developer
  • staging for integrating dev branches
  • production

local development can use the local docker based version of the GraphQL Database or connect to a hosted development cluster. Crucially, changes to the database schema during development of a feature should be applied to an environment available only to the current developer.

It is suggested that a separate branch is maintained throughout the development of the feature. When the feature is done, the branch is merged to a dedicated staging branch and continuous integration will deploy the database changes to a dedicated staging database.

When tested in staging, migrations can be performed on production.

Conflict resolution

In the scenario above, two developers created a feature branch from the same starting point, and now merge their branches to staging at the same time.

As the migrations are stored in files with unique names, there will be no git-level conflict. So we will have to deal with the conflict at deploy time.

We have 3 possible conflict resolution strategies:

  1. Reject the second deploy
  2. Let the second deploy override the first (last write wins)
  3. For each deploy, apply only the actual change, preventing the second deploy from overriding the first

1. Reject the second deploy

To do this we would need to store extra information about the origin allowing us to keep track of the origin of a deploy. This would be very similar to the tree structure git maintains for branches. Maybe we could find a trick that allows us to implement this in a more simplistic way.

2. Last write wins

This would be very easy to implement. If we go for this I believe we need some other way to make the developer aware that there is a potential issue. Rails deals with this by always forcing a source control conflict when branches simultaneously change the database schema.

This is easy to implement, but probably not the developer experience we want to deliver.

3. Perform more granular migrations

In this particular case the conflict could be avoided if we perform only the intended change instead of overriding all settings for the field. This can be implemented in two ways:

  • Make the migration steps more granular:
    - operation: field.update
      id: User.name
      required: true
  • Include the previous database schema together with the migration steps when deploying.

We should consider adding a migration status command: https://orator-orm.com/docs/0.9/migrations.html#getting-migrations-status

Overview of possible Conflicts

With this comment I would like to work towards a common understanding of the conflicts that can occur with regards to migrations. When we discuss actual solutions we can check how a given conflict would be handled within that solution.

Parts of a Migration

It is important to note that migrations consist of 2 parts based on our current understanding:

  1. The change that is applied to the schema of a project.
  2. The script that gets executed before and/or after the changes to the schema are made.

A conflict might affect only one or both parts of a migration. The affected part is mentioned in each of the following scenarios.

Standard Scenario

The following is the standard scenario for the potential conflicts. Developer A and B start a branch from master from the same commit. Both developers perform their changes on their respective branches. Developer A gets to merge first. Developer B merges afterwards and therefore might be exposed to a conflict.


1. Renames

When does the conflict occur?
This conflict can occur whenever developer A renames something (model, field, enum..) and developer B does not know that change. If a migration of developer B refers to the old name it will fail when it is applied after the changes of developer A.

Example
Developer A changes the name of the type _User_ to _Customer_. Developer B adds a new field to the _User_ type on their branch. Developer A merges first to master and the change gets applied to the staging environment. Developer B merges afterwards and when their changes get applied the system does not know the type _User_ anymore and therefore fails.

Affected migration part
schema & script

Consequences

  • The schema part of a migration cannot be applied if a step refers to an old name.
  • The before script part of a migration needs to be rewritten if it refers to an old name.

2. Deleting something

When does the conflict occur?
This conflict can occur whenever developer A deletes something (model, field, enum..) and developer B does not know that change. If a migration of developer B refers to the deleted thing it fails.

Examples
Developer A deletes a field and Developer B refers to it (updating or deleting it).

Affected migration part
schema & script

Consequences

  • The schema part of a migration cannot be applied if a step refers to something that does not exist anymore.
  • The script part of a migration needs to be rewritten because it is based on outdated assumptions, e.g. a field does not exist anymore.

3. Name Clashes

When does the conflict occur?
This conflict can occur whenever developer A and B use the same name for something new (model, field, enum).

Examples
Developer A and B create a field with the same name.

Affected migration part
schema

Consequences

  • The schema part of a migration cannot be applied if a step tries to create a field with a name of an existing field.

4. Simultaneous Updates

When does the conflict occur?
This conflict can occur whenever developer A and B update the same thing (model, field, enum).

Examples
Developer A changes the type of a field and Developer B makes that field required.

Affected migration part
schema & script

Consequences

  • Since developer B merges last, their change might overwrite the change of developer A.
  • A migration script of B is based on outdated assumptions and might fail, e.g. A made a field optional and B's script does not handle the null case for that field.

Conclusion

The fundamental problem is that developer B performs their changes based on the knowledge about a certain starting point. That starting point is the commit this developer used for branching.
When developer B merges their pull request however this starting point might have changed drastically. Therefore B's migrations might fail or won't have the intended effect.
What needs to happen here is that the changes of B need to be adapted to the new starting point. In Git speak developer B needs to rebase their migrations. In a fully manual approach that would mean to go through all the migrations they have created and reevaluate them against the new starting point.

There are 2 ways to store migrations

With this comment I want to make us aware that there are 2 fundamental ways to store migrations. The goal of a migration is that we want to apply it to a service in a given state and thereby transition it to the next state in a reproducible fashion.
We can express this with the following formula: P(n) + M(n, n+1) = P(n+1). This can be read as:

Given a project in state n: If we apply the Migration steps that capture the transition of the project state n to the state n+1, we end up in state n+1.

The initial project state P(n) is always given. In a migration file we can either store the migration steps M(n, n+1) or the next state of the project P(n+1) in order to capture what needs to be done to migrate a project into the next state. Therefore we have 2 options to store a migration (a small example follows the list):

  1. Storing the migration steps: The approach that @schickling outlined initially falls into this category. In this case the server would apply the migration steps M(n, n+1) to the current project state P(n) and calculate the next project state P(n+1).
  2. Storing the next project state: In this approach a migration would store the next project state P(n+1). In this case the server would calculate the required migration steps M(n, n+1) and apply them to the project.
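To make the two options concrete, here is the same transition (adding a name field to User) captured both ways, written as plain JavaScript data purely for illustration:

// Option 1: store the migration steps M(n, n+1)
const migrationAsSteps = {
  steps: [
    { operation: 'field.create', type: 'User', definition: 'name: String!' },
  ],
};

// Option 2: store the next project state P(n+1)
const migrationAsState = {
  datamodel: `
    type User @model {
      id: ID! @unique
      name: String!
    }
  `,
};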

Conclusion

There are 2 ways to store a migration in a file. We either store the concrete steps or the next project state.

Open Question: Would it be beneficial to store the migration steps and the next project state?

Pro and Con of the migration approaches

Applying explicit steps

Pro:

  • Explicit handling of what happens, also allowing for more fine-grained control with scripts attached to individual steps.

Con:

  • CLI and server are inevitably coupled via a protocol that is subject to changes, which directly means that the whole thing is not resistant to spec changes (file format, possible actions, etc.). It might lock a developer to a specific version of the CLI / database until everything is migrated to a newer format, or we have to always be backwards compatible. Both options are undesirable.
  • Number of files inflates quickly during development. You can easily end up with a ton of migrations, depending on which file format we end up with, making manual merging / sanity checking migrations error prone and difficult. Just imagine having to check dozens of migrations + attached scripts if you merge even a medium sized feature branch. In my opinion, we need a mechanism or best practice to reduce complexity / squash migrations or this approach is more frustrating than useful.
  • We need to catch and consolidate special sequences of migrations. Imagine a developer quickly prototyping locally, deleting and re-adding fields. As the idea is to reapply these migrations to a different cluster, deleting and re-adding a field would simply erase all data for that column if applied to the production database. Those two actions must cancel each other out and neither get applied. This logic has to be rock-solid or else we'll run into some serious issues, but it is a tough space to reason about, as we need to map all actions to possible counter-actions in a ruleset.
  • In my opinion, high implementation complexity on both the server and CLI side.

Applying the target schema, aka inferring migration steps

Pro:

  • Control is on the server for diffing and applying migrations, making it the single source of truth instead of multiple sources, e.g. dozens of migrations files coming into the server.
  • Highly resistant to changes on how migrations are inferred or applied, as it is transparent to the client. Only the server needs to know how to reach the desired state.
  • In my opinion this approach has significantly less implementation complexity compared to the explicit approach. My gut feeling also tells me that there are less severe pitfalls with this approach.

Con:

  • Changes are implicit, less control for the developer
  • Diff: Rename inference needs to be done explicitly (e.g. directives as of now) in some way, or the diff is non-deterministic.
  • The idea of having scripts run before / after specific actions during the migration is not easily done for everything that is not beforeAll / afterAll.

Is there any news on this?

Coming from a dotnet/EF background I highly value this. Especially for the ability to roll forward and backwards in both Data and schema

Here's an example which I think summarizes a lot of the concerns:

  1. Developer A renames a field and deploys to his local environment
  2. Developer A removes the temporary rename directive, and renames another field.
  3. Developer B pulls all of Developer A's changes and deploys them to her local environment
  4. Developer B adds a few changes of her own (possibly more rename directives)
  5. Developer C deploys Developer A and B's changes to staging.
  6. After testing, and a few bug fixes, Developer C deploys to production.

This is a very common process. There are a few things that would currently break (as I understand it).

  • Step 3 would fail, since Developer B's local environment doesn't have the history of renaming the fields
  • Step 5 would fail, since the staging environment also doesn't have the history of which rename directives were applied where
  • Step 6 would also fail for the same reason. Production's difference with the new code set could be quite great, depending on how often releases are scheduled. In some cases, releases could be scheduled once per week, or even much longer. Depending on how many rename directives the developers used that week, the deployment to production could cause a lot of ambiguous migrations all at once.

To summarize: While a static schema is necessary for maintaining simplicity, and the "magic" of Prisma (of which I'm a huge fan), some history of these changes needs to be recorded in the (version controllable) code base. Then the server needs to be able to accurately interpret that history, and identify where its state fits in to that history, so that it can apply the migration steps necessary to automatically update its state to be current with the code base. The server will also need to be able to roll back in time (manually or automatically) to a previous state, in the case of a failed update.

So far, the discussion seems to be surrounding how the history "log" gets generated, and what generates it, and what format it should take.

Is this a fair summary? Has the conversation progressed at all?

For the record, I'm sitting on the edge of my seat. Some of the stuff you guys are doing is blowing my mind, and I can't wait until it can blow my clients' minds as well.

Hi,
I want to migrate a DB2 schema to a MySQL database. Is there any open source tool or script which can help migrate the schema as well as the data?
Please help!

Regards,
Ravi

Hey there, here's my two cents:

(This opinion does not give any solutions regarding conflicts (eg: a migration tries to access to a deleted field), but rather a general view of what a migration file could look like, and a way to give developers flexibility)

As @dpetrick said, we really need to give developers flexibility regarding migrations, and break the magic of Prisma a little for the sake of freedom.

As soon as you run into production, you will inevitably fall into some situations where you have to customize migrations to maintain backward compatibility with old stuff, or to provide default values on targeted nodes based on some very specific needs.

Let's say you're making a non-required field required on a type that already has nodes.
You will have to provide default values to make sure your schema stays in sync with your database.

For now, the advised way to do so is to run GraphQL mutations in the Playground, such as:

updateManyUsers(
  where: { firstName: null }
  data: { firstName: "DEFAULT_FIRST_NAME" }
) {
  count
}

I think leveraging the power of GraphQL Bindings and Javascript to offer flexible scripted migrations could be super powerful.

In PHP using Doctrine, a migration looks like this:

/**
 * Auto-generated Migration: Please modify to your needs!
 */
class Version20180507073532 extends AbstractMigration
{
    /**
     * @param Schema $schema
     */
    public function up(Schema $schema)
    {
        // this up() migration is auto-generated, please modify it to your needs
        $this->abortIf('mysql' !== $this->connection->getDatabasePlatform()->getName(), 'Migration can only be executed safely on \'mysql\'.');

        $this->addSql('ALTER TABLE company_metadata ADD return_policy LONGTEXT DEFAULT NULL, ADD warranty_policy LONGTEXT DEFAULT NULL');
    }

    /**
     * @param Schema $schema
     */
    public function down(Schema $schema)
    {
        // this down() migration is auto-generated, please modify it to your needs
        $this->abortIf('mysql' !== $this->connection->getDatabasePlatform()->getName(), 'Migration can only be executed safely on \'mysql\'.');

        $this->addSql('ALTER TABLE company_metadata DROP return_policy, DROP warranty_policy');
    }
}

As you might have noticed, it's clearly written: Please modify to your needs!

That means you're allowed to script a migration to your needs while leveraging the "power" (or not 😛) of PHP.

Use case

Let's say we introduced a new required field called allowedToAccessWebsite.

Using Javascript, that would allow us to script a migration like this:

class Version20180507073532 extends AbstractMigration
{
    async up(db: Prisma)
    {
      const minorUsers = await db.query.users({ where: { age_lt: 18 } });
      const majorUsers = await db.query.users({ where: { age_gte: 18 } });

      const minorUsersIds = minorUsers.map(minorUser => minorUser.id);
      const majorUsersIds = majorUsers.map(majorUser => majorUser.id);

      await db.mutation.updateManyUsers(
        { where: { id_in: minorUsersIds }, data: { allowedToAccessWebsite: false } }
      );
      await db.mutation.updateManyUsers(
        { where: { id_in: majorUsersIds }, data: { allowedToAccessWebsite: true } }
      );
    }

    async down(db: Prisma)
    {
      ...
    }
}

Those migrations should indeed have records of the schema updates as well.
The end format could look like this:

class Version20180507073532 extends AbstractMigration
{
   /* Please, do not change this */
   schemaUpdate() {
     return {
        schema: {},
        steps: {},
        ...
     }
   }

    async up(db: Prisma)
    {
      ...
    }

    async down(db: Prisma)
    {
      ...
    }
}

Or even have them separated in two files, timestamped at the same date, with one file containing the schema changes, and another containing the scripts.

Coming from C# and Entity Framework to Node, I've really found that this is the biggest thing I miss. If Prisma could make something similar to Entity Framework (it looks a lot like what @Weakky just laid out) it would make Prisma a real asset to the Node community.

Thanks a lot everyone for chiming in. Migrations is a super interesting and tough problem, and we would love to learn more about your use cases to arrive at a solution that covers them.

We set up a working group to further discuss this, if you're interested I would love to have you in the #migrations channel in our Slack community.

I've written in the Slack channel and just now found this issue. I agree with @mavilein, a CLI-centered approach is a bad idea. A CLI workflow is good for prototypes and quick starts, but for heavy continuous/team development with proper workflows, CI/CD pipelines, etc, it is unsuitable.

This should be server-centric

@mavilein

There are 2 ways to store a migration in a file. We either store the concrete steps or the next project state.

It's kinda imperative vs declarative. Storing the next project state (i.e. the declarative way) will lack transition steps (how to go from one state to another). And this is a huge problem. One should not rely on diffing algorithms; this gives very high chances of data loss or corruption. IMO migrations should always be imperative, i.e. they should instruct the server exactly how to transition to the new state, rather than what the state should be.

As the Slack channel is very inactive, I'll also write here. I see two ways to work with versioned migrations:

  • Versioned migration generation: a developer has a local copy and alters the schema as it is designed now (changing, migrating, using temporary directives, etc). Then, before pushing to the repo and CI, one runs a command that will generate migration file(s) based on the content of the migrations table (in fact replicating the local migration history). Maybe not a very robust solution, but it should work. One can locally build a migration process using temporary directives and other tricks to alter/move/add structure and data, then "freeze" the process in the migration history, and the target server will replicate this process
  • Opposite direction: developers use a versioned migration approach as in other frameworks, and after a migration the server generates and outputs the final datamodel as a helper for developers to see the actual data state in one place

The second variant I see as more solid and controllable, but it fundamentally changes the approach to data modeling and requires development of a special datamodel-builder syntax and/or a special cluster API for this. An example of a GraphQL API for migrations could be something like this:

mutation {
  model(
    name: "Article", 
    addField: [{ name: "slug", default: "", index: INDEX_UNIQUE }],
    changeField: [{ name: "old_name", rename: "new_name", nullable: true }]
    removeField: [{name: "old_name"}]
  ) {
    status
  }
}

So a migration would be a file with mutations like this. This approach also simplifies adding basic data: not seeds with test data, but system-critical data, like an initial admin account or initial settings, etc.

Without advanced migrations for migrating up AND down (incl. hooks to invoke data migration scripts) prisma can NOT be considered production ready for any team using CI/CD. Basically this means any larger customer, especially enterprise, will find the lack of this a show stopper.

I don't really care about the format, but it has to be on disk and editable, including being able to make any changes to the data we need.

What is the current status and how can we help (slack channel seems to be dead)?

Is this still active? Is there something community could help?

Can we get an update on this? We're currently looking to move away from Prisma due to the lack of migrations support. Thanks.

I second @Gdewilde's sentiment. The lack of proper scripted migrations is a huge problem for any production application built on prisma right now, and is one of many issues that is causing my team to reconsider using prisma going forward.

@marktani Do you guys have any update on your thoughts here?

@schickling @sorenbs - friendly ping 😄

It would be great with confirmation that this is still planned for Q4 and being worked on. Not having migrations in Prisma is the single largest problem IMO for production applications, effectively rendering CI/CD impossible.

Agreed. This is why I'm not using Prisma on any client projects yet.


Thank you for all the input on this topic!

We are making progress towards a more flexible migration system, and I'd like to describe our approach.
From talking to many users of Prisma we realised that the declarative migration system is tremendously valuable to a large group of users. Thank you @terion-name for describing the difference between declarative and imperative migrations above.

Short term improvements

There are two major problems with the current declarative migration system:

  • It doesn't give enough control for complex setups where you want to run migration scripts in ci/cd
  • Even if it works well for most migrations, there are edge cases where it is not sufficiently flexible

We are currently taking steps to address both issues without abandoning the idea of declarative migrations.

First, we introduced the passive mode earlier this year allowing you to use Prisma with an existing database. We are addressing some limitations and bringing this capability to MySQL in addition to Postgres. If you need full control over database migrations or want to run scripts in ci/cd, it is a viable option to simply not use the declarative migration system.

Second, we want to make it seamless to use the declarative migration system most of the time, but opt out for specific migrations that require complex logic. This way you will be able to do most migrations like you do today, and write a sql migration when you have to. We are streamlining the datamodel in order to support this. When this work is complete, the declarative migration system will also become more flexible and for example support relations without a relation table.

Work on this is underway but we are not able to give a concrete timeline yet.

In addition to these larger changes, we are also working on small items to make the declarative migration system more useful: https://github.com/prisma/prisma/issues/2323

Long term vision

Over the past year we have seen Prisma be adopted for more different use cases than we could have imagined. It has become clear that the current monolithic design is not serving us well. In the future we will split Prisma into individual projects each with a very clear focus. Declarative migrations is likely to be the first feature that is promoted to its own project. When this has happened we will be in a better position to deliver on the vision laid out in the original feature request.

Feedback

I would love to hear your thoughts on this approach as well as specific feedback on the new datamodel described in this spec.

Great news! Thanks for the update!

It's great to hear that this is actively being worked on! However, I'm a bit disappointed that you're leaning towards the declarative direction, with raw SQL as the 'escape hatch'/fallback. I'd much rather prefer the imperative and versioned migration model that @terion-name described, having specific migration code run on the DB for each up/downgrade of the database.

It's not usable in CI/CD if it just works 90% of the time; then it just defeats the purpose, and it will not meet the requirements for larger projects. I think it should be fully controllable, meaning versioning for the DB, and coded migration scripts up/down for each version - and preferably not in SQL.

Or maybe I misunderstood?

@jhalborg - I like that vision too. When we have extracted the migration system to a separate project we will be able to better iterate on advanced use cases like this.
I definitely think it is possible that the Prisma migration system will evolve to support imperative, non-sql migrations as suggested by @terion-name and @schickling in the initial feature request.

@sorenbs separating the migration system into a separate project that interacts with the Prisma server via a clear API is a good idea IMO, because it will open possibilities to _choose_ what migration system suits a project/setup's needs.
And maybe it could even work in a pipeline manner to stack them (e.g. overlay some programmatic migration scripts on top of those generated by the declarative builder/differ).

As this is a huge amount of work and obviously will not be implemented soon, it is very important to push #2323 forward (please!) - this really will make life easier for many cases.

And the biggest problem for now is migrating data when the structure changes. This is really a huge problem. But there are so many pitfalls with this that I can't design any approach that would be easily tweaked into the current system. The obvious part is to run some queries in the process (after adding raw SQL to the query engine it became more useful), but how to hook into the process with these queries?

One thing that keeps coming to my mind is callbacks on warnings. Now, when the migrator faces data loss, it spawns warnings with certain info about what and where it sees problems. Theoretically this info can be used to form a pattern-based string that will match a "fixing" query, a list of which would be passed along with the schema.

For example, we change the User schema by removing FullName and splitting it into FirstName and LastName. The migrator creates new columns and encounters data loss due to dropping FullName. So it searches the provided payload for a "fixing" or "migrating" query named something like User_FullName_drop. If it is found, it runs, and there we can write a query that takes data from fullName and moves it to the firstName and lastName columns.
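As a concrete illustration of that idea (the payload shape and naming are made up here), the fixing queries could be supplied next to the datamodel like this:

// Sketch only: "fixing" queries keyed by the pattern-based name derived from the warning.
// SUBSTRING_INDEX is MySQL syntax; the split is just an example of a data migration.
const fixingQueries = {
  User_FullName_drop: `
    UPDATE User
    SET firstName = SUBSTRING_INDEX(fullName, ' ', 1),
        lastName  = SUBSTRING_INDEX(fullName, ' ', -1)
  `,
};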

It's just a concept that obviously will not fix everything with declarative migration, but again, it can resolve a lot of situations IMO

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

The specification for the new migration system can be found here.

Declarative migrations is likely to be the first feature that is promoted to its own project

Does this mean that we will be able to use this declarative migrations library independently of Prisma?
That would be super neat. I've been looking for / considering implementing a do-one-thing-do-it-well database-agnostic migration library.

I wonder if there is an update on new migration system and if there is a release date of it.

Right now, I'm planning to create a node script that creates a migration file under a migrations folder, which I then fill manually, such as:

const { prisma } = require('./generated/prisma-client')

module.exports = async () => {
  await prisma.$graphql('mutation {executeRaw(query: "UPDATE `default@default`.User SET foo=\'bar\'")}')
}
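A minimal runner for such hand-written files might look like the sketch below (illustration only; it does not keep track of which migrations have already been applied):

// Sketch: run every migration file in ./migrations in timestamp order.
const fs = require('fs');
const path = require('path');

async function runMigrations() {
  const dir = path.join(__dirname, 'migrations');
  for (const file of fs.readdirSync(dir).sort()) {
    console.log(`running ${file}`);
    const migrate = require(path.join(dir, file));
    await migrate();
  }
}

runMigrations().catch((err) => {
  console.error(err);
  process.exit(1);
});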

However, this process is error prone and not maintainable in the long term.

I want to know if any of you have any other solutions.

I believe this is the latest update -- Lift in Prisma 2 preview:
https://www.prisma.io/blog/announcing-prisma-2-zq1s745db8i5#prisma-2-next-generation-database-tooling
https://lift.prisma.io/

