Redwood: New Feature Proposal: Data Migrations

Created on 29 Jun 2020  路  3Comments  路  Source: redwoodjs/redwood

We have database migrations via Prisma Migrate, but no concept of data migrations鈥攏eeding to change the data in the database programmatically (not the database structure itself) during a deploy.

What is a Data Migration?

Let's say I have a blog with a Post model. Post has a tags field where we store a comma separated list of tags. After a while we decide that tags should really have their own model, Tag, so we want a one-to-many relationship from post to its associated tags. How do we migrate the existing tags data from Post to some associated Tags? While there is a database structure change (creation of the Tag model) there's no easy, repeatable, programmatic way to codify this change into the codebase similar to database migrations. This process is a data migration.

Before:

model Post {
  id     Int  @default(autoincrement()) @id
  author String
  body   String
  tags   String
}

After:

model Post {
  id     Int  @default(autoincrement()) @id
  author String
  body   String
  tags   Tag[]
}

model Tag {
  id     Int  @default(autoincrement()) @id
  name   String
  postId Int
  post   Post @relation(fields: [postId], references: [id])
}

How would we do this today?

A couple of possibilities:

  • Manually move the data around in the database 馃槵 Not recommended!
  • Create a script and run it on your local development machine while schema.prisma is pointed to the production database. Also 馃槵 although might be fine for a single developer project (no chance of conflicts with another dev doing something at the same time)
  • Create a script (similar to api/prisma/seeds.js) and modify the yarn build script to include executing this script so that it runs by the build system on the next deploy (and either make the script idempotent or immediately remove that script and commit again so it doesn't run a second time).

How we should do this going forward

It's probably no surprise that I'll be referring to a Ruby library that handles data migrations for Rails: https://github.com/ilyakatz/data-migrate It behaves very similarly to the database migrations we know and love in Prisma:

  1. The developer creates a JS script (a data migration) that has access to our db object (instance of PrismaClient) and that can do whatever it wants with the database. The script is named with some unique identifier (the current timestamp at time of creation works great), something like 20200629120000-migrate-tags.js
  2. A database table is created (if it doesn't already exist) which tracks which of these data migrations have already run (the timestamp in the name is saved in a row in the table). If we want to match what Prisma does, this could be called _DataMigration.
  3. On each deploy the data migrations process starts, looks at the contents of the table, checks if there are any NEW data migration scripts which aren't in that table, and runs them one at a time and adds their timestamp to the table. The process stops if any one of them exits with anything other than 0 (but does not rollback鈥攅ach individual script should be self-contained and not depend on anything that runs after it).

Command line tools

We could have a couple command line tools:

yarn rw install dataMigrations

Would append the required model to schema.prisma and run db save and db up automatically.

yarn rw g dataMigration [name]

Would create the shell of a new data migration script. I'd propose these live in api/prisma/dataMigrations unless that could cause some kind of conflict with what Prisma expects the prisma directory to contain.

Posts and Tags

So going back to our original example, the process would then be:

  1. Generate a database migration which adds the Tag model
  2. Generate and write a data migration that copies all existing tags from Post and distributes them to their own entries in Tag
  3. Update the api and web sides to use the new data structure
  4. Deploy
  5. Generate a database migration which removes the now unused Post.tags field
  6. Deploy

Any data migration that involves removing a database column will need to be a two-step deploy process like this鈥擨 definitely don't think we want to try and get in the middle of the Prisma migration logic and try to insert our own code in the middle of their process that runs the database migrations.

Code/Data timing issues during deploy

Similar to database migrations, large apps need to take into account what will happen if a user happens to come to the site while in the middle of a data migration鈥攖he JS code they receive may not work with the database/data migration that was just applied and could cause errors trying to access data. I'm not sure if anyone has come up with a good solution to this problem yet, but data migrations don't introduce any new problems that don't already exist with database migrations.

Conclusion

I'm coming across this problem myself as I'm building a Redwood app and have run into it in pretty much every app I've worked on previously. I'd love it if Redwood made this as easy as everything else!

kinimprovement deployment prisma

Most helpful comment

Fine, I'll do it myself

All 3 comments

Fine, I'll do it myself

@cannikin I _love_ separating changes to database structure from changes to _data_ in the database.

My one critique here would be the name. The distinction between a "data migration" and "data _base_ migration" is really subtle.

I'm not sure what to propose as an alternate name. Perhaps "data transition"?

It's not a big deal, but I think it might trip up beginners.

@jessmartin I agree things like this are worth bringing up! We spend a lot of time explaining what things mean and how things are distinct.

How do you feel about "data migrate"? Also, there's a decent chance this will end up becoming a _part_ of the Prisma migration flow... so I believe that's one reason for the similarity. (If I'm correct.)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

peterp picture peterp  路  4Comments

thedavidprice picture thedavidprice  路  3Comments

thedavidprice picture thedavidprice  路  3Comments

jeliasson picture jeliasson  路  3Comments

tmeasday picture tmeasday  路  4Comments