Graphql-engine: managing seed or data-only migrations

Created on 26 Jun 2019 · 16 comments · Source: hasura/graphql-engine

Often one would like to add some data into tables as part of the db init process. We should document how to achieve that.

  • Head to the Run SQL page in the console
  • Add the INSERT SQL statement
  • Check ~track this~ This is a migration
  • Hitting Run will insert the data and create a migration file
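
The same flow can also be done by hand: write the INSERT statements into a migration file yourself and apply it with the CLI. A minimal sketch; the version number, filename, and the author table are made-up examples, not generated by hasura:

```shell
# Hand-write a data-only "seed" migration (all names here are illustrative).
mkdir -p migrations
cat > migrations/1561530000000_seed.up.sql <<'EOF'
INSERT INTO author (name) VALUES ('Alice'), ('Bob');
EOF
# Then apply it like any other migration (needs a running endpoint):
# hasura migrate apply
```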

Edit:
The above task is done. There are further suggestions for managing seed data migrations in the comments. Leaving this issue open for that discussion

Labels: cli, enhancement, wip


All 16 comments

@marionschleifer Currently I seed data with a pipeline that waits until hasura becomes ready (which means the schema migration is done, so I can insert into the database) and then executes INSERT statements.
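
Such a wait-then-seed pipeline might be sketched like this; the health-check URL, the retry budget, and the seed file name are assumptions, not part of any shipped tooling:

```shell
# Poll a readiness check until it succeeds, then run the seed inserts.
wait_until_ready() {
  tries=0
  until "$@" >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1
    sleep 1
  done
}
# Real usage (requires a running graphql-engine and database):
# wait_until_ready curl -fsS http://localhost:8080/healthz
# psql "$DATABASE_URL" -f seed.sql
wait_until_ready true && echo "ready"
```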

Could I ask

  1. When will this task (#2431) be completed?
  2. Is there a way for even the current version of hasura to execute a SQL file containing INSERT statements for seed data, so that I can remove the external script and delegate the job to hasura?

@jjangga0214

  1. This task will be completed within the next few weeks, by mid-September.
  2. Yes, you can do as described in the issue description: you can add the SQL (with insert statements) as a migration.
    Let me know if you have any further questions 🙂

@marionschleifer Thank you for the prompt response :)

You can add the SQL (with insert statements) as a migration.

If I understand what you mentioned correctly, then migrations would look like this, where 1565610380000_seed.up.sql contains all the INSERT statements. The version 1565610380000 of 1565610380000_seed.up.sql is just an arbitrarily chosen number, as the file is not generated by hasura, but by me.

├── config.yaml
└── migrations
    ├── 1565510062048_init.up.sql
    ├── 1565510062048_init.up.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.down.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.up.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.down.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.up.yaml
    └── 1565610380000_seed.up.sql

The reason I want seed data is to easily populate data in the local development environment. So, in our case, seed data must be prevented from being fed to the production database. But satisfying this with the migration model introduces, at least in my project, an inconvenient workflow.

Let's say I add a new migration named new_migration_example with version 1565610380001, like this.

├── config.yaml
└── migrations
    ... (previous versions omitted for brevity)
    ├── 1565610380000_seed.up.sql
    ├── 1565610380001_new_migration_example.up.yaml
    └── 1565610380001_new_migration_example.down.yaml

While I want to simply execute hasura migrate apply as before adding the seed, now I have to run hasura migrate apply --version 1565610380001. What if I have many versions to apply? Then every version would need to be specified, like hasura migrate apply --version 1565610380001 --version 1565610380120 --version ... (I haven't tested whether specifying multiple versions in one command works; if not, then multiple commands would be required), rather than the simple command hasura migrate apply. This is cumbersome and error-prone as well.

In my opinion, "seed" should be a concept independent from migration. That's because a migration can be applied to both dev and production environments, whereas a seed, at least in my case, should only be applied to the dev environment. For example, prisma provides a prisma seed command, independent from prisma deploy, which is similar to hasura migration.

Therefore I suggest a new way.

Users add and manage seed SQL files under a seeds directory, like the one below.

├── config.yaml
├── migrations
│   ├── 1565510062048_init.up.sql
│   └── 1565510062048_init.up.yaml
└── seeds
    ├── 0_foo.sql
    └── 1_bar.sql

And the command below would execute every .sql file in the seeds folder in alphanumeric order.

hasura seed 
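
Under the hood, such a command could be approximated with a loop over the directory in sorted order. A sketch; the actual database execution via psql is commented out, since it assumes a reachable Postgres and a DATABASE_URL:

```shell
# Execute every .sql file under seeds/ in alphanumeric order,
# as the proposed `hasura seed` command would.
mkdir -p seeds
printf "INSERT INTO foo (id) VALUES (1);\n" > seeds/0_foo.sql
printf "INSERT INTO bar (id) VALUES (1);\n" > seeds/1_bar.sql
for f in seeds/*.sql; do   # shell glob expansion is already sorted
  echo "applying $f"
  # psql "$DATABASE_URL" -f "$f"   # real execution would go here
done
```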

What's more, when using the graphql-engine cli-migrations docker image, attaching the seeds directory as a volume at /hasura-seeds would execute hasura seed AFTER auto-applying migrations.

# other options like `-p` are omitted for brevity
docker run \
       -v hasura/migrations:/hasura-migrations \
       -v hasura/seeds:/hasura-seeds \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Or, rather than attaching two volumes, just attaching a single directory containing the migrations and seeds directories would be cool as well.

docker run \
       -v hasura:/hasura \
       hasura/graphql-engine:v1.0.0-alpha42.cli-migrations

Applying seeds via graphql-engine cli-migrations matters because I frequently remove docker volumes and want to re-initialize postgres and hasura with the desired "state" (metadata, schema, and seed data) just by executing docker-compose up.
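
For reference, a docker-compose sketch of that workflow; the service names, image tag, password, and paths are assumptions for illustration, not a verified configuration:

```shell
# Write an example docker-compose file; everything here is illustrative.
cat > docker-compose.example.yaml <<'EOF'
version: "3"
services:
  postgres:
    image: postgres:11
    environment:
      POSTGRES_PASSWORD: postgrespassword
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-alpha42.cli-migrations
    depends_on:
      - postgres
    ports:
      - "8080:8080"
    environment:
      HASURA_GRAPHQL_DATABASE_URL: postgres://postgres:postgrespassword@postgres:5432/postgres
    volumes:
      - ./hasura/migrations:/hasura-migrations
EOF
echo "wrote docker-compose.example.yaml"
```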

So, this would allow users to easily build a local development workflow, while still providing an easy migration path to production.

How do you feel?

@rikinsk @shahidhk Based on @jjangga0214's comments, do you think we can solve this by allowing a user to run an arbitrary migration/SQL _without_ running it as a migration that updates the migration state on the database?

$ hasura migrate apply --one-off /path/to/folder

$ hasura migrate apply --one-off /path/to/file.sql

This would make it easy to run a seed-data-style migration, and we wouldn't need to explicitly code a seed directory into the CLI command flow.

Why isn't executing an insert SQL script after the migrations are run good enough? Any reason for this to be a migration if it's not migrating from one system to another?

edit: I might have responded without understanding the solutions and requirements properly. @coco98's solution is basically the same as what I was going for. But I don't see how it would help to add the seed data automatically at startup, as in @jjangga0214's solution.

Any news?

For the CLI migrations image, we can provide another environment variable HASURA_GRAPHQL_MIGRATIONS_SEED_DIR where the user can keep all the seed data as migrations.

Then users can keep one directory for schema/metadata migrations and another for seed data.

I'd like to suggest an additional proposal for seed data format.

I've seen folks take different approaches to seeding.
For instance, some prefer SQL, while others prefer GraphQL mutations.
What's more, some people have their raw seed data in JSON,
so they want to use it as-is or with only a little modification.

JSON

[screenshots of three community comments asking for JSON-based seeds]

SQL

[screenshot of a community comment asking for SQL-based seeds]

GraphQL mutation

[screenshot of a community comment asking for GraphQL-mutation-based seeds]

(That commenter actually thinks JSON would be good as well, though.)

I believe these comments are just the tip of the real (or potential) demand for seeding, with various approaches (at least in these three categories).

Thus, I think all three formats, JSON, SQL, and GraphQL mutation, should be usable as data formats for automatic seeding.
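
For illustration, the same seed row in each of the three formats; the table, fields, and the insert_author mutation root are assumptions made up for the example:

```shell
# One example row in each proposed seed format (all names illustrative).
mkdir -p seeds
echo "INSERT INTO author (id, name) VALUES (1, 'Alice');" > seeds/author.sql
echo '{ "author": [ { "id": 1, "name": "Alice" } ] }' > seeds/author.json
cat > seeds/author.graphql <<'EOF'
mutation {
  insert_author(objects: [{ id: 1, name: "Alice" }]) {
    affected_rows
  }
}
EOF
```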

I'd like to suggest an additional proposal for seed data management.

hasura seed apply # or `hasura seed push`?

This command reads seed data and inserts it into Postgres.
This would be convenient when editing seed data frequently, without having to reboot or directly handle Hasura and Postgres.
If Postgres already has some data, a user can choose (e.g. by flag) between:

  1. Delete the existing data, and then insert the seed.
  2. Just insert the seed regardless of the existing data, overwriting existing rows where a unique key overlaps.

hasura seed create # or `hasura seed pull`?

This command reads existing data from Postgres and generates seed data in the format the user wants (e.g. by flag), possibly one of SQL, JSON, or GraphQL. It should probably write the seed data to the file system, or print it to stdout for pipelines.
This would be convenient when a user prefers making seed data in the console, or wants to convert the seed data format (e.g. GraphQL to JSON).
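
A sketch of what such a `hasura seed create` could do under the hood: the pg_dump flags shown (--data-only, --inserts) are real pg_dump options, but the wrapper itself is only a proposal from this thread, not a shipped command:

```shell
# Hypothetical helper mirroring the proposed `hasura seed create`.
# --data-only skips DDL; --inserts emits portable INSERT statements.
dump_seed() {
  db="$1"; shift
  pg_dump "$db" --data-only --inserts "$@"
}
# Usage (requires a reachable Postgres):
# dump_seed "$DATABASE_URL" --table author > seeds/0_author.sql
echo "helper defined"
```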

@jjangga0214 thank you for your observations. We have labelled it as ideas, and let's see what the community thinks about this. Every contribution is welcome 🙂

I am also in the process of trying to come up with a solution for this at the moment.
Manually writing SQL statement seeds is technically an answer, sure, but that is going to cost our team a lot of time. Ideally, Hasura should provide utilities to track and insert records along with schema definitions.

I think the optimal solution here is to use the pg_dump API (which I had no clue existed in Hasura until I went digging for answers in the repo code).

My initial thought was to implement calling the pg_dump binary externally as part of the migration process, but I can see now that this has already been implemented :sweat_smile:

Have a look at this:

[screenshot of the existing pg_dump integration in the repo]

I figure the way it would work is that, either through CLI flags or the Console, users can select whether they want to track just the schema, or the data plus the schema, in migrations.

This may be a bit of an undertaking, because it is going to require (I assume):

  • Integration into the Haskell engine core and test suite
  • Integration into the Web console
  • Integration into the Go CLI

A short-term solution would be to modify the Go CLI to accept a flag for whether or not it should dump the data as well:

Modify migrate_create.go and schema_dump.go to take an --include-data (or similarly-termed) flag. Modify ExportSchemaDump argument string to dynamically generate the array of flags based on whether or not --include-data was passed.

https://github.com/hasura/graphql-engine/blob/f8ffbda11897983adffd5a5d00f077da92f618ba/cli/commands/migrate_create.go#L29
https://github.com/hasura/graphql-engine/blob/f8ffbda11897983adffd5a5d00f077da92f618ba/cli/commands/migrate_create.go#L85-L98

https://github.com/hasura/graphql-engine/blob/f8ffbda11897983adffd5a5d00f077da92f618ba/cli/migrate/database/hasuradb/schema_dump.go#L8-L18

I am not sure who best to tag about this. @marionschleifer @coco98

I do not have any experience writing Go, but here is what I imagine the function would roughly look like (barring the fact that if we are going to pass config here, it should be in an object (I suppose Go calls them structs)):

```go
func (h *HasuraDB) ExportSchemaDump(schemaNames []string, tableNames []string, includeData bool) ([]byte, error) {
	// For whatever it is worth, I would actually pass these flags as
	// --no-owner and --no-acl because they carry more immediate semantic
	// meaning; I had to look them up in the pg_dump documentation to
	// find out what the short forms did.
	opts := []string{"--no-owner", "--no-acl"}

	if includeData {
		opts = append(opts, "--inserts")
	} else {
		opts = append(opts, "--schema-only")
	}

	// Restrict the dump to specific tables, if any were requested.
	for _, table := range tableNames {
		opts = append(opts, "--table", table)
	}

	for _, s := range schemaNames {
		opts = append(opts, "--schema", s)
	}

	query := SchemaDump{
		Opts:        opts,
		CleanOutput: true,
	}

	// Assuming sendSchemaDumpQuery returns (response, body, error) as in
	// the linked schema_dump.go; only the body and error are used here.
	_, body, err := h.sendSchemaDumpQuery(query)
	if err != nil {
		return nil, err
	}
	return body, nil
}
```

@shahidhk could you please help @GavinRay97 to get started on this? ✨

@lexi-lambda This has cross-cutting concerns with https://github.com/hasura/graphql-engine/issues/2817 and would potentially take care of that too

@joshuarobs

I added functionality for seed scripts into the CLI, but have some minor details to work out. I have not heard back from the Hasura team on this, so it may be the case that they are either on holiday or it's not super high on the roadmap.

I have never written Go before and am unsure how much more needs to be done to properly integrate it, so no promises if/when it will get merged into core. Would be really neat if it did though.

[screenshots of the seed-script CLI functionality]

I would actually love an option where I can update the seed data in the hasura console tab: adjust the rows and, after I've updated a couple of them, have the opportunity to save them as a migration. So instead of writing a new SQL statement for every small change, I would prefer to take a database snapshot from time to time.

Any ideas?

