Often one would like to add some data into tables as part of the db init process. We should document how to achieve that.
Edit:
The above task is done. There are further suggestions for managing seed data migrations in the comments. Leaving this issue open for that discussion.
@marionschleifer Currently I seed data with a pipeline that waits until hasura becomes ready (which means the schema migration is done, so I can insert into the database) and then executes INSERT statements.
Could I ask what the recommended approach is?
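(For context, my current pipeline step boils down to roughly this Go sketch. The /healthz and /v1/query run_sql endpoints are Hasura's documented v1 APIs; the table name, admin secret value, and retry policy are placeholders I made up.)
```go
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "time"
)

// waitForHasura polls the /healthz endpoint until the engine reports healthy,
// which implies the schema migrations have been applied.
func waitForHasura(baseURL string) error {
    for i := 0; i < 30; i++ {
        resp, err := http.Get(baseURL + "/healthz")
        if err == nil {
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                return nil
            }
        }
        time.Sleep(2 * time.Second)
    }
    return fmt.Errorf("hasura did not become ready in time")
}

// runSQL sends raw SQL (e.g. INSERT statements) through the v1 run_sql API.
func runSQL(baseURL, adminSecret, sql string) error {
    payload := fmt.Sprintf(`{"type":"run_sql","args":{"sql":%q}}`, sql)
    req, err := http.NewRequest("POST", baseURL+"/v1/query", bytes.NewBufferString(payload))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-Hasura-Admin-Secret", adminSecret) // X-Hasura-Access-Key on older versions
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("seeding failed with status %d", resp.StatusCode)
    }
    return nil
}

func main() {
    base := "http://localhost:8080"
    if err := waitForHasura(base); err != nil {
        panic(err)
    }
    // placeholder seed statement; a real pipeline would read .sql files instead
    if err := runSQL(base, "myadminsecret", "INSERT INTO author (name) VALUES ('seed author');"); err != nil {
        panic(err)
    }
}
```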
@jjangga0214 You can add the SQL (with insert statements) as a migration.
@marionschleifer Thank you for the prompt response :)
If I understand what you mentioned correctly, then migrations would look like this, where 1565610380000_seed.up.sql contains all the INSERT statements. The version 1565610380000 is just an arbitrarily chosen number, since the file is not generated by hasura, but by me.
├── config.yaml
└── migrations
    ├── 1565510062048_init.up.sql
    ├── 1565510062048_init.up.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.down.yaml
    ├── 1565610349244_alter_table_public_author_add_column_test.up.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.down.yaml
    ├── 1565610379357_alter_table_public_author_alter_column_test.up.yaml
    └── 1565610380000_seed.up.sql
The reason I want seed data is to easily populate data in the local development environment. So, in our case, seed data must be prevented from being fed to the production database. But satisfying this with the migration model introduces, at least for my project, an inconvenient workflow.
Let's say I add a new migration named new_migration_example with version 1565610380001 like this.
├── config.yaml
└── migrations
    ├── ... (previous versions are omitted for brevity)
    ├── 1565610380000_seed.up.sql
    ├── 1565610380001_new_migration_example.up.yaml
    └── 1565610380001_new_migration_example.down.yaml
While I want to simply execute hasura migrate apply as before adding the seed, now I have to run hasura migrate apply --version 1565610380001. What if I have many versions to apply? Then every version would need to be specified, like hasura migrate apply --version 1565610380001 --version 1565610380120 --version ... (I haven't tested whether specifying multiple versions in one command works. If not, multiple commands would be required.), rather than the simple command hasura migrate apply. This is cumbersome and error-prone as well.
In my opinion, "seed" should be a concept independent from migration. That's because migrations are applied to both dev and production environments, whereas seeds, at least in my case, should only be applied to the dev environment. For example, Prisma provides a prisma seed command, independent from prisma deploy, which is the analogue of hasura's migrations.
Therefore I suggest a new way.
Users add and manage 'seed sql files' under a seeds directory like the one below.
├── config.yaml
├── migrations
│   ├── 1565510062048_init.up.sql
│   └── 1565510062048_init.up.yaml
└── seeds
    ├── 0_foo.sql
    └── 1_bar.sql
And the command below would execute every .sql file in the seeds folder in alphanumeric order.
hasura seed
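(A rough sketch of what such a command could do internally, in Go. The function and callback names are hypothetical, but the behaviour matches the proposal: every .sql file in the directory, in alphanumeric order.)
```go
package seed

import (
    "fmt"
    "os"
    "path/filepath"
)

// applySeeds executes every .sql file in the seeds directory in alphanumeric
// order, as the proposed `hasura seed` command would. The runSQL callback is
// assumed to forward the statements to Postgres (e.g. via Hasura's run_sql API).
func applySeeds(seedsDir string, runSQL func(sql string) error) error {
    entries, err := os.ReadDir(seedsDir)
    if err != nil {
        return err
    }
    // os.ReadDir returns entries sorted by filename, which yields the
    // alphanumeric order (0_foo.sql before 1_bar.sql).
    for _, e := range entries {
        if e.IsDir() || filepath.Ext(e.Name()) != ".sql" {
            continue
        }
        contents, err := os.ReadFile(filepath.Join(seedsDir, e.Name()))
        if err != nil {
            return err
        }
        if err := runSQL(string(contents)); err != nil {
            return fmt.Errorf("seed %s failed: %w", e.Name(), err)
        }
    }
    return nil
}
```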
What's more, when using the graphql-engine cli-migrations docker image, attaching the seeds directory as a volume at /hasura-seeds would execute hasura seed AFTER auto-applying migrations.
# other options like `-p` are omitted for brevity
docker run \
-v hasura/migrations:/hasura-migrations \
-v hasura/seeds:/hasura-seeds \
hasura/graphql-engine:v1.0.0-alpha42.cli-migrations
Or, rather than attaching two volumes, just attaching a single directory containing the migrations and seeds directories would be cool as well.
docker run \
-v hasura:/hasura \
hasura/graphql-engine:v1.0.0-alpha42.cli-migrations
Applying seeds via the graphql-engine cli-migrations image matters, because I frequently remove docker volumes and want to re-initialize postgres and hasura with the desired "state" (metadata, schema, and seed data) just by executing docker-compose up.
So this would allow users to easily build a local development workflow, while still keeping migration to production easy.
How do you feel?
@rikinsk @shahidhk Based on @jjangga0214's comments, do you think we can solve this by allowing a user to run an arbitrary migration/SQL _without_ running it as a migration that updates the migration state on the database?
$ hasura migrate apply --one-off /path/to/folder
$ hasura migrate apply --one-off /path/to/file.sql
This would make it easy to run a seed-data-style migration, and we wouldn't need to explicitly code a seed directory into the CLI command flow?
Why isn't executing an insert SQL after the migrations are run good enough? Any reason for this to be a migration if it's not to migrate from one system to another?
edit: I might have responded without understanding the solutions and requirements properly. @coco98's solution is basically the same as what I was going for. But I don't see how it would help to add the seed data automatically at startup, like @jjangga0214's solution does.
Any news?
For the CLI migrations image, we can provide another environment variable HASURA_GRAPHQL_MIGRATIONS_SEED_DIR where the user can keep all the seed data as migrations.
Then users can keep one directory for schema/metadata migrations and another for seed data.
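Usage could then mirror the volume-based invocation shown earlier (hypothetical, since the variable is only proposed here):
docker run \
-v hasura/migrations:/hasura-migrations \
-v hasura/seeds:/hasura-seeds \
-e HASURA_GRAPHQL_MIGRATIONS_SEED_DIR=/hasura-seeds \
hasura/graphql-engine:v1.0.0-alpha42.cli-migrations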
I'd like to suggest an additional proposal for the seed data format.
I've seen folks take quite different approaches to seeding.
For instance, some prefer SQL, while others prefer GraphQL mutations.
What's more, some people have their raw seed data in JSON,
so they want to use it as-is or with only little modification.
JSON
SQL
GraphQL mutation
(This commenter actually thinks JSON would be good as well, though.)
I believe these comments are just the tip of the real (or potential) needs for seeding, with various approaches (at least in these 3 categories).
Thus, I think all 3 forms, JSON, SQL, and GraphQL mutations, would be good to support as data formats for automatic seeding.
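(To illustrate, here is a sketch in Go of how a seeding tool might dispatch on the file format. The extension convention and the JSON payload shape are assumptions; /v1/query and /v1/graphql are Hasura's real v1 endpoints.)
```go
package seed

import (
    "fmt"
    "os"
    "path/filepath"
)

// applySeedFile routes a seed file to an appropriate Hasura endpoint based on
// its extension. The post callback is assumed to send an authenticated JSON
// request body to the given endpoint.
func applySeedFile(path string, post func(endpoint, body string) error) error {
    data, err := os.ReadFile(path)
    if err != nil {
        return err
    }
    switch filepath.Ext(path) {
    case ".sql":
        // raw SQL goes through the run_sql query API
        return post("/v1/query", fmt.Sprintf(`{"type":"run_sql","args":{"sql":%q}}`, string(data)))
    case ".graphql":
        // a mutation document goes through the GraphQL endpoint
        return post("/v1/graphql", fmt.Sprintf(`{"query":%q}`, string(data)))
    case ".json":
        // assumed convention: the file already contains a query-API payload,
        // e.g. {"type":"insert","args":{"table":"author","objects":[...]}}
        return post("/v1/query", string(data))
    default:
        return fmt.Errorf("unsupported seed format: %s", path)
    }
}
```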
I'd like to suggest an additional proposal for seed data management.
hasura seed apply # or `hasura seed push`?
This command reads seed data and inserts it into Postgres.
This would be convenient when editing seed data frequently, without wanting to reboot or directly handle Hasura and Postgres.
If Postgres already has some data, a user can choose (e.g. by a flag) how the existing data should be handled.
hasura seed create # or `hasura seed pull`?
This command reads existing data from Postgres and generates seed data in the format the user wants (e.g. chosen by a flag), possibly one of SQL, JSON, or GraphQL. It should probably write the seed data to the file system, or print it to stdout for pipelines.
This would be convenient when a user prefers creating seed data in the console, or wants to convert between seed data formats (e.g. GraphQL to JSON).
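(For the SQL output format, a sketch of what `hasura seed create` could do using the engine's pg_dump API. The endpoint and payload follow Hasura's v1alpha1 pg_dump API; the function name and the fixed public schema are assumptions.)
```go
package seed

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

// createSeed asks the engine's pg_dump API to dump the current data as
// INSERT statements; the returned SQL could be written to seeds/<name>.sql
// or printed to stdout for a pipeline.
func createSeed(baseURL, adminSecret string) ([]byte, error) {
    body := `{"opts":["--data-only","--inserts","--schema","public"],"clean_output":true}`
    req, err := http.NewRequest("POST", baseURL+"/v1alpha1/pg_dump", strings.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-Hasura-Admin-Secret", adminSecret)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("pg_dump failed with status %d", resp.StatusCode)
    }
    return io.ReadAll(resp.Body)
}
```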
@jjangga0214 thank you for your observations. We have labelled it as ideas and let's see what the community thinks about this. Every contribution is welcome!
I am also in the process of trying to come up with a solution for this at the moment.
Manually writing SQL statement seeds is technically an answer, sure, but that is going to cost our team a lot of time. Ideally, Hasura should provide utilities to track and insert records along with schema definitions.
I think the optimum solution here is to use the pg_dump API (which I had no clue existed in Hasura until I went digging for answers in the repo code).
My initial thought was to implement calling the pg_dump binary externally as part of the migration process, but I can see now that this has already been implemented :sweat_smile:
Have a look at this:

I figure the way it would work is that, either through CLI flags or the Console, users could select whether they want to track just the schema, or the data plus the schema, in migrations.
This may be a bit of an undertaking, because it is going to require (I assume):
A short-term solution would be to modify the Go CLI to accept a flag for whether or not it should dump the data as well:
Modify migrate_create.go and schema_dump.go to take an --include-data (or similarly-termed) flag, and modify the ExportSchemaDump argument list to dynamically generate the array of flags based on whether or not --include-data was passed.
https://github.com/hasura/graphql-engine/blob/f8ffbda11897983adffd5a5d00f077da92f618ba/cli/commands/migrate_create.go#L29
https://github.com/hasura/graphql-engine/blob/f8ffbda11897983adffd5a5d00f077da92f618ba/cli/commands/migrate_create.go#L85-L98
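(For instance, the flag registration might look like this. This is only a sketch: cobra is what the Hasura CLI already uses, but the command wiring here is simplified, and the --include-data flag name is the proposal's, not an existing one.)
```go
package main

import (
    "fmt"

    "github.com/spf13/cobra"
)

// A simplified sketch of registering the proposed --include-data flag; in the
// real CLI this would live in migrate_create.go and be forwarded to
// ExportSchemaDump.
func main() {
    var includeData bool
    cmd := &cobra.Command{
        Use: "create",
        Run: func(cmd *cobra.Command, args []string) {
            // here includeData would be passed down to ExportSchemaDump
            fmt.Println("include data:", includeData)
        },
    }
    cmd.Flags().BoolVar(&includeData, "include-data", false,
        "dump table data as INSERT statements along with the schema")
    _ = cmd.Execute()
}
```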
I am not sure who best to tag about this. @marionschleifer @coco98
I do not have any experience writing Go, but here is roughly what I imagine the function would look like (barring the fact that if we are going to pass config here, it should probably be in an object (a struct, in Go terms)):
```go
func (h *HasuraDB) ExportSchemaDump(schemaNames []string, tableNames []string, includeData bool) ([]byte, error) {
    // For whatever it is worth, I would actually pass these flags as
    // --no-owner and --no-acl because they carry more immediate semantic
    // meaning. I had to look them up in the pg_dump documentation to find
    // out what they did.
    opts := []string{"--no-owner", "--no-acl"}
    if includeData {
        opts = append(opts, "--inserts")
    } else {
        opts = append(opts, "--schema-only")
    }
    // A []string of tableNames is probably worth adding as well, so that
    // specific tables can be dumped:
    for _, table := range tableNames {
        opts = append(opts, "-t", table)
    }
    for _, s := range schemaNames {
        opts = append(opts, "--schema", s)
    }
    query := SchemaDump{
        Opts:        opts,
        CleanOutput: true,
    }
    resp, body, err := h.sendSchemaDumpQuery(query)
    if err != nil {
        return nil, err
    }
    // sketch-level error handling: surface non-200 responses as errors
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("schema dump request failed with status %d", resp.StatusCode)
    }
    return body, nil
}
```
@shahidhk could you please help @GavinRay97 to get started on this? ✨
@lexi-lambda This has cross-cutting concerns with https://github.com/hasura/graphql-engine/issues/2817 and would potentially take care of that too
@joshuarobs
I added functionality for seed scripts to the CLI, but have some minor details to work out. I have not heard back from the Hasura team on this, so it may be that they are either on holiday or it's not super high on the roadmap.
I've never written Go before and am unsure how much more needs to be done to properly integrate it, so no promises on if/when it will get merged into core. It would be really neat if it did, though.



I would actually love an option to update the seed data in the hasura console tab: adjust a row and, after I've updated a couple of rows, have the opportunity to save them as a migration. So instead of writing a new SQL statement for every small change, I would prefer to take a database snapshot from time to time.
Any ideas?