Efcore: Seed Data

Created on 4 Sep 2014  路  73Comments  路  Source: dotnet/efcore

In EF6, this was accomplished using DbMigrationsConfiguration.Seed(), but migrations configurations don't exist in EF7.

closed-fixed punted-for-2.0 type-enhancement

Most helpful comment

Proposal is to have a Seed.cs file in the Migrations directory (alongside model snapshot etc.). We could scaffold this when the first migration is added:

public class BloggingContextSeedData : DatabaseSeedData<BloggingContext>
{
    public override void Seed(BloggingContext context)
    { 
    }
}

This is essentially the same as the Seed method on EF6.x. While this approach has it's issues it seems to work pretty well and we haven't heard a lot of complaints.

We could also look at some simple Insert/Update/Delete APIs in Migrations. This would give folks a more advanced option that overcomes some of the limitations of Seed():

  • Removes the need to write AddOrUpdate code since the data is inserted once during the migration (and presumably deleted during the down).
  • Removes the need to update the Seed code whenever the model changes (since it is applied at a well known point in time of the schema.

Of course, data in migrations would be loosely typed since it wouldn't use the current model.

All 73 comments

Proposal is to have a Seed.cs file in the Migrations directory (alongside model snapshot etc.). We could scaffold this when the first migration is added:

public class BloggingContextSeedData : DatabaseSeedData<BloggingContext>
{
    public override void Seed(BloggingContext context)
    { 
    }
}

This is essentially the same as the Seed method on EF6.x. While this approach has it's issues it seems to work pretty well and we haven't heard a lot of complaints.

We could also look at some simple Insert/Update/Delete APIs in Migrations. This would give folks a more advanced option that overcomes some of the limitations of Seed():

  • Removes the need to write AddOrUpdate code since the data is inserted once during the migration (and presumably deleted during the down).
  • Removes the need to update the Seed code whenever the model changes (since it is applied at a well known point in time of the schema.

Of course, data in migrations would be loosely typed since it wouldn't use the current model.

BTW if we do Seed.cs we'll probably want something like the AddOrUpdate method. If so, we should consider the ability to whitelist properties to be updated, as discussed in https://github.com/aspnet/EntityFramework/issues/453.

We should be able to add data along with a specific migration. For example, I add a new table and then add some data to it right after the table is created.
This is really important for domain data, that mainly changes when some app structure changes. An example is menu records, where I mostly only add a new menu record when I add a new feature.
So, it would be great to have some methods to add/change data on the MigrationBuilder class.

@giggio Totally agree, having some sort of first class data manipulation API would be great. This issue is specifically about expressing some seed data in terms of the current model, but your scenario would also be very good to enable. The data migration would need to be weakly typed rather than using the model (since the model will change over time).

How is Seed method handled in EF7?

protected override void Seed(MyContext context){}

@hbopuri - There isn't a first class API for this at the moment (though this work item is tracking adding one in the future). Currently you would just write code in app startup to handle seed data.

@rowanmiller wasn't __override seed__ part of earlier EF versions? is EF7 a upgraded version of earlier versions or a completely newly designed from scratch?

HaBo Rocks
Harsha Bopuri

@hbopuri - It's a major version and has breaking changes in it. These breaking changes are larger than we would typically do in a release, but you can read more about the reasoning behind what we are doing here http://blogs.msdn.com/b/adonet/archive/2014/10/27/ef7-v1-or-v7.aspx.

@rowanmiller Will this feature ship with the RTM? And totally agree with @giggio suggestion.

@heathyates - Not sure if we'll get a seed method equivalent into initial RTM. It is super easy to write the logic in your app start code in the meantime.

@rowanmiller - It is easy to write a seed method using the app startup code. However, all the samples that I could find incorrectly use EnsureCreatedAsync and as a result fail to load the seed data. What is missing is a way to determine if the DbContext is correctly configured and has the model applied that matches the seed data requirement. This could be solved by having a way to detect if a particular Migration has been applied to the datastore.

@jwdavidson I have not experienced a failure to load the seed data and I use EnsureCreatedAsync. Has the mvc music store failed for you? I'm not implying I don't believe you or your claim, but confused as to what samples are not working for you? Would be interested in duplicating what you have observed myself. :-)

@jwdavidson do you want to apply migrations during startup to ensure the database is up to date? If so, context.Database.AsRelational().ApplyMigraions() is the API to do this.

@heathyates - The case where EnsureCreatedAsync is when it returns true, which it does when the database was not previously created and/or the initial migration has not be loaded to the database. After that if you want to load seed data which uses a second migration then it will fail as EnsureCreatedAsync will return false.

@rowanmiller - the suggested ApplyMigrations() process may enable the seed function to be reliably processed from the startup code. I will try that tomorrow and let you know.

@rowanmiller - the following code block seems to do exactly what I want: ensure the database exists and the relevant migrations have been applied

using (DbContext db = (DbContext)serviceProvider.GetService<ApplicationDbContext>()) {
    if (!db.Database.AsRelational().Exists()) {
        db.Database.AsRelational().Create();
    }
    db.Database.AsRelational().ApplyMigrations();
}

After this any data insertion routines may be safely added. Thanks.

@jwdavidson you should be able to remove the existence check and creation. ApplyMigrations() will take care of creating the database if it doesn't exist.

using (DbContext db = (DbContext)serviceProvider.GetService<ApplicationDbContext>()) {
    Database.AsRelational().ApplyMigrations();
}

@rowanmiller - Yes, that works as advertised, just not used to doing something like this without an "if" validation. Thanks again

@weitzhandler good discussion on the best pattern for the moment going on at https://github.com/aspnet/EntityFramework/issues/3070

Clearing the milestone after talking about this with @DamianEdwards and @rowanmiller. It is really hard to find the right place to do seeding, in particular in an ASP.NET application. We should at least consider trying to have a solution for RTM.

I Always use seed data for system records that have identity values outside the range of the identity. Mostly I use negative values and have the identity simply start at 1. I then know which values are for what "system" data, which is also the same across all instances of the databases. As part of seeding, but also in general how could you insert a record with an identity PK with a given PK value. In effect will inserting records with -- adhoc IDENTITY INSERT ON -- be supported or what workaround would you suggest?.

@summer600 it requires a little bit of a workaround, but you can do this already - http://docs.efproject.net/en/latest/saving/explicit-values-generated-properties.html#explicit-values-into-sql-server-identity-columns. If you set a value on a PK property (i.e. it's not just the CLR default for the property type) then EF Core will attempt to write that to the database for you.

@divega

Clearing the milestone after talking about this with @DamianEdwards and @rowanmiller. It is really hard to find the right place to do seeding

Were you referring to seeding before\after each migration or even for a single seed? I'm curious to know why it is so complicated, is it not as easy as just running the DML SQL scripts after running the DDL migration scripts?

We should at least consider trying to have a solution for RTM.

(If you referred to seed per migration) I think so too, 90~% of the times I had to change the structure of my tables I also had to run DML migration scripts.

As a temporary solution, I would write the raw SQL inside the migration with

migrationBuilder.Sql(@"{{DoMagicHere}}");

That comment was referring to the fact that in EF Core we don't have a pattern or API to run seeding (single method or not) after applying migrations. That sends users looking for ways to execute the seeding logic when the application runs, but that itself has lots of issues and no generally correct solution.

A problem with seeding on startup: When having multiple app nodes, how can you avoid all app nodes trying to seed the database at once?

@geirsagberg The question is basically how to sync multiple servers.
It's a common and old problem with many viable solutions.
Using a flag in a configuration table and lock it is one solution,
Having a WebJob(Azure)/job is another.
Using redis for locking is another...
...
...

Ended up wrapping the seeding in a TransactionScope with IsolationLevel.Serializable, looking good so far. Of course the seeding is idempotent, so as long as only one node can seed at once, the seeding will only be done once.

Actually, using a TransactionScope does not work at all, because EF does not yet support ambient transactions; I get the errror An ambient transaction has been detected. Entity Framework Core does not support ambient transactions.

Sorry to jump in on this but doing seeding on startup etc ins't going to fly no one will do that in prod. In EF 6 we had DbMigrationsConfiguration and this was called post migration by update-database. The neat thing is update-database can be called as part of your continuous deployment process ensuing that your servers (think multiple instances Azure) are in the correct state to receive the update and it's done as part of migrations. I would suggest the same pattern.

To workaround this I've just been adding migrations to do seed data so I know they will be called by update-database during CD. And a good by-product is that I can rollback seed data in the same way as I do for table structure.

To workaround this I've just been adding migrations to do seed data so I know they will be called by update-database during CD. And a good by-product is that I can rollback seed data in the same way as I do for table structure.

I really like this approach, as it also means you don't have to worry about checking for existing data etc. You know your seed logic will only run once per database... and the seed data will go thru the schema transformations etc. of following migrations.

@rowanmiller I wrote it 8 months ago 馃槑
https://github.com/aspnet/EntityFramework/issues/629#issuecomment-186800686

As a temporary solution, I would write the raw SQL inside the migration

I like the idea of adding migrations to seed the DB. It makes a lot of sense for all the reasons @DalSoft mentioned.

I'm currently seeding on startup and not only does it add extra time whenever you run it, it _feels_ wrong. The vast majority of the time, the database is already seeded and it adds unnecessary overhead on startup which means it takes me extra time whenever I want to debug it (which is often).

It would be really nice to figure out a way to access the DB context from the migration, but I suppose changes to the context in the future could break past migrations, which makes that untenable. It would at least be nice to not have to write the SQL directly.

Maybe there could be an additional argument for the CreateTable function that takes an array of anonymous row objects like so:

migrationBuilder.CreateTable(
    name: "Foo",
    columns: table => new
    {
        FooID = table.Column<int>(nullable: false)
            .Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
        Name = table.Column<string>(maxLength: 50, nullable: false),
        SortOrder = table.Column<int>(nullable: false)
        /* More columns */
    },
    constraints: table =>
    {
        table.PrimaryKey("PK_Foo", x => x.FooID);
    },
    seedData: new [] {
        new {
            Name = "Bar",
            SortOrder = 1
        },
        new {
            Name = "Baz",
            SortOrder = 2
        }
    });

Then there could also be additional functions for inserting / updating / removing on the model builder using the same structure.

How would one have seed data in migrations that are also environment specific? e.g. for dev only. Or is that not really a pattern.

@DamianEdwards migrations are just code that executes, so you can easily write conditional logic to only add seed data if <insert whatever mechanism you want to switch on>.

BTW I think we do still want a higher level seed data mechanism that allows you to use your model to specify the data (and could have a native understanding of environments)... I just meant that adding seed data inline in migrations is also a good pattern with some benefits.

@rowanmiller How do you inject the environment name to the migration?

@gdoron it would be hard to get IHostingEnvironment in (as you'd have to override the components that create migrations). So the best bet would probably be to check the environment variable... not ideal, and something we could look at improving, but it would work at the moment.

@DamianEdwards @gdoron @rowanmiller I have had to solve this very problem today! I have a workaround that allows you to take the environment switch e.g. update-database -environment production and inject that into the migration. As @gdoron suggested a environment variable is the way to go, but the way I have done it works with the dotnet CLI so that it can be used ain CD pipeline. I do gist / write up tomorrow on how this is working.

Looking forward to reading that, be sure to update here once it's ready!

@atrauzzi @DamianEdwards @gdoron @rowanmiller Here is my (belated) post on how I went about getting seeding and migrations working with the environment switch.

http://www.dalsoft.co.uk/blog/index.php/2016/12/13/entity-framework-core-seeding-using-migrations/

Hi @rowanmiller, Are we going to bring this feature in near future?

@yasin-uohyd this is not currently planned for an upcoming release.

@rowanmiller Can I ask for a brief clarification on this comment?

To workaround this I've just been adding migrations to do seed data so I know they will be called by update-database during CD. And a good by-product is that I can rollback seed data in the same way as I do for table structure.

I really like this approach, as it also means you don't have to worry about checking for existing data etc. You know your seed logic will only run once per database... and the seed data will go thru the schema transformations etc. of following migrations.

This seems like our best option for now. To clarify, are you referring here to running inline SQL during migrations via the migrationBuilder?

@jammerware yes, that is correct

I collected some scenarios based on our existing EF6 data seeding implementation and our current GitHub discussions, and I'm proposing this solution that addresses most of them. It鈥檚 different from what we had, and it enables interesting scenarios that we didn鈥檛 cover before.

If you've been waiting for this feature, please give your input whether you think it's a good idea or you think for some reason it won't work for you.

Examples

Please refer to this gist. It has three examples of seed data in OnModelCreating and what the scaffolded migration looks like. Each example builds on top of the previous one showing how we would generate the required insert, update or delete instructions (based on the current snapshot).

Proposal

  • We add a SeedData (or similar) method to ModelBuilder.Entity<TEntity>() that takes a TEntity[]

    • Optionally taking an alternateKey: TEntity => property

    • Users are able to use files if they convert them to objects (e.g. Json.Deserialize)

  • We add CRUD methods to MigrationBuilder

    • This can be used standalone, but it also supports scaffolding

  • We save metadata for the rows we are tracking in the snapshot
  • We diff the seed data as part of the dotnet ef migrations add command
  • We scaffold the CRUD statements as part of the migration
  • Optionally, We teach InMemory to read seed data

    • E.g. DbContext.ReadSeedData()???

  • Optionally, we teach MigrationBuilder which environment it's running in to conditionally run seeding
  • Optionally, we add SeedDataFromJson or similar helpers

Remarks

  • This takes the benefits of static typing of Configuration.Seed and extends it with migrations support.
  • This introduces a nicer API for people who want to migrate data but don't need to keep track, who were previously using raw SQL
  • This allows to easily read seed data from a file, a common feature in other ORMs which we never had (as long as it can be converted to model entities, it can be any source or format and we're completely agnostic)
  • This allows to easily seed an InMemory database for testing by just inserting the seeded objects to the internal collections
  • This can be exported as SQL along the schema migrations for publishing

Proposed implementation steps

  • First PR adds migration operations, which can be used standalone

    • InsertRows

    • DeleteRows

    • UpdateRows

    • CreateTable.WithRows

  • Second PR adds scaffolding

    • Entity<TModel>



      • SeedData(TModel[] data, Func alternateKey = null)



    • dotnet ef migrations add



      • Diff seed data with previous snapshot


      • Update snapshot


      • Scaffold migration code with CRUD operations



  • More PRs could follow up for the optional items, which are based on these new features but aren't needed for data seeding

@tinchou This is interesting, however I don't think it addresses something that always holds me up when I think about strongly / generically typing the seeds. That is: what happens if we remove the Population property from Country? Your previous migrations will no longer compile against the new version of the Country class. Perhaps we could do it with dynamic objects where the property names match column names, but I don't see any way of getting around this issue and keeping it strongly-typed.

@rmarskell there are two things happening here:

  1. We have strongly-typed data in the model, which you must keep up to date in order to compile
  2. We have anonymous objects / untyped data in the migrations themselves

Does this address your concern or am I missing something?

Ah, I see. In my head I was reading those OnModelCreating boxes as migrations. Haha. That makes now. I like this a lot.

So, in your OnModelCreating would we manually be duplicating the change we made in the seed migration or is this just two different ways to do it?

  1. OnModelCreating has the current data. If you keep it under source control then seed data will be in sync with your code.
  2. We want to scaffold the data motion operations automatically for you, as we do with schema changes. We plan to have a snapshot of your seed data and diff it to generate the operations.
  3. You can also use the API to manually migrate data, but that won't be tracked since it's not part of the model. It is similar to what people currently do with raw SQL.

Hello everyone! The feature is working pretty well now and you can give it a try :).

  1. Clone https://github.com/tinchou/entityframework
  2. Drop the contents of this zip on EF's root folder: SeedDataDemo.zip
  3. cd SeedDataDemo
  4. dotnet ef database update
  5. dotnet run

Note: Some steps will take a while since they compile EF (and not only the demo project).

Feel free to play around with it, test stuff, report bugs, etc.. Off the top of my head I can mention the following gotchas:

  • We don't support navigations
  • We need you to specify the object IDs, otherwise it'll generate an invalid temporary value

Thanks!

did u really have to change the wole thing while developping .Net Core? this is just complicating our good habit. just spent 30 mins on this while i was supposed to think about other important stuff, instead still trying.to find how to SEED DATA.....

We're very transparent on what features are missing on EF Core, and if this is an important part of your application you should consider using EF6.

EF Core is a new framework, and so we're not attached to the old design. We can keep it when it makes sense, but we can choose to change it when can be improved. I'm confident the new design will cover all previous use cases and more (many that have been raised here and in other issues). The proposal was shared very early on to ensure anyone can call out if they see any problem, and I encourage you to share any concerns you have.

For EF Core I think we should draw a line under seeding as a separate thing. And instead have a lightweight helper available in migrations that allows you to do seeding (with the functionality outlined in the proposal). That way you can have a up and a down 'seed', more importantly the seed data works like a migration avoiding the pitiful mentioned by @rmarskell if the table changes over time.

I've used the technique I outlined in my blog post (which I would now do over how I did it in EF6) in two prod deployments one of which is for a large company, and I've found extending your seed data using a migration rather than having one seed class to rule them all has paid off, as doing this I'm confident that seeding will always work no matter what environment or build I'm targeting.

@cedriclange EF Core is new rather than an iterative improvement over EF6 - hopefully once your over the initial learning curve your see that it's aiming to quicker and more lightweight than previous versions.

The problem is that migration itself has some serious issues.
For example you can't make it work if there is no start up project. A total annoyance.

@weitzhandler yes but you can work around that by creating a separate data startup project example
https://gist.github.com/DalSoft/8684d8be9566221ca7b1b8e962db241f#file-program-cs

You can't make it work if there is no start up project. A total annoyance.

Agreed. We've done everything we can in EF Core's tools to work around the limitations of .NET Standard and .NET Core class libraries, but it can still be painful at times. We'll continue to do everything we can in our tools to make the experience as delightful as possible. We work closely with to the .NET Core SDK team to let them know where we're hitting limitations.

@DalSoft Is this another project in the solution, or things added to the various current project files?
@bricelam Thanks for your sympathy!

What is the delivery forecast?

Final version? EF Core 2.
Nightly version? I'm hoping to get https://github.com/aspnet/EntityFramework/pull/7824 merged soon, I'd think no later than next Friday.

Final version? EF Core 2 ?

Yes. There will be an EF Core version 2 released sometime this year. It'll include this feature and others that we're working on. You can take a look at them if you search by issues with the filter milestone:2.0.0

Here's a solution for running the seed method during Startup as suggested using DI. This example also uses Identity. https://gist.github.com/mombrea/9a49716841254ab1d2dabd49144ec092

@DalSoft

yes but you can work around that by creating a separate data startup project example
https://gist.github.com/DalSoft/8684d8be9566221ca7b1b8e962db241f#file-program-cs

Will it work in an ASP.NET Core project using EF6.1.3? Unfortunately I can't work with EFCore just yet.

@weitzhandler the concept would work on EF6 but the code would probably need re-working slightly.

@mombrea I would strongly suggest separating seeding from the Web project. I've had problems with doing this on production and advise all my clients to use a separate project that can invoked as part of your CD pipeline, rather than by the web project directly.

@DalSoft thanks for the tip, I will look at another way of triggering the seed data. What type of problems did you see in production when it was being called during Startup?

@mombrea expected stuff longer app start.

Unexpected stuff a dev leaving a entity migration in the source and it being run on app startup. Making it part of CD enforces principle of least surprise, and you know when your migrations and seeding are taking place so you can test it as appropriate.

Yes. There will be an EF Core version 2 released sometime this year. It'll include this feature and others that we're working on. You can take a look at them if you search by issues with the filter milestone:2.0.0

What is the current solution/workaround while waiting for 2.0? Im currently using ASP.NET Core/ EF Core.

Cheers guys and understand fully this is a blank slate project 馃憤

@garfbradaz, I am using this workaround:
1 - Create a partial class with the same name of the migration class.
2 - In that partial class i create a method SeedDataBase that receives one argument of type MigrationBuilder
3 - In the migration class, in the end of the Up method, i call this.SeedDataBase(migrationBuilder)
4 - Inside the methods SeedDataBase , i call just migrationBuilder.Sql("INSERT INTO (...)");

Hope this helps.

@brunoalreis Would it be possible to use your workaround when running against two different database servers? I.E., is it possible to detect inside the migration whether your connecting to an MSSQL database vs. a PostgreSQL database?

EDIT:
Looks like I can check migrationBuilder.ActiveProvider for equality to "Microsoft.EntityFrameworkCore.SqlServer" or "Npgsql.EntityFrameworkCore.PostgreSQL"?

More EDIT:
Looks like I may actually be able to write my seed code to execute properly for both of these providers, so that's a bonus.

@jceddy, I think you can. I have two different scenarios, in two different applications. In one i am using Sqlite and in the other i am using PostgreSQL. In the Sqlite migrationBuilder.ActiveProvider has "Microsoft.EntityFrameworkCore.Sqlite" and in the PostgreSQL migrationBuilder.ActiveProvider has "Npgsql.EntityFrameworkCore.PostgreSQL". Don't know if is the best way to check database type, but it works.

@jceddy take a look at my post on EF Core seeding.

If you use my technique you can pass environment switches, and more importantly just use the normal EF context. This means you don't have to do custom SQL for each provider.

@brunoalreis @DalSoft I ended up creating an empty migration after the initial schema creation one and added code to seed data in the empty migration.

I'm a bit late to the party, but I wanted to give the EF Core team input on my requirements for seeding or DbContext data initialization. Perhaps entity SeedData could be made to work for me, but I can't wait for release 2.1.
I feel that the APIs for DbContext data initialization should be added to DbContext. A simple

DbContext.MigrateToLatestVersion( IDbContextInitializer initializerClass, IServiceProvider serviceProvider ) 

which utilizes DbContext.Database.Migrate() and after which calls the seed( dbContext, serviceProvider ) method in the initializer class.

The DbContextInitializer class then has full reign to manipulate the context and persist changes to the repository.

Reading through this issue, this looks like the initial solution proposed by @rowanmiller . It's simple and easy to implement.

Was this page helpful?
0 / 5 - 0 ratings