EF Core: Bug: Seeding binary data + migrations causes issues (memory + duration)

Created on 28 May 2020 · 8 comments · Source: dotnet/efcore

We have a report template table with a binary column that holds the templates. We included this binary data in our HasData seed. The total size of all templates is under 2 MB (so really not that much). We had no issues so far while using drop/create. Recently we moved to migrations, which is when we ran into the following issues:

  • Huge memory consumption

    • On my personal machine it consumed 8 GB and started swapping to disk

  • Very long migration creation time

    • 10-20x longer than with the binary seed data commented out (5-10 min depending on the machine)

  • The issue does not go away and affects every subsequent migration creation, heavily impacting the long-term development experience
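To get a feel for why even ~2 MB of seed data can be this expensive, here is a rough sketch. It assumes that the migrations scaffolder writes each byte[] seed value into the generated C# source as an array literal of decimal numbers (new byte[] { 0, 255, ... }) and measures how much larger that text is than the raw payload. The helper names are hypothetical; the code only mimics the literal format and does not call any EF Core code.

```python
import os


def csharp_byte_array_literal(data: bytes) -> str:
    """Render bytes as a C# byte[] literal of decimal values (hypothetical helper)."""
    # Iterating over a bytes object yields ints in the range 0..255
    return "new byte[] { " + ", ".join(str(b) for b in data) + " }"


def expansion_ratio(data: bytes) -> float:
    """Characters of generated C# source per byte of raw payload."""
    if not data:
        raise ValueError("payload must not be empty")
    return len(csharp_byte_array_literal(data)) / len(data)


if __name__ == "__main__":
    payload = os.urandom(1_400_000)  # roughly the 1.4 MB mentioned in this thread
    literal = csharp_byte_array_literal(payload)
    print(f"raw: {len(payload):,} bytes, literal: {len(literal):,} chars, "
          f"ratio: {expansion_ratio(payload):.2f}x")
```

For uniformly distributed bytes this comes out at roughly 4.6 characters of source per byte, so 1.4 MB of templates turns into several MB of C# that has to be generated and compiled. If, as is typically the case, each migration's designer file also carries a copy of the model including its seed data, every new migration would add another such literal, which would fit the growing cost described above.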

Steps to reproduce

  • Create a project with an EF Core context
  • Create an entity class with a byte[] column
  • Use HasData to seed the entity with data
  • Create the initial migration (this already takes long and uses lots of memory)
  • Create a second migration right after (this is where creation time and memory usage explode)
  • The same applies to every migration created after the second
  • This means this is a

If needed, I can provide a repro repo, but I wanted to let you know right away.

Further technical details

EF Core version: 3.1.3
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: 3.1
Operating system: Windows
IDE: VS 2019, though this should not matter since the dotnet CLI with the EF Core tools was used to create the migrations

Workaround
If anybody runs into this while it's not fixed, here is the workaround that worked for us:

  • We only seed this table during initial creation; no updates are planned down the road (yet)
  • We removed the binary data from the HasData calls
  • We created the initial migration without seeding the binary columns
  • We then manually edited the generated migration to include the binary data
  • This way, performance and memory usage stay within the usual limits during migration creation
  • Subsequent migrations are also unaffected by our manual edit
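For the manual edit step, one option (our assumption, not necessarily what was done here) is to embed the payload as a T-SQL hexadecimal binary constant (0x...), which needs two characters per byte instead of the roughly 4.6 of a decimal byte[] literal, and run it from the migration's Up method via migrationBuilder.Sql(...). The helper sql_server_insert_binary and the table/column names below are hypothetical.

```python
def sql_server_insert_binary(table: str, id_column: str, data_column: str,
                             row_id: int, data: bytes,
                             identity_insert: bool = False) -> str:
    """Build a T-SQL INSERT that embeds `data` as a 0x... binary constant.

    Hypothetical helper: table and column names are assumed to be trusted,
    fixed identifiers (no escaping of ']' is done).
    """
    hex_literal = "0x" + data.hex().upper()  # two hex digits per byte
    insert = (f"INSERT INTO [{table}] ([{id_column}], [{data_column}]) "
              f"VALUES ({int(row_id)}, {hex_literal});")
    if identity_insert:
        # Only needed if the key is an IDENTITY column and explicit IDs are inserted
        return (f"SET IDENTITY_INSERT [{table}] ON; {insert} "
                f"SET IDENTITY_INSERT [{table}] OFF;")
    return insert


if __name__ == "__main__":
    print(sql_server_insert_binary("ReportTemplates", "Id", "Content", 1, b"\x01\xab"))
    # INSERT INTO [ReportTemplates] ([Id], [Content]) VALUES (1, 0x01AB);
```

The generated string would then go into the hand-edited migration as the argument to migrationBuilder.Sql. A ~1.4 MB payload still becomes a ~2.8 MB string, so keeping it in a separate file that the migration reads may be preferable.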
Labels: area-model-building, customer-reported, type-enhancement

All 8 comments

@ntziolis After discussion with the team, we think using seed data for this is probably not the way to go, since it's not designed for large amounts of binary data. Your workaround seems okay for now. I'm putting this on the backlog to consider making binary data more efficient here.

Understood, and totally OK with using the workaround. In fact, we would be totally OK with EF Core not supporting binary data seeding at all.

Just want to call out the following again:

  • Only 1.4 MB of data in total is already enough to cause this, which is a rather small amount as binary data goes
  • This means that likely anybody who seeds binary data will run into this issue
  • We had a fairly complex model when switching from drop/create to EF migrations and first suspected that certain cyclic relationships were causing the issues, so it took us quite a while to pin it down to seeding binary data.

To spare others the same fate, I would suggest:

  • Update the documentation with a warning that seeding binary data, while generally possible, is not recommended.
  • (Even better) EF Core could emit a warning, when generating a migration that contains binary seed data, that seeding binary columns is not recommended.

Coming from #23118: @roji recommended that I use custom initialization logic, which the documentation says is fine, and that is what I'm trying to do now.

There is one case I am unsure how to handle, though: circular references. With the HasData methods, I could create anonymous objects and set the Id columns manually to resolve this. But how would I go about it in custom initialization logic, where I don't have access to the HasData overload that takes plain objects?

@andrejohansson one common way is to just set the IDs yourself before saving your entities.

@roji Won't that give me a constraint exception when saving the first entity, since the second one is not saved yet (chicken and egg)? I'll try...

@andrejohansson Typically you would use navigation properties to define relationships, as is normal with EF. If you want to use FK values explicitly, then you can, and as long as the FKs are mapped (as is normal), then EF will order the updates appropriately.
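A minimal sketch of the "set the IDs yourself" approach for a cycle, written in plain SQL through Python's built-in sqlite3 module rather than EF Core (an assumption made only to keep the example self-contained): with explicit keys and at least one nullable foreign key, insert one row with that FK left NULL, insert the other row, then close the cycle with an UPDATE. The Author/Book schema is hypothetical. In EF code, the same ordering could be reproduced by saving in two steps.

```python
import sqlite3

# Hypothetical schema: an Author may have a favorite Book, every Book has an Author.
# Author.FavoriteBookId is nullable, which is what makes the two-step insert possible.
SCHEMA = """
CREATE TABLE Author (
    Id INTEGER PRIMARY KEY,
    FavoriteBookId INTEGER NULL REFERENCES Book(Id)
);
CREATE TABLE Book (
    Id INTEGER PRIMARY KEY,
    AuthorId INTEGER NOT NULL REFERENCES Author(Id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


def seed_cycle(conn: sqlite3.Connection) -> None:
    """Seed an Author and a Book that reference each other, using explicit IDs."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Insert the Author first, leaving its nullable FK empty for now
        conn.execute("INSERT INTO Author (Id, FavoriteBookId) VALUES (1, NULL)")
        # 2. The Book can now reference the existing Author
        conn.execute("INSERT INTO Book (Id, AuthorId) VALUES (10, 1)")
        # 3. Close the cycle once both rows exist
        conn.execute("UPDATE Author SET FavoriteBookId = 10 WHERE Id = 1")


if __name__ == "__main__":
    db = open_db()
    seed_cycle(db)
    print(db.execute("SELECT Id, FavoriteBookId FROM Author").fetchall())  # [(1, 10)]
```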

Depending on the specific database type (and how you start your transactions), it may be possible to defer the constraint checking until the transaction is committed. Or if the table(s) are only in use by the seeding logic at that point, you can temporarily turn off constraints before seeding and reinstate afterwards.
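For the deferred-constraint option, here is a minimal sketch that again uses SQLite via Python's sqlite3 purely as a stand-in database. Declaring the foreign keys DEFERRABLE INITIALLY DEFERRED postpones the check until COMMIT, so both rows of a cycle can be inserted inside one transaction even when neither FK is nullable. The schema is hypothetical. Support varies by database: PostgreSQL also has deferrable constraints, while SQL Server does not, so there the "temporarily turn off constraints" route (ALTER TABLE ... NOCHECK CONSTRAINT ...) is the closer fit.

```python
import sqlite3

# Hypothetical schema where both FKs are NOT NULL, so neither row can be inserted
# first without deferring the constraint check until COMMIT.
DEFERRED_SCHEMA = """
CREATE TABLE Author (
    Id INTEGER PRIMARY KEY,
    FavoriteBookId INTEGER NOT NULL
        REFERENCES Book(Id) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE Book (
    Id INTEGER PRIMARY KEY,
    AuthorId INTEGER NOT NULL
        REFERENCES Author(Id) DEFERRABLE INITIALLY DEFERRED
);
"""


def open_deferred_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
    conn.executescript(DEFERRED_SCHEMA)
    return conn


def seed_cycle_deferred(conn: sqlite3.Connection) -> None:
    """Insert both sides of the cycle in one transaction; FKs are checked at commit."""
    with conn:
        # The Author points at a Book that does not exist yet: allowed until COMMIT
        conn.execute("INSERT INTO Author (Id, FavoriteBookId) VALUES (1, 10)")
        conn.execute("INSERT INTO Book (Id, AuthorId) VALUES (10, 1)")
    # Leaving the with-block commits; a dangling FK would raise IntegrityError here


if __name__ == "__main__":
    db = open_deferred_db()
    seed_cycle_deferred(db)
    print(db.execute("SELECT Id, FavoriteBookId FROM Author").fetchall())  # [(1, 10)]
```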

Or you can do whatever was working for you when seeding: if you were adding a new column to the table (including the constraint), that should work without seeding as well...
