Efcore.pg: Auto-increment value collision with data seeding

Created on 19 Apr 2018 · 32Comments · Source: npgsql/efcore.pg

Hi,

If i migrate database without seeding any data, there is no problem in auto-increment system.
But if i do data seeding, (after the EF Core 2.1 Preview2 update) we must fill "id" values and when i seed data with id values, it breaks auto-increment system in PostgreSql. To fix this problem, i have to alter all of the sequences about the tables that have been sent data.

You can see the error messages below;

This is the error message which has been created before seeding data without id values.
The seed entity for entity type 'Student' cannot be added because there was no value provided for the required property 'StundentId'.

And this is the error message from PostgreSql after seeding data with id values when i want to add another row on table
[23505] ERROR: duplicate key value violates unique constraint "PK_Student" Detail: Key ("StundentId")=(3) already exists.

```
public class Student
{
[Key]
[Required]
public int StundentId { get; set; }
[Required]
public int StundentNo { get; set; }
[Required]
public string Name { get; set; }
[Required]
public string Surname { get; set; }
[Required]
public Course course { get; set; }
}

public class Course
{
    [Key]
    [Required]
    public int ClassId { get; set; }
    [Required]
    public string ClassName { get; set; }
    [Required]
    public string Section { get; set; }
    [Required]
    public List<Student> Students { get; set; }
}

public class StudentDBContext : DbContext
{
    public StudentDBContext(DbContextOptions options) : base(options)
    {
    }
    public DbSet<Student> Student { get; set; }
    public DbSet<Course> Class { get; set; }
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Course>().HasMany(p => p.Students);
        modelBuilder.Entity<Student>().HasData(
            new { StundentId = 3, StundentNo = 2243, Name = "paul", Surname = "smith", courseClassId = 1 },
            new { StundentId = 4, StundentNo = 2244, Name = "allen", Surname = "martin", courseClassId = 1 }
            );
    }
}

```

enhancement

Source

ykirkanahtar

👍6

Most helpful comment

For everyone running into this, version 3.0 of the provider add support for specifying index options on identity columns. That means it's now very easy to specify the starting ID of an identity column in your model - you can just set it to be higher than all your seeded values.

See this new section in the docs: http://www.npgsql.org/efcore/modeling/generated-properties.html#identity-sequence-options

roji on 29 Jan 2020

👍5

All 32 comments

@ykirkanahtar I understand the issue and what you say makes sense.

First, it's worth mentioning that unless you have relationships, you should be able to leave the id empty and the database should generate it for you like any normal insert. In this scenario you shouldn't run into any issues. But your scenario does include a relationship.

Second, this is a general question and not specific to Npgsql/PostgreSQL in any way - it should be easily reproducible in SQL Server, Sqlite, etc. So I'm going to close it for now, please reopen it on github.com/aspnet/EntityFrameworkCore and we'll see what the team says. If it turns out that there's an Npgsql-specific aspect here, we'll reopen.

Finally, a good workaround (and possibly the only answer) could be to manually set up a sequence with the correct starting value (i.e. bigger than any existing seeded value), and then set up your id column to get its values from that. This would make your model more explicit and complicated, but it would work. You can see how to create sequences with a starting value in the EF Core docs.

roji on 28 Apr 2018

@roji suggest reopening this issue.

The bug is actually specific to the PostgreSQL provider. Other providers I tested (MySql (the Pomelo provider), Sqlite, and SqlServer) handle the combination of HasData(...) and auto-increment columns just fine. And, the problem happens when using EF migrations or DbContext.EnsureCreated() but, again, only with the PostgreSQL provider.

For my scenario (involving 4 linked tables), the stack trace was always

Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while updating the entries. See the inner exception for details. ---> Npgsql.PostgresException: 23505: duplicate key value violates unique constraint "PK_Categories"
   at Npgsql.NpgsqlConnector.<>c__DisplayClass161_0.<<ReadMessage>g__ReadMessageLong|0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Npgsql.NpgsqlConnector.<>c__DisplayClass161_0.<<ReadMessage>g__ReadMessageLong|0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.ExecuteAsync(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(DbContext _, ValueTuple`2 parameters, CancellationToken cancellationToken)
   at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IReadOnlyList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at BasicApi.Controllers.PetController.AddPet(Pet pet) in /tmp/BenchmarksServer/jxwpd31e.tcj/mvc/benchmarkapps/BasicApi/Controllers/PetController.cs:line 98
   at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
   at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at BasicApi.Startup.<>c__DisplayClass6_1.<<Configure>b__3>d.MoveNext() in /tmp/BenchmarksServer/jxwpd31e.tcj/mvc/benchmarkapps/BasicApi/Startup.cs:line 151

The InnerException varied. Other possibilities include

 ---> Npgsql.PostgresException: 23505: duplicate key value violates unique constraint "PK_Pets"
 ---> Npgsql.PostgresException: 23505: duplicate key value violates unique constraint "PK_Images"
 ---> Npgsql.PostgresException: 23505: duplicate key value violates unique constraint "PK_Tags"
 ---> Npgsql.PostgresException: 08P01: bind message supplies 1 parameters, but prepared statement "_auto1" requires 2

Please find attached the SQL script equivalent to the PostgreSQL migration I'm currently doing.
BasicApi.PostgreSql.create.sql.txt

dougbu on 20 Jun 2018

👍4

FYI @ajcvickers @bricelam @divega

dougbu on 20 Jun 2018

In the coming week I'll probably be very busy and unable to look at this.

The fact that an error occurs only on PostgreSQL is interesting, but doesn't in itself demonstrate an Npgsql provider issue - databases differ between one another and what works in one doesn't necessarily work on another.

Looking at this again, it seems very reasonable for this unique constraint violation to occur: you're seeding data with ids, but the sequence driving your id column isn't aware of that... I'm not sure if the other providers have some sort of special updating of the sequence/identity value when a row is inserted with a value - maybe the efcore folks you cc'd can help understand why this issue isn't occurring in other databases.

At least until we get a better idea, try doing as I suggested and manually create your sequence, specifying your start value above the maximum seeded id value.

roji on 20 Jun 2018

Hard problem. Is there a way to specify the start value for the underlying sequence? Maybe apps should partition their key space into seed data keys and app data keys. (e.g. start the sequence at 100 and use 1-99 for seed data.)

@AndriySvyryd any thoughts? If there's a way to know an entity was inserted, maybe the sequence could be restarted with the max value as part of seeding.

bricelam on 20 Jun 2018

👍2

Is there a way to specify the start value for the underlying sequence?

@bricelam "classical" PostgreSQL autoincrement columns are called serial, which is just a short-hand for creating a default sequence and setting the column's default value expression to take values from it (see these docs for more info).

Since PostgreSQL 10 there's also the SQL standard identity feature which is better (and supported by the Npgsql provider), but in the end very similar (column backed by a sequence). However, it does support passing sequence options (such as the starting value). See GENERATED ... AS IDENTITY in the PostgreSQL CREATE TABLE docs. However, it's still not clear how/where on the model one would express the starting value so that the identity gets set up with it...

Maybe apps should partition their key space into seed data keys and app data keys. (e.g. start the sequence at 100 and use 1-99 for seed data.)

That's exactly what I proposed above: if you're seeding data with explicit values for identity columns, it's your responsibility to set up your sequence to avoid collision. However, I'm curious on why this issue doesn't occur in other database (according to @dougbu above) - how do SqlServer / Sqlite manage seeding data that includes explicit values for identity columns?

roji on 20 Jun 2018

On SQL Server, we use SET IDENTITY_INSERT TheTable ON which basically puts a lock on the table and ensures the internal sequence is updated before releasing.

SQLite just lazily computes the next rowid based on the current highest value. (with some extra bookkeeping to ensure keys aren't reused when using AUTOINCREMENT)

bricelam on 20 Jun 2018

it's still not clear how/where on the model one would express the starting value

I think an API like this would make sense.

modelBuilder.Entity<MyEntity>().Property(e => e.MyIdentityProperty)
    .ForNpgsqlIdentityStartsAt(100);

This would just put an annotation on the IProperty that could flow down into the AddColumnOperation in Migrations.

bricelam on 20 Jun 2018

👍5

.ForNpgsqlIdentityStartsAt(100)

@bricelam I'm curious: Would it be EF or the provider that figures out 100 is enough for the HasData(...) additions?

SQLite just lazily computes the next rowid based on the current highest value. (with some extra bookkeeping to ensure keys aren't reused when using AUTOINCREMENT)

FYI given the similarity between the "apply migrations" SQL scripts I generated, I'd say MySql behaves similarly to SQLite.

dougbu on 20 Jun 2018

Would it be EF or the provider that figures out...

It's up to the application. If you specify 100, ensure all the keys in HasData() are below that.

bricelam on 20 Jun 2018

Since the key is a signed int I wonder whether the user could just use negative values for seed data

AndriySvyryd on 20 Jun 2018

👍5

@AndriySvyryd would we make that recommendation for EF providers and supposedly-provider-independent HasData(...) calls in general? That is, would it likely work for InMemory, SQLite, MySql, et cetera?

dougbu on 20 Jun 2018

@dougbu I think so, I'll add it to the PR updating data seeding docs https://github.com/aspnet/EntityFramework.Docs/pull/759/files#r194592252

AndriySvyryd on 20 Jun 2018

Awesome @AndriySvyryd

@roji if I update my app _today_ to use negative indices for identity columns in seed data, will this work with PostgreSQL i.e. the 2.1.0 version of the Npgsql.EntityFrameworkCore.PostgreSQL package?

dougbu on 20 Jun 2018

Since the key is a signed int I wonder whether the user could just use negative values for seed data

That's clever :)

@roji if I update my app today to use negative indices for identity columns in seed data, will this work with PostgreSQL i.e. the 2.1.0 version of the Npgsql.EntityFrameworkCore.PostgreSQL package?

I don't see why not... The default sequence created for serial (and new-style identity) columns starts at one and increments, so negative values should never conflict.

roji on 20 Jun 2018

Negative values for auto-increment columns in seeding data clears this problem for me when using the Npgsql.EntityFrameworkCore.PostgreSQL provider. It also does not break anything with the other providers I've been testing (Pomelo MySql, SQLite and SqlServer).

dougbu on 22 Jun 2018

👍3

Great, I think we can consider the matter closed, apart maybe for a mention in the EF Core docs...

roji on 22 Jun 2018

I wonder if ForNpgsqlUseIdentityAlwaysColumns (instead of ForNpgsqlUseIdentityByDefaultColumns) can be combined with HasData with negative indices. It should be doable with OVERRIDING SYSTEM VALUE, but how to instruct Npgsql to use it with HasData? It seems HasData should be considered as an exception in EF Core.

ljani on 7 Aug 2018

@ljani that's a very correct observation, I'd say that for the moment IdentityAlways is incompatible with EF Core data seeding. AFAIK data seeding just uses the regular update pipeline, so it's probably not possible (or a at least easy) to inject an OVERRIDING SYSTEM VALUE in there.

roji on 7 Aug 2018

Here is the SQL Server driver implementation for InsertDataOperation.

NpgsqlMigrationsSqlGenerator does not seem to handle InsertDataOperation.

I guess it should be at least doable. Should I create a new issue?

ljani on 7 Aug 2018

Oh I wasn't aware that there was a hook for this, thanks for digging into it! By all means please open a new issue for this and also feel free to submit a PR of you want to!

roji on 8 Aug 2018

Just want to add that not everyone wants to have negative Ids in their system, so I don't consider this resolved.

The workaround I ended up using was manually adding insert statements to the initial migration file. Bummer to abandon the HasData object format but at least this works.

rexforab on 7 Feb 2019

Just want to add that not everyone wants to have negative Ids in their system, so I don't consider this resolved.

Can you provide any reason why someone wouldn't want negative IDs? I've heard this once or twice but nobody's actually been able to explain why...

roji on 8 Feb 2019

Business requirement. For example, when you show it on an UI or in a site URL.

YohDeadfall on 8 Feb 2019

Another option here is to set up the sequence manually (see the docs), and to have it start at some number, reserving everything underneath to seeding. The only disadvantage here is the manual sequence setup.

@YohDeadfall exposing an internal numeric ID in a UI / URL is in general considered a big no-no - it means exposing some pretty internal data out (and can be used to find out how many users you have, etc.).

But the important thing is that there are solutions/workarounds to this (negative IDs, manually-configured sequences), and solving the problem automatically within the provider is really non-trivial and possibly brittle (i.e. detecting new data, manually resetting the sequence to another value...). Because of that it doesn't make sense to me to do something on our side...

roji on 9 Feb 2019

👍3

@YohDeadfall exposing an internal numeric ID in a UI / URL is in general considered a big no-no

Unfortunately, shit happens. Raymond Chen writes about this every time, but now you have an answer for users why they shouldn't do this, and why we shouldn't take some actions on our side.

But the important thing is that there are solutions/workarounds to this (negative IDs, manually-configured sequences)...

Agreed.

YohDeadfall on 9 Feb 2019

👍1

I am currently running into the this problem.

Unfortunately, I have to seed data from an existing system where I have to keep the ids exactly a s they are.

The only way to deal with this for me would be to use manually configured sequences. But that's not a very attractive solution as i would have to check the amount of seeded data and then configure the system according to it for every single table.

In the meantime, is there a real fix for this problem?

Edit: its a .NET Core 3.0.0 project with EF Core

mjehle82 on 29 Jan 2020

See this new section in the docs: http://www.npgsql.org/efcore/modeling/generated-properties.html#identity-sequence-options

roji on 29 Jan 2020

👍5

This only works if the seed data is known in advance.
When we ship our product to customers, they will initialise with their own seed data; some may have a few dozen rows, other may have tens of thousands.
It would be better there was an option to reset the sequence at run-time after the seed data has been loaded, rather than hard-coding it.

aetaylor on 16 Mar 2020

I've taken another look at this and implemented a solution... At the end of a migration, we detect all auto-generated columns of tables which were seeded in a migration, and reset their values to the current maximum.

As a workaround, people can manually restart the sequence at the number of their choosing by adding the following to OnModelCreating:

c# modelBuilder.Entity<Blog>().Property(b => b.Id).HasIdentityOptions(startValue: 20);

This will generate a migration that restarts the identity column at 20.

roji on 21 Mar 2020

👍2

Any update on this? Will @roji's code be merged back to develop anytime soon? I cannot use negative value as an ID and I don't really think that it should be accepted as a "solution" to this problem.