Efcore.pg: Support ForNpgsqlUseIdentityAlwaysColumns with HasData

Created on 8 Aug 2018  路  17Comments  路  Source: npgsql/efcore.pg

Currently Entity Framework Core's data seeding feature HasData does not support ForNpgsqlUseIdentityAlwaysColumns, because GENERATED ALWAYS AS IDENTITY does not allow inserting values in the identity column if OVERRIDING SYSTEM VALUE is not specified in the INSERT clause.

This could be supported by adding InsertDataOperation hook to NpgsqlMigrationsSqlGenerator. For inspiration, here's the SqlServerMigrationsSqlGenerator implementation using SET IDENTITY_INSERT.

See also the earlier discussion.

bug

Most helpful comment

Done, seeding now works with IDENTITY ALWAYS. However, you still need to use negative values to avoid conflicts.

All 17 comments

Thanks. Do you intend to submit a PR for this?

Sorry, no, not at this time at least.

No problem, I'll try to get this in soon.

Btw, how do you see, should this also fix the next value for sequences similar to other implementations, so we would not have to rely on negative indexes?

@ljani I'll have a look, but I don't think this will help the sequence conflict issue. In PostgreSQL, a serial column is simply a column with a default value coming from a sequence. If the application provides a value (like seeding does), there's no way AFAIK of making the sequence "aware" of that. The comment you linked to for SQL Server is about identity columns, which are a specific feature of that database which resembles the new PostgreSQL identity rather than serial columns.

It may be possible to check, after applying migrations, whether data has been seeded, and to manually update the sequence to be above the maximum value in the database - but I'm really not sure how easy/practical that would be.

Coming to think of it, I'm not sure that the negative indices issue won't exist even with PostgreSQL identity columns... Even if we tack OVERRIDING SYSTEM VALUE on the insert statement, I don't know if PostgreSQL actually updates the identity column's behind-the-scenes sequence or not. This will need to be tested.

Done, seeding now works with IDENTITY ALWAYS. However, you still need to use negative values to avoid conflicts.

@roji so what's the current fix? Always explicitly set Id to 0 or negative value?

@Sigvaard that's right - seed with negative values to make sure seeded values don't collide with values later generate from the identity sequence.

It would be wise to document it somewhere then

You're right - I'll add a note in http://www.npgsql.org/efcore/value-generation.html

Still, I wonder, isn't it possible to fix the issue without any workaround like negative indexes?

It works when changing a sequence to go from e.g. 100. but it crashes too when you do this on an already existing database with migrations.. one workaround is to add:

migrationBuilder.Sql("ALTER SEQUENCE public.\"XXX_Id_seq\" restart with 1000;");
manually to migration.

Setting indices to negative values also make them appear as negative in the database which isn't that nice.

I think this still is a bug because the same thing works with other EF core providers like MS SQL etc. It's like the sequence is not incremented when seeding with HasData...

I think the better option to seed is the old fashioned way of seeding without migrations (no HasData involved) see https://github.com/aspnet/EntityFramework.Docs/issues/1055


https://stackoverflow.com/questions/10114779/postgresql-how-to-set-up-manually-the-next-value-of-auto-increment

It would be helpful if migrationBuilder.InsertData (InsertDataOperation) incremented this, at the same time fixing the problem?

I think appending this
SELECT setval('id_seq', max(id)) FROM table;
in sql generator here: https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/blob/8e97e4195b197ae3d16763704352acfffa95c73f/src/EFCore.PG/Update/Internal/NpgsqlUpdateSqlGenerator.cs#L24-L49 would be a fix ?

https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/blob/cae91f3dad95638ae2078fc35ffdfa793e7210cc/src/EFCore.PG/Migrations/NpgsqlMigrationsSqlGenerator.cs#L871-L894

This seems to be a general issue with Postgres: https://stackoverflow.com/questions/9108833/postgres-autoincrement-not-updated-on-explicit-id-inserts

@Sigvaard as you correctly noted, this is more of a PostgreSQL question than an Npgsql question - PostgreSQL simply doesn't automatically update sequences when you manually insert into columns managed by them.

I personally think that the negative values workaround is good - there's really no concrete disadvantage unless you consider something wrong esthetically with negative IDs.

In theory, whenever we scaffold a migration that adds seed data, we could also restart the sequence at value x, where x is the max of the current values in the column in the database and the max value to be seeded. While this may work, it presents some challenges:

  • ALTER SEQUENCE <seqname> RESTART WITH x doesn't accept an expression for x, only a literal. This makes it impossible to make sure the new sequence is value is bigger than any existing value in the table.
  • The setval() function does allow expressions, but is non-transactional, so rolling back the migrations transaction (because of a later failure) will not roll back this change. This may not be a huge problem since the new value is guaranteed to be bigger than the current one anyway.

So this can be done, and I do admit that people have been running into this issue - it would be nice if people could just seed their IDs as they see fit and forget about negative values. @austindrenski @YohDeadfall what do you think?

So this can be done, and I do admit that people have been running into this issue - it would be nice if people could just seed their IDs as they see fit and forget about negative values.

I agree, it would be more intuitive if negative values were not required.

ALTER SEQUENCE RESTART WITH x doesn't accept an expression for x, only a literal. This makes it impossible to make sure the new sequence is value is bigger than any existing value in the table.

Could we use this for initial migrations? The idea being that if we detect that a migration is being generated from something like context.Database.EnsureCreated(), we use the transacted version (since we can count the seeded values), otherwise we fallback to the setval() version?

Agreed with @austindrenski. If you display a negative id somewhere, it should be without a minus sign which could be done using some dirty magic.

Could we use this for initial migrations? The idea being that if we detect that a migration is being generated from something like context.Database.EnsureCreated(), we use the transacted version (since we can count the seeded values), otherwise we fallback to the setval() version?

Although this is possible, what would be the value of using ALTER SEQUENCE only in the case of initial migrations, and setval() otherwise? I think it's preferably to choose one technique and stick with it rather than having two, and then we open the door to behavioral changes between an initial migration and a subsequent one...

I also don't really see a difference between an initial migration and a subsequent one in terms of either transactionality or the ability to count the seeded values - both migrations are executed in a transaction (and expected to roll back cleanly on failure), and both can look at the model and find the maximal seeded value for a column (at least I think so...).

Agreed with @austindrenski. If you display a negative id somewhere, it should be without a minus sign which could be done using some dirty magic.

I'm not sure I understand... The idea is to stop forcing users to seed negative values by making sure the sequence is always bigger than the larger value...

Although this is possible, what would be the value of using ALTER SEQUENCE only in the case of initial migrations, and setval() otherwise? I think it's preferably to choose one technique and stick with it rather than having two, and then we open the door to behavioral changes between an initial migration and a subsequent one...

I also don't really see a difference between an initial migration and a subsequent one in terms of either transactionality or the ability to count the seeded values - both migrations are executed in a transaction (and expected to roll back cleanly on failure), and both can look at the model and find the maximal seeded value for a column (at least I think so...).

Perhaps I need to take a closer look at the relevant source... My thought process was this: _if the transactability is desirable, use the version that respects the transaction where possible_.

But that might not actually make much sense in this context. If so, disregard.

I think that transactability is indeed desirable for all migrations - initial or subsequent. However, in this very specific case it may be OK to not roll back the sequence, since it could only have been increased, so no harm was done - at worst there will be a gap in the sequence values.

But we definitely need to think about this some more, and remember that there's a well-working workaround (negative values) - so I definitely wouldn't go into anything complex or risky here.

Was this page helpful?
0 / 5 - 0 ratings