Efcore.pg: Unable to update when primary key is character(x)

Created on 3 Aug 2018  路  13Comments  路  Source: npgsql/efcore.pg

A DbUpdateConcurrencyException occurs when an UPDATE to a record with a fixed length character primary key is Saved to the DB and that key has trailing spaces. The Concurrency check appears to trim the value on the primary key which causes it to fail validation of the change even though the SQL log shows a valid UPDATE command (with space padding) was received and processed. I have attached a small .NET Core Console application that generates the exception.

I have tried numerous ways to define the entity model but without success. Prior to this 2.1.1.1 release of Npgsql.EntityFrameworkCore.PostgreSQL I was able to make this work using a property definition of .HasColumnType("varchar").HasMaxLength(8) which didn't make sense but worked. Re-scaffolding the database using the 2.1.1.1 release outputs the property as .HasColumnType("character(8)") with a string property.

Postgresql log:

2018-08-03 16:44:19 CDT LOG:  execute <unnamed>: UPDATE testchar.tablechar SET char_value = $1 WHERE char_ident = $2
2018-08-03 16:44:19 CDT DETAIL:  parameters: $1 = 'This will NOT update', $2 = '123456  '

testchar.zip

bug

All 13 comments

This program will execute if you change the package references to:

  PackageReference Include="Microsoft.EntityFrameworkCore" Version="2.0.2" 
  PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="2.0.1"

And update the model builder definition of CharIdent to be:

    .HasColumnType("varchar").HasMaxLength(8)

But I really need to update to .NET 2.1!

This behavior is actually defined in the SQL specification (1992):

3) The comparison of two character strings is determined as follows:

    a) If the length in characters of X is not equal to the length
        in characters of Y, then the shorter string is effectively
        replaced, for the purposes of comparison, with a copy of
        itself that has been extended to the length of the longer
        string by concatenation on the right of one or more pad char-
        acters, where the pad character is chosen based on CS. If
        CS has the NO PAD attribute, then the pad character is an
        implementation-dependent character different from any char-
        acter in the character set of X and Y that collates less
        than any string under CS. Otherwise, the pad character is a
        <space>.

    b) The result of the comparison of X and Y is given by the col-
        lating sequence CS.

    c) Depending on the collating sequence, two strings may com-
        pare as equal even if they are of different lengths or con-
        tain different sequences of characters. When the operations
        MAX, MIN, DISTINCT, references to a grouping column, and the
        UNION, EXCEPT, and INTERSECT operators refer to character
        strings, the specific value selected by these operations from
        a set of such equal values is implementation-dependent.

_Source credit to SQL: Where spaces may not matter._

@tom-mulligan I misread your description initially. Sorry about that. I've created a simplified version of your repro: repro here: https://github.com/austindrenski/EFCore.PG-567.

This could be related to the string comparisons I referenced above....but it's much more likely to be another instance of what we could call _the explicit cast and broken inference problem_.

As a short term solution, you could create the column as varchar(8) (or text + your own check constraint). Alternatively, you can take a look here for a solution that involves creating overloaded operators.

Borrowing a description of the problem from @rwasef1830:

I imagined there was a tsvector @@ text overload and implemented a Matches method that took a string parameter in addition to the overload that takes NpgsqlTsQuery. Strangely enough, the unit test for this was passing but I now I think that it was due to the test passing the string as a literal constant to the Matches method and Npgsql embedding the literal directly in the SQL thus allowing the PostgreSQL type coercion to work.

However, when it is passed as a parameter, Npgsql sends the parameter with explicit type of text which causes PostgreSQL to fail since it won't do type coercion when the types are specified explicitly in the query protocol for the parameters and there is no tsvector @@ text overload.

/cc @roji

--
-- Pretend that the LHS value is a column and the RHS value is a parameter:
--

-- this is the answer we want
SELECT '123456  '::char(8) = '123456  '::char(8);
 ?column?
----------
 t
(1 row)

-- but this is what we actually have
SELECT '123456  '::char(8) = '123456  '::text;
 ?column?
----------
 f
(1 row)

-- this would still fail from the lengths
SELECT '123456  '::char(8) = '123456  '::char;
 ?column?
----------
 f
(1 row)

-- works because varchar(8) is really text + a check constraint
SELECT '123456  '::varchar(8) = '123456  '::text;
 ?column?
----------
 t
(1 row)

-- works because of literal type inference
SELECT '123456  '::char(8) = '123456  ';
 ?column?
----------
 t
(1 row)

Related

https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/428
https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/561
https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/562
https://github.com/aspnet/EntityFrameworkCore/issues/4978

@austindrenski your analysis here seems correct, we keep running into this issue again and again.

One new thought I've just had on this, is that there's the theoretical possibility of relaxing the strong typing approach when it comes. Npgsql currently sends specific type OIDs for all parameters, but there's also the option to send 0 ("unknown"), to allow PostgreSQL to infer the type from the context. This would make string parameters behave like literal strings above, and allow PostgreSQL type inference to do the right thing. Note that in EF Core literal representations are currently also always strongly-typed, except for strings (and numbers) as can see above: we don't generate 'some_text'::TEXT, but rather simply some_text. This was more a concern of readability, but also affects inference.

I'm not sure I understand the exact consequences of not typing our parameters in this way. SQL where the context isn't enough to infer would start failing; an example is SELECT $1, but there are surely some less-contrived real-world scenarios. This definitely merits some analysis, we could also simply send type OID 0 and see what it does to our current test suite :) Finally, note that this is really something that's needed for EF Core (but possibly also other ORMs) rather than for Npgsql itself - if you're working at the ADO.NET level you always have the option of explicitly setting the exact desired type via NpgsqlDbType.

PS Here are the PostgreSQL docs on character types, documenting the bizarre truncation behavior with trailing whitespace.

Split off the general type OID discussion to https://github.com/npgsql/npgsql/issues/2095

The more I dig into the comparison semantics for text/varchar/char(n), the more bewildered fascinated I am.

While the OID discussion is good to have, I'm not entirely sure it would help here. It almost looks like this is just a case of ___extreme differences___ between how the CLR and PostgreSQL think about string comparisons and type resolution.

See below for a worked out example, but I'm concerned that even if we sent string parameters with OID = unknown, dealing with char(n) could still break down in ways that are difficult to anticipate.

Personally, I would like to see us strongly discourage the use of char(n)/character(n) in EF Core altogether (much like timestamptz or tstzrange).

Per the string type docs:

However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character.

Trailing spaces are removed when converting a character value to one of the other string types.

Per the string function docs:

Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of potential effects of automatic space-padding when using the character type.

--
-- Pretend that the RHS value is a parameter:
--

CREATE TEMP TABLE test ( a char(8) );
INSERT INTO test (a) VALUES ('123456  ');

-- note the two extra dashes in the row separator which indicate trailing spaces are present
SELECT a FROM test;
    a
----------
 123456
(1 row)

-- not no extra dashes in the row separator which indicates truncation occured
SELECT a::text FROM test;
   a
--------
 123456
(1 row)

-- if we use OID = unknown
SELECT a = '123456  '::unknown FROM test;
 ?column?
----------
 t
(1 row)

-- but this is still what's really happening
 SELECT RTRIM(a) = RTRIM('123456  '::unknown) FROM test;
 ?column?
----------
 t
(1 row)

-- here's how the lengths work out
SELECT 
  char_length(a)                   AS a, 
  char_length('123456  ')          AS infer_type, 
  char_length('123456  '::text)    AS cast_text, 
  char_length('123456  '::char(8)) AS cast_char8, 
  char_length('123456  '::unknown) AS cast_unknown
FROM test;
 a | infer_type | cast_text | cast_char8 | cast_unknown
---+------------+-----------+------------+--------------
 6 |          8 |         8 |          6 |            8
(1 row)

Thank-you for this information. It identified what the problem is and a means to overcome it. The true cause is the logic change in Npgsql to now specify parameter data types instead of letting the database infer the type. Prior to 2.1 this was not happening for literal strings.

I understand that the project that identified this problem is working with a poor database design, but we are now forced to change a large legacy database from character(x) to varchar(x) for all the PK and FK columns that were character(x). This has complications because there are other legacy applications and external systems that utilize this database for content. The testing and approval process required for this change will take a large amount of resources and considerable time.

Yes this is a poor database design choice, but it had been working on various database servers for over 20 years until now. I also understand the decision of Npgsql to strongly (??? except string ???) type parameters instead of letting the database infer them (although, the database does a pretty good job when it does infer the data type...IMO).

@tom-mulligan wrote:

I understand that the project that identified this problem is working with a poor database design, but we are now forced to change a large legacy database from character(x) to varchar(x) for all the PK and FK columns that were character(x). This has complications because there are other legacy applications and external systems that utilize this database for content. The testing and approval process required for this change will take a large amount of resources and considerable time.

I am very sympathetic to your situation. I've just recently reworked a legacy schema and dealt with all of the headaches that come with updating an old system.

At the end of the day, an ORM is just an abstraction intended to help you add value to your organization. It shouldn't dictate business decisions, nor do we intend it to.

We can provide guidance on how this project will work with your database, but in the end, we can only offer advice that should be weighed against many other factors before applying it to real-world business decisions.

With all of that in mind, I do plan to take another look this weekend to see what鈥攊f anything鈥攃an be done in the EF Core provider to better support the char(n) semantics as defined by PostgreSQL.

@austindrenski wrote:

Personally, I would like to see us strongly discourage the use of char(n)/character(n) in EF Core altogether (much like timestamptz or tstzrange).

To be clear, this is _strictly my personal opinion_ for an idealistic world where every database is a greenfield project. PostgreSQL supports these data types because each has a role and adds value in some way.

This seems fringe enough to "let it go", though continue to consider this issue in future planning. Thanks again for the prompt and intelligent discourse. Consider this closed?

FWIW, I do think there's room to revisit the decision that literal representations be strongly-typed. This is somewhat related to the discussion in https://github.com/npgsql/npgsql/issues/2095, although that discussion is for parameters and not literal representations.

My primary objection, apart from a general preference of strong typing, is that if we remove strong typing we introduce a behavior discrepancy between parameters (which as per https://github.com/npgsql/npgsql/issues/2095 will stay strongly-typed) and literal values. I think it's better for these two types of things to behave in the same way.

There's also a possible solution on the way, namely https://github.com/aspnet/EntityFrameworkCore/issues/4978. At that point there would at least be a way to manually override the type chosen by EF Core, even if it would be somewhat verbose etc.

But we can definitely close for now, @austindrenski what do you think?

@roji I've taken another pass at this with #588 and I think I've got a working patch in place. But, the fix feels fragile, so I'd appreciate another set of eyes on it before calling this closed...

_edit: much less fragile after additional testing and review_

@tom-mulligan We've merged #588 to patch the comparison discrepancies reported here. This should make character data easier to use with the release of version 2.2.0 later this year.

If you would like to try it out sooner, have a look at our __unstable__ MyGet feed for the latest build:

dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL --version 2.2.0-ci.935 --source https://www.myget.org/F/npgsql-unstable/api/v3/index.json

This was a challenging bug, so many thanks for taking the time to report it to us!

The thanks are all to you and the Npgsql team for finding the issue and correcting it so quickly. Fortunately this problem allowed me to convince the customer that some database structure changes should be made to overcome other conversion from Sybase issues.

Was this page helpful?
0 / 5 - 0 ratings