Efcore.pg: COALESCE in array Contains prevents index usage

Created on 5 Dec 2019 · 11Comments · Source: npgsql/efcore.pg

Hello,
Today i updated to the latest version (3.1.0).

After the upgrade, I can see some very strange behavior in queries which leads to a sequence scan instead of index scan as previously.

Details:
Example with 3.1.0

SELECT o.id
FROM odds AS o
WHERE COALESCE(o.id = ANY ($1), FALSE)

(where $1 = '{1,2,3,4}')

That leads to sequence scan because of coalesce.

Example with 3.0.1

SELECT o.id
FROM odds AS o
WHERE o.id IN (1, 2, 3, 4, 5)

That's is the commit where that change was introduced https://github.com/npgsql/efcore.pg/commit/d7863b8a58cc2718ef7c91bbe449919c4eebaff9#diff-9091bf8c282a54bfeecb802bb7705561

cc @roji

Thanks,
Stas

breaking change performance

Source

skynet2

Most helpful comment

@pgrm hang in there, I really intend to release 3.1.1 in the next few days!

roji on 30 Jan 2020

👍3

All 11 comments

/cc @smitpatel @maumar on a case where null semantics degrades performance because an index isn't used. Am just curious whether we're aware of other cases like this in general (which aren't a result of superfluous checks we haven't yet optimized away).

roji on 6 Dec 2019

@skynet2 just to be sure, can you please post the full LINQ query? I'm specifically interested in whether your array is a constant (specified inline in the query, not as an external variable) and whether id is nullable.

FYI the COALESCE is necessary in order to preserve the correct behavior in the face of nulls - but I'll think about what we can do if this has a significant perf impact.

roji on 6 Dec 2019

Hi @roji

        public class Odd2
        {
            public long Id { get; set; }
        }
        public class TestContext : DbContext
        {
            public DbSet<Odd2> Odds { get; set; }

            public TestContext(DbContextOptions<TestContext> options) : base(options)
            {

            }

            protected override void OnModelCreating(ModelBuilder modelBuilder)
            {
                modelBuilder.Entity<Odd2>(opt =>
                {
                    opt.HasKey(x => x.Id);
                    opt.Property(x => x.Id).HasColumnName("id");
                    opt.ToTable("odds");
                });
            }
        }

        public static async Task Test()
        {
            var collection = new ServiceCollection();

            collection.AddDbContext<TestContext>(o =>
                {
                    o.UseNpgsql(
                        "conntection-string");
                });

            var provider = collection.BuildServiceProvider();

            var context = provider.GetService<TestContext>();
            var ids = new long[] {1, 2, 3, 4, 5};
            var items = context.Odds.Where(y => ids.Contains(y.Id)).ToArray();
        }

Thanks,
Stas

skynet2 on 6 Dec 2019

I am not able to understand the difference here. Both of the SQL relates to different tables and selecting different data. What is the expected SQL vs generated SQL? It may be slightly related to the fact that null semantics are not taken care for custom expression.

smitpatel on 6 Dec 2019

Yeah, the LINQ and SQL queries above don't seem to correspond to the SQL. @skynet2, in general, what we need are clear, minimal code samples which show issues, anything else is confusing and creates more work for us.

However, it's not surprising that adding the COALESCE causes PG to stop using the index on id:

```c#
DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT PRIMARY KEY);
INSERT INTO data (id) VALUES (1), (2), (3);

EXPLAIN SELECT * FROM data WHERE id = ANY('{1, 100, 1000}');
EXPLAIN SELECT * FROM data WHERE COALESCE(id = ANY('{1, 100, 1000}'));

The version without the COALESCE:

Bitmap Heap Scan on data (cost=8.49..15.60 rows=3 width=4)
Recheck Cond: (id = ANY ('{1,100,1000}'::integer[]))
-> Bitmap Index Scan on data_pkey (cost=0.00..8.49 rows=3 width=0)
Index Cond: (id = ANY ('{1,100,1000}'::integer[]))

The version with the COALESCE:

Seq Scan on data (cost=0.00..45.06 rows=1275 width=4)
Filter: COALESCE((id = ANY ('{1,100,1000}'::integer[])))
```

FWIW this is exactly the thing that made me open https://github.com/aspnet/EntityFrameworkCore/issues/17598 (this case is documented there); once we have a more flexible parameter sniffing architecture, we could look at array parameters at execution time and not generate the COALESCE if there's no null. Until that happens, I think it makes sense to not do null semantics here, i.e. remove the COALESCE for now (nobody really complained before I added it in 3.0). @smitpatel I know this is sacrificing correctness for perf but in this case it may make sense.

It may be slightly related to the fact that null semantics are not taken care for custom expression.

That would definitely be a better way of applying the null semantics (and also allowing opt-out when relational nulls are on), but it wouldn't solve the perf issue created by adding the COALESCE...

roji on 7 Dec 2019

Hi @roji @smitpatel ,
Yep, sorry for bad examples and description, I was really in hurry posting that issue.
I updated my replies with the proper SQL and c# code examples.

Thanks,
Stas

skynet2 on 11 Dec 2019

I am hitting this.
I removed COALESSE form outputted SQL and performance is back
Before I start rewriting all my linq queries how far away is the fix from nuget?

mikes-gh on 22 Jan 2020

In the meantime if anyone hits this you can someArray.ToList() to force an IN translation
In the following example invoiceNoStrings is an array.

 ICollection<Models.Transactions> invoiceTransactions = _tradingCompanyDbContext.Transactions
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.InvTypeNavigation)
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.PaymentFreqNavigation)
                .Include(t => t.InvoiceDetails).ThenInclude(i => i.StockNoNavigation)
                .Include(t => t.DiscountDetails)
                .Include(t => t.CommentDetails)
                .Where(t => invoiceNoStrings.ToList().Contains(t.CustSuppRef)
                    && (t.TransCode == 0 || t.TransCode == 1))
                .AsNoTracking()
                .ToList();

mikes-gh on 22 Jan 2020

@roji we'd be also interested in having this at least in a preview release - or is there anything that breaks by removing coalesce / what are the trade-offs (like as long as you don't use it with a nullable column)?

pgrm on 30 Jan 2020

@pgrm hang in there, I really intend to release 3.1.1 in the next few days!

roji on 30 Jan 2020

👍3

Shoot, seems like I forgot to cherry-pick this to 3.1.1 (just like #1154... not proud here). I will release 3.1.1.2 shortly.

Note that this is a small breaking change - anyone relying on the current coalescing behavior may be broken. However, this does seem like the right thing to do until we can bring back proper null semantics for 5.0 (#1142)

roji on 6 Feb 2020

Was this page helpful?

0 / 5 - 0 ratings