Core: Suboptimal count query on entity list endpoint

Created on 22 Feb 2019  路  6Comments  路  Source: api-platform/core

Imagine a simple post table that contains a couple of columns in addition to a unique primary key id column.

When retrieving a list of posts, the following query is issued:

SELECT count(DISTINCT p0_.id) AS sclr_0 FROM post p0_

Where the distinct part is fairly expensive, but also redundant since the PK it's counting is already unique:

 explain analyze select count(distinct(id)) from post;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1548.00..1548.01 rows=1 width=8) (actual time=6.833..6.833 rows=1 loops=1)
   ->  Seq Scan on post  (cost=0.00..1523.00 rows=10000 width=4) (actual time=0.012..3.407 rows=10000 loops=1)
 Planning Time: 0.115 ms
 Execution Time: 6.875 ms
(4 rows)

If we run the query however without the DISTINCT, which on a unique PK is redundant:

explain analyze select count(*) from post;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1548.00..1548.01 rows=1 width=8) (actual time=5.029..5.030 rows=1 loops=1)
   ->  Seq Scan on post  (cost=0.00..1523.00 rows=10000 width=0) (actual time=0.013..3.523 rows=10000 loops=1)
 Planning Time: 0.152 ms
 Execution Time: 5.075 ms
(4 rows)

I've actually averaged around 10 samples into these values above as the explain info is the same, but varies only on the timings. The difference is pretty substantial, on the region of around 25%, and perhaps might be on something psql does post-query to DISTINCT values.

The table size is only 10K records, in postgres.

I believe this query can be optimised in API platform in a scenario where the primary key is a single column, which I would wager is rather common.

enhancement

Most helpful comment

the distinct is done incase there are joins on the count query IIRC

All 6 comments

this call comes from the doctrine paginator IIRC, try using the partial pagination? Although we shouldn't have to paginate on a post request?

the distinct is done incase there are joins on the count query IIRC

The DISTINCT is needed when there are joined relations, otherwise you will have the wrong value from COUNT. Perhaps we could set this query hint to false when there are no relations:

https://github.com/doctrine/orm/blob/v2.6.3/lib/Doctrine/ORM/Tools/Pagination/CountWalker.php#L42

I dare say this sort of query is common enough to get that little extra optimisation on it.

Yes, DISTINCT is very expensive, so it'd be great if we could do this.

Superb :+1:

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kate-kate picture kate-kate  路  3Comments

theshaunwalker picture theshaunwalker  路  3Comments

Tjeerd picture Tjeerd  路  3Comments

CvekCoding picture CvekCoding  路  3Comments

vViktorPL picture vViktorPL  路  3Comments