Imagine a simple post table that contains a couple of columns in addition to a unique primary key id column.
When retrieving a list of posts, the following query is issued:
SELECT count(DISTINCT p0_.id) AS sclr_0 FROM post p0_
Where the distinct part is fairly expensive, but also redundant since the PK it's counting is already unique:
explain analyze select count(distinct(id)) from post;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Aggregate (cost=1548.00..1548.01 rows=1 width=8) (actual time=6.833..6.833 rows=1 loops=1)
-> Seq Scan on post (cost=0.00..1523.00 rows=10000 width=4) (actual time=0.012..3.407 rows=10000 loops=1)
Planning Time: 0.115 ms
Execution Time: 6.875 ms
(4 rows)
If we run the query however without the DISTINCT, which on a unique PK is redundant:
explain analyze select count(*) from post;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Aggregate (cost=1548.00..1548.01 rows=1 width=8) (actual time=5.029..5.030 rows=1 loops=1)
-> Seq Scan on post (cost=0.00..1523.00 rows=10000 width=0) (actual time=0.013..3.523 rows=10000 loops=1)
Planning Time: 0.152 ms
Execution Time: 5.075 ms
(4 rows)
I've actually averaged around 10 samples into these values above as the explain info is the same, but varies only on the timings. The difference is pretty substantial, on the region of around 25%, and perhaps might be on something psql does post-query to DISTINCT values.
The table size is only 10K records, in postgres.
I believe this query can be optimised in API platform in a scenario where the primary key is a single column, which I would wager is rather common.
this call comes from the doctrine paginator IIRC, try using the partial pagination? Although we shouldn't have to paginate on a post request?
the distinct is done incase there are joins on the count query IIRC
The DISTINCT is needed when there are joined relations, otherwise you will have the wrong value from COUNT. Perhaps we could set this query hint to false when there are no relations:
https://github.com/doctrine/orm/blob/v2.6.3/lib/Doctrine/ORM/Tools/Pagination/CountWalker.php#L42
I dare say this sort of query is common enough to get that little extra optimisation on it.
Yes, DISTINCT is very expensive, so it'd be great if we could do this.
Superb :+1:
Most helpful comment
the distinct is done incase there are joins on the count query IIRC