Hello. Our database contains over 10 million rows. When the query executes with an offset of around 9 million and a limit of 10, the SQL is very slow; as the analysis below shows, the execution time can exceed 30,000 milliseconds.
After investigation, the cause turned out to be the large number of to_json calls in the SQL, which degrade PostgreSQL's query performance.
Is there a solution to this problem?
Problematic SQL:
explain analyze select to_json((__local_2__."id")) as "id", to_json((__local_2__."shop_id")) as "shopId", to_json((__local_2__."name")) as "name", to_json((__local_2__."published")) as "published", to_json((__local_2__."body_html")) as "bodyHtml", to_json((__local_2__."product_type")) as "productType", to_json((__local_2__."created_at")) as "createdAt", to_json((__local_2__."updated_at")) as "updatedAt"
from "public"."products" as __local_2__
where (TRUE) and (TRUE) and (TRUE)
order by __local_2__."id" ASC
OFFSET 11500610
limit 10
Analysis result:
Limit (cost=636146.05..636146.61 rows=10 width=260) (actual time=31879.550..31879.568 rows=7 loops=1)
-> Index Scan using products_pkey on products __local_2__ (cost=0.43..636148.05 rows=11500646 width=260) (actual time=0.054..31343.420 rows=11500617 loops=1)
Planning time: 0.079 ms
Execution time: 31879.592 ms
The same SQL with to_json removed:
explain analyze select __local_2__."id" as "id", __local_2__."shop_id" as "shopId", __local_2__."name" as "name", __local_2__."published" as "published", __local_2__."body_html" as "bodyHtml", __local_2__."product_type" as "productType", __local_2__."created_at" as "createdAt", __local_2__."updated_at" as "updatedAt"
from "public"."products" as __local_2__
where (TRUE) and (TRUE) and (TRUE)
order by __local_2__."id" ASC
OFFSET 11500610
limit 10
Analysis result:
Limit (cost=406133.85..406134.21 rows=10 width=553) (actual time=3453.729..3453.732 rows=7 loops=1)
-> Index Scan using products_pkey on products __local_2__ (cost=0.43..406135.12 rows=11500646 width=553) (actual time=0.033..2948.304 rows=11500617 loops=1)
Planning time: 0.076 ms
Execution time: 3453.750 ms
Removing some of the to_json calls is an optimisation that's on my todo list; we can remove it for very simple types like int and string, but we cannot remove it for most other types, otherwise nested queries would return different data when the field is nested versus when it is at the root level.
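As a concrete illustration (a minimal example, not from the original queries): for a timestamptz, PostgreSQL's plain text output and its to_json rendering differ, so dropping to_json would change the value the client receives:

-- Plain text output uses PostgreSQL's own timestamp format:
select now()::text;
-- e.g. 2019-05-01 12:34:56.789012+00

-- to_json renders the same value as an ISO 8601 JSON string:
select to_json(now());
-- e.g. "2019-05-01T12:34:56.789012+00:00"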
I strongly suggest you move to cursor-based pagination rather than limit/offset; cursor pagination enables PostgreSQL to use its indexes efficiently, resulting in significantly better performance. In this case I would expect the query to take on the order of tens of milliseconds with cursor pagination.
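For reference, cursor pagination (exposed through PostGraphile connections' first/after arguments) boils down to keyset filtering; here is a minimal sketch against the products table, written as a prepared statement where $1 is a parameter holding the id of the last row of the previous page:

-- Page 1: no cursor yet.
select id, shop_id, name
from products
order by id asc
limit 10;

-- Later pages: seek straight to the cursor through the primary-key index,
-- instead of scanning and discarding millions of rows the way OFFSET does.
select id, shop_id, name
from products
where id > $1  -- $1 = id of the last row from the previous page
order by id asc
limit 10;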
You could also try upgrading PostgreSQL.
Also, have you benchmarked this fairly? I.e. have you run the first query again after the second query? PostgreSQL has probably cached things from the first execution which make the second execution faster, so it's unlikely the tests are fair unless you interleave them.
(I ask because the query plans are the same but the estimated costs differ.)
Excellent question -- I was going to ask the same thing. At first glance the benchmark does not seem fair. I would suggest the following (see the explain buffers sketch after the list for an extra sanity check):
Stop Postgres, start it again, then run the same queries 4 times:
1 == With to_json
2 == Without to_json
3 == With to_json
4 == Without to_json
THEN -- stop/start Postgres again and run:
1 == Without to_json
2 == With to_json
3 == Without to_json
4 == With to_json
And let's compare everything.
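As an extra sanity check (an addition, not part of the procedure above), running the statements with the buffers option makes caching visible: each run reports how many pages came from PostgreSQL's shared buffer cache versus the OS/disk, so a warm-cache run is easy to spot:

explain (analyze, buffers)
select id
from products
order by id asc
offset 11500610
limit 10;
-- In the output, "Buffers: shared hit=..." counts pages served from
-- PostgreSQL's cache, while "read=..." counts pages fetched from disk.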
@benjie
Thank you for your reply.
Both queries were benchmarked fairly, following the procedure @sjmcdowall described, so there is no caching effect.
The version of PostgreSQL currently in use is 10.5.
I look forward to the completion of this optimization; thanks again for providing the powerful PostGraphile.
I am Chinese and my English is not very good; thank you for your patience in reading.
So, having looked into this more, it seems to be related to how the PostgreSQL query planner currently works. It's possible that future versions of PostgreSQL will resolve these issues. In the meantime it's recommended that you work around this by using cursor-based pagination, which allows PostgreSQL to utilise its indexes efficiently; limit/offset pagination is expensive in general.
I'm closing this for now because it will involve a large amount of work for quite an edge-case gain. If you'd like to sponsor this work, please get in touch.
This is fixed in v4.4.
create table products (
  id serial primary key, -- primary key needed for the products_pkey index used in the plans above
  shop_id int not null default floor(random() * 10000000),
  name text not null,
  published boolean not null default true,
  body_html text default '<html><head><title>Foo</title></head><body>Bar</body></html>',
  product_type text default 'fork',
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
insert into products (name) select i::text from generate_series(1, 10000000) i;
{
allProducts(first: 10, offset: 9000000) {
edges {
node {
id
shopId
name
bodyHtml
published
productType
createdAt
updatedAt
}
}
}
}
4.0.1 took 33.19s:
npx postgraphile@4.0.1 -c issue-845 -p 5111
4.4.0-beta.5 took 988ms:
npx postgraphile@next -c issue-845 -p 5112