Describe the problem
Hello. I migrated from CockroachDB 20.1.8 up to 20.2.2 and got problems with application start. Our ORM, PostgreSQL JDBC, creates on start queries to pg_catalog to determine types and relations. And this queries become very slow. We continued investigation and found, that responce time correlated with number of tables in DB. Also, I found that one node in cluster works well.
To Reproduce
I201130 10:52:27.993894 5920071 sql/exec_log.go:225 โฎ [n3,client=โน10.111.7.3:38470โบ,hostssl,user=โนdb1โบ] 3 112.396ms โนexecโบ โน"PostgreSQL JDBC Driver"โบ โน{}โบ โน"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"โบ โน{$1:"'jsonb'"}โบ 1 โน""โบ 0 โน{ LATENCY_THRESHOLD }โบI201130 10:36:00.786376 5085793 sql/exec_log.go:225 โฎ [n2,client=โน192.168.114.18:21850โบ,hostssl,user=โนdb1โบ] 67 520.064ms โนexecโบ โน"PostgreSQL JDBC Driver"โบ โน{}โบ โน"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"โบ โน{$1:"'jsonb'"}โบ 1 โน""โบ 0 โน{ LATENCY_THRESHOLD }โบExpected behavior
I expected better query time for catalog queries.
Environment:
Hello, I am Blathers. I am here to help you get the issue triaged.
Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.
I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Thanks for the report!
I haven't been able to reproduce this locally. I created a local single-node v20.2.2 cluster with 10 tables, then added 100 more tables in a different database. In both cases, the query took under 20ms.
root@:26257/defaultdb> SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
is_array | typtype | typname
-----------+---------+----------
false | b | jsonb
(1 row)
Time: 17ms total (execution 17ms / network 0ms)
A couple questions for you @gyrter
EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
Thank you for fast responce. I will try to help you with this issue.
SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position.~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f problem_query.sql
...
real 0m9.395s
user 0m0.056s
sys 0m0.009s
~ # time psql -U admin -h v20_1_8 -d db1 -p26257 -f problem_query.sql
...
real 0m1.437s
user 0m0.053s
sys 0m0.014s
Also, in different versions of CockroachDB this query has different query plans:
tree | field | description
------------------------------------------------------------+--------------+----------------------------------------------
| distribution | local
| vectorized | false
sort | |
โ | order | +non_unique,+type,+relname,+ordinal_position
โโโ render | |
โโโ render | |
โโโ project set | |
โโโ hash join | |
โ | equality | (relam) = (oid)
โโโ hash join | |
โ โ | equality | (oid) = (indexrelid)
โ โโโ virtual table | |
โ โ | table | pg_class@primary
โ โโโ hash join | |
โ โ | equality | (indrelid) = (oid)
โ โโโ virtual table | |
โ โ | table | pg_index@primary
โ โโโ hash join | |
โ โ | equality | (relnamespace) = (oid)
โ โโโ filter | |
โ โ โ | filter | relname = 'sometable'
โ โ โโโ virtual table | |
โ โ | table | pg_class@primary
โ โโโ filter | |
โ โ | filter | nspname = 'public'
โ โโโ virtual table | |
โ | table | pg_namespace@primary
โโโ virtual table | |
| table | pg_am@primary
tree | field | description
-----------------------------------------------------------------+-------------+----------------------------------------------
| distributed | false
| vectorized | false
render | |
โโโ sort | |
โ | order | +non_unique,+type,+relname,+ordinal_position
โโโ render | |
โโโ render | |
โโโ project set | |
โโโ hash-join | |
โ | type | inner
โ | equality | (oid) = (relam)
โโโ virtual table | |
โ | source |
โโโ hash-join | |
โ | type | inner
โ | equality | (oid) = (indexrelid)
โโโ virtual table | |
โ | source |
โโโ hash-join | |
โ | type | inner
โ | equality | (indrelid) = (oid)
โโโ virtual table | |
โ | source |
โโโ hash-join | |
โ | type | inner
โ | equality | (relnamespace) = (oid)
โโโ filter | |
โ โ | filter | relname = 'sometable'
โ โโโ virtual table | |
โ | source |
โโโ filter | |
โ | filter | nspname = 'public'
โโโ virtual table | |
| source |
Main difference - query has join on pg_am.
EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f test_query.sql
Password for user admin:
tree | field | description
----------------------------------------pg_am+--------------+------------------------
| distribution | local
| vectorized | false
sort | |
โ | order | +generate_series,-oid
โโโ render | |
โโโ hash join (right outer) | |
โ | equality | (oid) = (typnamespace)
โโโ hash join | |
โ โ | equality | (nspname) = (nspname)
โ โโโ virtual table | |
โ โ | table | pg_namespace@primary
โ โโโ render | |
โ โโโ project set | |
โ โโโ emptyrow | |
โโโ filter | |
โ | filter | typname = 'jsonb'
โโโ virtual table | |
| table | pg_type@primary
(18 rows)
real 0m5.982s
user 0m0.052s
sys 0m0.016s
~ # time psql -U admin -h v20_1_8 -d db1 -p26257 -f test_query.sql
Password for user admin:
tree | field | description
---------------------------------------------+-------------+------------------------
| distributed | false
| vectorized | false
render | |
โโโ sort | |
โ | order | +generate_series,-oid
โโโ render | |
โโโ hash-join | |
โ | type | right outer
โ | equality | (oid) = (typnamespace)
โโโ hash-join | |
โ โ | type | inner
โ โ | equality | (nspname) = (nspname)
โ โโโ virtual table | |
โ โ | source |
โ โโโ render | |
โ โโโ project set | |
โ โโโ emptyrow | |
โโโ filter | |
โ | filter | typname = 'jsonb'
โโโ virtual table | |
| source |
(21 rows)
real 0m0.876s
user 0m0.052s
sys 0m0.018s
All db instances run in same nodes in k8s.
Hi. We use hibernate too, and for schema validating I see this performance issue:

Also I see this in cockroach cloud too. Add issue https://support.cockroachlabs.com/hc/en-us/requests/6880
Add diagnostics details.
stmt-bundle-611918818502377474.zip
I created a remote 3-node cluster running v20.2.2, and added 100 empty tables into it.
Running SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position
against it still only takes 45ms. So unfortunately I still have not been able to reproduce the slowness.
Thanks for sharing those explain plans. I think the plans actually are the same, though maybe the output format changed. I see both include the join on relam:
โโโ hash-join | |
โ | type | inner
โ | equality | (oid) = (relam)
the statement bundle you provided should be helpful. we will look into that next.
@rafiss Hi, steps for reproduce:
IMPORT PGDUMP 'https://drive.google.com/uc?export=download&id=1USsFE_LypdaAMCRNgExaiL5L4wvLmbP4'Also I see slow working in TablePlus too
Add more bundles for you
stmt-bundle-612213293907738625.zip
stmt-bundle-612213287988232193.zip
stmt-bundle-612212071640268801.zip
stmt-bundle-612212065837842433.zip
stmt-bundle-612212028882747393.zip
stmt-bundle-612212014046117889.zip
stmt-bundle-612212034723119105.zip
stmt-bundle-612212020206993409.zip
Thanks for these steps. I can reproduce more of the slowness now. I think the existence of more foreign keys makes these queries slower, so using the pgdump you provided was helpful.
I believe the underlying cause is new validation logic in v20.2.x that is performed when fetching table descriptors. We fetch these descriptors every time any data from a pg_catalog table is read.
We'll work on improving the performance and include the improvement in a patch release of 20.2 as soon as we can.
Hi @jordanlewis, please add branch-* labels to identify which branch(es) this release-blocker affects.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Most helpful comment
Thanks for these steps. I can reproduce more of the slowness now. I think the existence of more foreign keys makes these queries slower, so using the pgdump you provided was helpful.
I believe the underlying cause is new validation logic in v20.2.x that is performed when fetching table descriptors. We fetch these descriptors every time any data from a pg_catalog table is read.
We'll work on improving the performance and include the improvement in a patch release of 20.2 as soon as we can.