Cockroach: sql: Performance problems with queries to pg_catalog

Created on 30 Nov 2020 · 8Comments · Source: cockroachdb/cockroach

Describe the problem

Hello. I migrated from CockroachDB 20.1.8 up to 20.2.2 and got problems with application start. Our ORM, PostgreSQL JDBC, creates on start queries to pg_catalog to determine types and relations. And this queries become very slow. We continued investigation and found, that responce time correlated with number of tables in DB. Also, I found that one node in cluster works well.

To Reproduce

Set up CockroachDB cluster in k8s.
Create database with 10 tables
Got this responce time I201130 10:52:27.993894 5920071 sql/exec_log.go:225 ⋮ [n3,client=‹10.111.7.3:38470›,hostssl,user=‹db1›] 3 112.396ms ‹exec› ‹"PostgreSQL JDBC Driver"› ‹{}› ‹"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"› ‹{$1:"'jsonb'"}› 1 ‹""› 0 ‹{ LATENCY_THRESHOLD }›
Add other database with 93 tables
Got responce time I201130 10:36:00.786376 5085793 sql/exec_log.go:225 ⋮ [n2,client=‹192.168.114.18:21850›,hostssl,user=‹db1›] 67 520.064ms ‹exec› ‹"PostgreSQL JDBC Driver"› ‹{}› ‹"SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = $1 ORDER BY sp.r, pg_type.oid DESC"› ‹{$1:"'jsonb'"}› 1 ‹""› 0 ‹{ LATENCY_THRESHOLD }›
Also we created more tables and query became slower.

Expected behavior

I expected better query time for catalog queries.

Environment:

CockroachDB 20.2.2
Kubernetes 1.16.15
Ubuntu 18.04.5
Linux 5.3.0-51-generic

C-bug O-community X-blathers-triaged branch-release-20.2 release-blocker

Source

gyrter

👍1

Most helpful comment

Thanks for these steps. I can reproduce more of the slowness now. I think the existence of more foreign keys makes these queries slower, so using the pgdump you provided was helpful.

I believe the underlying cause is new validation logic in v20.2.x that is performed when fetching table descriptors. We fetch these descriptors every time any data from a pg_catalog table is read.

We'll work on improving the performance and include the improvement in a patch release of 20.2 as soon as we can.

rafiss on 3 Dec 2020

❤2 👍2

All 8 comments

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

@rafiss (found keywords: ORM,ORM, found keywords: JOIN,pg_)
@solongordon (found keywords: JOIN,pg_)
@joshimhoff (found keywords: Kubernetes)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try find someone from here if you know they worked closely on the area and CC them.

_{:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

blathers-crl[bot] on 30 Nov 2020

Thanks for the report!

I haven't been able to reproduce this locally. I created a local single-node v20.2.2 cluster with 10 tables, then added 100 more tables in a different database. In both cases, the query took under 20ms.

root@:26257/defaultdb> SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;
  is_array | typtype | typname
-----------+---------+----------
   false   | b       | jsonb
(1 row)

Time: 17ms total (execution 17ms / network 0ms)

A couple questions for you @gyrter

Can you confirm that you saw this with v20.2.2? versions 20.2.0 and 20.2.1 did have an issue with performance of pg_catalog queries, but PR #56349 made it into v20.2.2 and addressed that issue. So I just want to confirm that you are running a version that includes that PR.
I've only tried running the query from the command line. Can you reproduce the problem by running the problem from the command line? Or does it only happen when running from the Java app, which I see also uses a prepared statement.
Can you run the following and share the output?

EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;

rafiss on 1 Dec 2020

❤1

Thank you for fast responce. I will try to help you with this issue.

I tested JDBC on 20.2.0, 20.2.1 and 20.2.2 - all versions was affected.
Yes, of cause. I found more interesting query
SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position.
Here is my tests:

~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f problem_query.sql
...
real    0m9.395s
user    0m0.056s
sys 0m0.009s
~ # time psql -U admin -h v20_1_8  -d db1 -p26257 -f problem_query.sql
...
real    0m1.437s
user    0m0.053s
sys 0m0.014s

Also, in different versions of CockroachDB this query has different query plans:

v20.2.2

                            tree                            |    field     |                 description                  
------------------------------------------------------------+--------------+----------------------------------------------
                                                            | distribution | local
                                                            | vectorized   | false
 sort                                                       |              | 
  │                                                         | order        | +non_unique,+type,+relname,+ordinal_position
  └── render                                                |              | 
       └── render                                           |              | 
            └── project set                                 |              | 
                 └── hash join                              |              | 
                      │                                     | equality     | (relam) = (oid)
                      ├── hash join                         |              | 
                      │    │                                | equality     | (oid) = (indexrelid)
                      │    ├── virtual table                |              | 
                      │    │                                | table        | pg_class@primary
                      │    └── hash join                    |              | 
                      │         │                           | equality     | (indrelid) = (oid)
                      │         ├── virtual table           |              | 
                      │         │                           | table        | pg_index@primary
                      │         └── hash join               |              | 
                      │              │                      | equality     | (relnamespace) = (oid)
                      │              ├── filter             |              | 
                      │              │    │                 | filter       | relname = 'sometable'
                      │              │    └── virtual table |              | 
                      │              │                      | table        | pg_class@primary
                      │              └── filter             |              | 
                      │                   │                 | filter       | nspname = 'public'
                      │                   └── virtual table |              | 
                      │                                     | table        | pg_namespace@primary
                      └── virtual table                     |              | 
                                                            | table        | pg_am@primary

v20.1.8

                              tree                               |    field    |                 description                  
-----------------------------------------------------------------+-------------+----------------------------------------------
                                                                 | distributed | false
                                                                 | vectorized  | false
 render                                                          |             | 
  └── sort                                                       |             | 
       │                                                         | order       | +non_unique,+type,+relname,+ordinal_position
       └── render                                                |             | 
            └── render                                           |             | 
                 └── project set                                 |             | 
                      └── hash-join                              |             | 
                           │                                     | type        | inner
                           │                                     | equality    | (oid) = (relam)
                           ├── virtual table                     |             | 
                           │                                     | source      | 
                           └── hash-join                         |             | 
                                │                                | type        | inner
                                │                                | equality    | (oid) = (indexrelid)
                                ├── virtual table                |             | 
                                │                                | source      | 
                                └── hash-join                    |             | 
                                     │                           | type        | inner
                                     │                           | equality    | (indrelid) = (oid)
                                     ├── virtual table           |             | 
                                     │                           | source      | 
                                     └── hash-join               |             | 
                                          │                      | type        | inner
                                          │                      | equality    | (relnamespace) = (oid)
                                          ├── filter             |             | 
                                          │    │                 | filter      | relname = 'sometable'
                                          │    └── virtual table |             | 
                                          │                      | source      | 
                                          └── filter             |             | 
                                               │                 | filter      | nspname = 'public'
                                               └── virtual table |             | 
                                                                 | source      |

Main difference - query has join on pg_am.

Output for EXPLAIN SELECT typinput = 'array_in'::REGPROC AS is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (SELECT ns.oid AS nspoid, ns.nspname, r.r FROM pg_namespace AS ns JOIN (SELECT s.r, (current_schemas(false))[s.r] AS nspname FROM ROWS FROM (generate_series(1, array_upper(current_schemas(false), 1))) AS s (r)) AS r USING (nspname)) AS sp ON sp.nspoid = typnamespace WHERE typname = 'jsonb' ORDER BY sp.r, pg_type.oid DESC;

~ # time psql -U admin -h v20_2_2 -d db1 -p26257 -f test_query.sql 
Password for user admin: 
                  tree                  |    field     |      description       
----------------------------------------pg_am+--------------+------------------------
                                        | distribution | local
                                        | vectorized   | false
 sort                                   |              | 
  │                                     | order        | +generate_series,-oid
  └── render                            |              | 
       └── hash join (right outer)      |              | 
            │                           | equality     | (oid) = (typnamespace)
            ├── hash join               |              | 
            │    │                      | equality     | (nspname) = (nspname)
            │    ├── virtual table      |              | 
            │    │                      | table        | pg_namespace@primary
            │    └── render             |              | 
            │         └── project set   |              | 
            │              └── emptyrow |              | 
            └── filter                  |              | 
                 │                      | filter       | typname = 'jsonb'
                 └── virtual table      |              | 
                                        | table        | pg_type@primary
(18 rows)


real    0m5.982s
user    0m0.052s
sys 0m0.016s
 ~ # time psql -U admin -h v20_1_8 -d db1 -p26257 -f test_query.sql 
Password for user admin: 
                    tree                     |    field    |      description       
---------------------------------------------+-------------+------------------------
                                             | distributed | false
                                             | vectorized  | false
 render                                      |             | 
  └── sort                                   |             | 
       │                                     | order       | +generate_series,-oid
       └── render                            |             | 
            └── hash-join                    |             | 
                 │                           | type        | right outer
                 │                           | equality    | (oid) = (typnamespace)
                 ├── hash-join               |             | 
                 │    │                      | type        | inner
                 │    │                      | equality    | (nspname) = (nspname)
                 │    ├── virtual table      |             | 
                 │    │                      | source      | 
                 │    └── render             |             | 
                 │         └── project set   |             | 
                 │              └── emptyrow |             | 
                 └── filter                  |             | 
                      │                      | filter      | typname = 'jsonb'
                      └── virtual table      |             | 
                                             | source      | 
(21 rows)


real    0m0.876s
user    0m0.052s
sys 0m0.018s

All db instances run in same nodes in k8s.

gyrter on 1 Dec 2020

Hi. We use hibernate too, and for schema validating I see this performance issue:
Screenshot 2020-12-01 at 11 58 26

Also I see this in cockroach cloud too. Add issue https://support.cockroachlabs.com/hc/en-us/requests/6880

Add diagnostics details.
stmt-bundle-611918818502377474.zip

z0mb1ek on 1 Dec 2020

👍1

I created a remote 3-node cluster running v20.2.2, and added 100 empty tables into it.

Running SELECT tmp.table_cat, tmp.table_schem, tmp.table_name, tmp.non_unique, tmp.index_qualifier, tmp.index_name, tmp.type, tmp.ordinal_position, btrim(pg_catalog.pg_get_indexdef(tmp.ci_oid, tmp.ordinal_position, false), '\"') AS column_name, CASE tmp.am_canorder WHEN true THEN CASE tmp.i_indoption[tmp.ordinal_position - 1] & 1 WHEN 1 THEN 'D' ELSE 'A' END ELSE NULL END AS asc_or_desc, tmp.cardinality, tmp.pages, tmp.filter_condition FROM (SELECT NULL AS table_cat, n.nspname AS table_schem, ct.relname AS table_name, NOT i.indisunique AS non_unique, NULL AS index_qualifier, ci.relname AS index_name, CASE i.indisclustered WHEN true THEN 1 ELSE CASE am.amname WHEN 'hash' THEN 2 ELSE 3 END END AS type, (information_schema._pg_expandarray(i.indkey)).n AS ordinal_position, ci.reltuples AS cardinality, ci.relpages AS pages, pg_catalog.pg_get_expr(i.indpred, i.indrelid) AS filter_condition, ci.oid AS ci_oid, i.indoption AS i_indoption, am.amcanorder AS am_canorder FROM pg_catalog.pg_class AS ct JOIN pg_catalog.pg_namespace AS n ON (ct.relnamespace = n.oid) JOIN pg_catalog.pg_index AS i ON (ct.oid = i.indrelid) JOIN pg_catalog.pg_class AS ci ON (ci.oid = i.indexrelid) JOIN pg_catalog.pg_am AS am ON (ci.relam = am.oid) WHERE (true AND (n.nspname = 'public')) AND (ct.relname = 'sometable')) AS tmp ORDER BY non_unique, type, index_name, ordinal_position

against it still only takes 45ms. So unfortunately I still have not been able to reproduce the slowness.

Thanks for sharing those explain plans. I think the plans actually are the same, though maybe the output format changed. I see both include the join on relam:

                      └── hash-join                              |             | 
                           │                                     | type        | inner
                           │                                     | equality    | (oid) = (relam)

the statement bundle you provided should be helpful. we will look into that next.

rafiss on 1 Dec 2020

@rafiss Hi, steps for reproduce:

create database test1
make IMPORT PGDUMP 'https://drive.google.com/uc?export=download&id=1USsFE_LypdaAMCRNgExaiL5L4wvLmbP4'
create database test2
start java application that creates simple 10 tables (I think it doesn't matter by hand or automatically)
restart java app and see that it starts for 1 minute
see in cockroachdb cluster panel that service latency for this start is 6s
see in cockroachdb cluster panel braking statements

Also I see slow working in TablePlus too

Add more bundles for you

stmt-bundle-612213293907738625.zip
stmt-bundle-612213287988232193.zip
stmt-bundle-612212071640268801.zip
stmt-bundle-612212065837842433.zip
stmt-bundle-612212028882747393.zip
stmt-bundle-612212014046117889.zip
stmt-bundle-612212034723119105.zip
stmt-bundle-612212020206993409.zip

z0mb1ek on 2 Dec 2020

Thanks for these steps. I can reproduce more of the slowness now. I think the existence of more foreign keys makes these queries slower, so using the pgdump you provided was helpful.

I believe the underlying cause is new validation logic in v20.2.x that is performed when fetching table descriptors. We fetch these descriptors every time any data from a pg_catalog table is read.

We'll work on improving the performance and include the improvement in a patch release of 20.2 as soon as we can.

rafiss on 3 Dec 2020

❤2 👍2

Hi @jordanlewis, please add branch-* labels to identify which branch(es) this release-blocker affects.

_{:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

blathers-crl[bot] on 3 Dec 2020

Was this page helpful?

1 / 5 - 1 ratings

Related issues

geo/geomfn: implement ST_ClusterWithin({geometry,float8})

otan · 4Comments

cdc,sql: core changefeeds don't work as expected with `vectorized=on`

ajwerner · 4Comments

cli/sql: SIGSEGV when the TERM env var is not set

HeikoOnnebrink · 4Comments

sql: update age to duration, txn to transaction

awoods187 · 3Comments

Foreign Key Constriant Not Enforced

xudongzheng · 3Comments