Yugabyte-db: [YSQL] SIGSEGV in GetMemoryChunkContext (YBRefreshCache) in idle connection

Created on 9 Nov 2020  路  4Comments  路  Source: yugabyte/yugabyte-db

Version: Application fingerprint: version 2.3.2.0 build 37 revision 34cbfe2075ed7068e774157c59eb492a07f3fa6f build_type RELEASE built at 25 Sep 2020 10:30:22 UTC

Encountered the following stack trace on an idle connection:

Core backtrace:

Core was generated by `postgres: parentalservice parental_microservice 240b:c0e0:101:5dc0:b494:2:0:978'.
Program terminated with signal 11, Segmentation fault.
#0  GetMemoryChunkContext (pointer=0x65736c6166206369) at ../../../../../../../src/postgres/src/include/utils/memutils.h:131
131 ../../../../../../../src/postgres/src/include/utils/memutils.h: No such file or directory.
(gdb) bt
#0  GetMemoryChunkContext (pointer=0x65736c6166206369) at ../../../../../../../src/postgres/src/include/utils/memutils.h:131
#1  pfree (pointer=0x65736c6166206369) at ../../../../../../../src/postgres/src/backend/utils/mmgr/mcxt.c:1086
#2  0x00000000004afbe2 in FreeTupleDesc (tupdesc=0x278b9c8) at ../../../../../../../src/postgres/src/backend/access/common/tupdesc.c:342
#3  0x00000000009d7cd9 in RelationDestroyRelation (relation=0x278b6a8, remember_tupdesc=<optimized out>) at ../../../../../../../src/postgres/src/backend/utils/cache/relcache.c:2996
#4  0x00000000009d99b6 in YBPreloadRelCache () at ../../../../../../../src/postgres/src/backend/utils/cache/relcache.c:1452
#5  0x0000000000864538 in YBRefreshCache () at ../../../../../../src/postgres/src/backend/tcop/postgres.c:3712
#6  0x0000000000869d21 in YBCheckSharedCatalogCacheVersion () at ../../../../../../src/postgres/src/backend/tcop/postgres.c:3951
#7  PostgresMain (argc=<optimized out>, argv=argv@entry=0x1bae228, dbname=0x1b86918 "parental_microservice", username=0x1badf78 "parentalservice")
    at ../../../../../../src/postgres/src/backend/tcop/postgres.c:4902
#8  0x000000000049948f in BackendRun (port=0x1babb80) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4428
#9  BackendStartup (port=0x1babb80) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4094
#10 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1737
#11 0x00000000007d2351 in PostmasterMain (argc=argc@entry=23, argv=argv@entry=0x1b844e0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1400
#12 0x000000000071e2ba in PostgresServerProcessMain (argc=23, argv=0x1b844e0) at ../../../../../../src/postgres/src/backend/main/main.c:234
#13 0x000000000071e4b9 in main ()

Postgres log from this session:

2020-11-09 15:20:09.988 UTC [4502] LOG:  connection received: host=240b:c0e0:101:5dc0:b494:2:0:978 port=44738
I1109 15:20:09.995862  4502 pggate.cc:106] Reset YSQL bind address to yb-tserver-2.yb-tservers.yb-dev-yugabyte-oss-new.svc.cluster.local:5432
I1109 15:20:09.996346  4502 server_base_options.cc:135] Updating master addrs to {yb-master-0.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-1.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-2.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100}
I1109 15:20:09.996788  4502 mem_tracker.cc:249] MemTracker: hard memory limit is 54.375000 GB
I1109 15:20:09.996814  4502 mem_tracker.cc:251] MemTracker: soft memory limit is 46.218750 GB
I1109 15:20:09.998586  4502 thread_pool.cc:171] Starting thread pool { name: pggate_ybclient queue_limit: 10000 max_workers: 1024 }
2020-11-09 15:20:10.050 UTC [4502] LOG:  connection authorized: user=parentalservice database=parental_microservice
2020-11-09 15:20:11.091 UTC [4502] LOG:  duration: 13.063 ms
2020-11-09 15:20:11.091 UTC [4502] LOG:  duration: 0.024 ms
2020-11-09 15:20:11.091 UTC [4502] LOG:  execute <unnamed>: SET extra_float_digits = 3
2020-11-09 15:20:11.092 UTC [4502] LOG:  duration: 0.298 ms
2020-11-09 15:20:11.092 UTC [4502] LOG:  duration: 0.116 ms
2020-11-09 15:20:11.093 UTC [4502] LOG:  duration: 0.024 ms
2020-11-09 15:20:11.093 UTC [4502] LOG:  execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'
2020-11-09 15:20:11.093 UTC [4502] LOG:  duration: 0.115 ms
2020-11-09 15:49:50.989 UTC [670] LOG:  server process (PID 4502) was terminated by signal 11: Segmentation fault

Postgres log from this time period:

2020-11-09 15:49:50.846 UTC [5023] LOG:  duration: 0.065 ms
2020-11-09 15:49:50.989 UTC [670] LOG:  server process (PID 4502) was terminated by signal 11: Segmentation fault
2020-11-09 15:49:50.989 UTC [670] DETAIL:  Failed process was running: SET application_name = 'PostgreSQL JDBC Driver'
2020-11-09 15:49:50.989 UTC [670] LOG:  terminating any other active server processes
2020-11-09 15:49:50.989 UTC [5094] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [5094] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [5094] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [5162] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [5162] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [5162] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [5268] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [5268] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [5268] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [5059] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [5059] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [5059] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4998] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4998] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4998] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [5023] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [5023] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [5023] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4928] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4928] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4928] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4739] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4739] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4739] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4681] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4681] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4681] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4669] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4669] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4669] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.989 UTC [4657] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.989 UTC [4657] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.989 UTC [4657] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4872] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4872] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4872] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4892] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4892] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4892] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4541] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4541] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4541] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4757] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4757] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4757] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4709] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4709] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4709] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4696] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4696] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4696] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4603] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4603] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4603] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4568] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4568] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4568] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4584] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4584] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4584] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:50.990 UTC [4523] WARNING:  terminating connection because of crash of another server process
2020-11-09 15:49:50.990 UTC [4523] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-09 15:49:50.990 UTC [4523] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-09 15:49:51.012 UTC [670] LOG:  all server processes terminated; reinitializing
2020-11-09 15:49:51.153 UTC [5296] LOG:  database system was interrupted; last known up at 2020-11-09 15:08:18 UTC
2020-11-09 15:49:51.159 UTC [5296] LOG:  database system was not properly shut down; automatic recovery in progress
2020-11-09 15:49:51.162 UTC [5296] LOG:  invalid record length at 0/1002DC0: wanted 24, got 0
2020-11-09 15:49:51.162 UTC [5296] LOG:  redo is not required
2020-11-09 15:49:51.193 UTC [670] LOG:  database system is ready to accept connections
2020-11-09 15:49:56.461 UTC [5300] LOG:  connection received: host=240b:c0e0:101:5dc0:b494:2:0:7c8 port=57914
I1109 15:49:56.468294  5300 pggate.cc:106] Reset YSQL bind address to yb-tserver-2.yb-tservers.yb-dev-yugabyte-oss-new.svc.cluster.local:5432
I1109 15:49:56.468791  5300 server_base_options.cc:135] Updating master addrs to {yb-master-0.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-1.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-2.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100}
I1109 15:49:56.469269  5300 mem_tracker.cc:249] MemTracker: hard memory limit is 54.375000 GB
I1109 15:49:56.469295  5300 mem_tracker.cc:251] MemTracker: soft memory limit is 46.218750 GB
I1109 15:49:56.470885  5300 thread_pool.cc:171] Starting thread pool { name: pggate_ybclient queue_limit: 10000 max_workers: 1024 }
I1109 15:49:56.473521  5307 async_initializer.cc:82] Starting to init ybclient
I1109 15:49:56.473898  5307 client-internal.cc:1936] New master addresses: [yb-master-0.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100,yb-master-1.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100,yb-master-2.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100]
I1109 15:49:56.485319  5307 async_initializer.cc:86] Successfully built ybclient
2020-11-09 15:49:56.519 UTC [5300] LOG:  connection authorized: user=imuser database=FORESIGHT
2020-11-09 15:49:58.090 UTC [5313] LOG:  connection received: host=240b:c0e0:101:5dc0:b494:2:0:449 port=33926
I1109 15:49:58.097684  5313 pggate.cc:106] Reset YSQL bind address to yb-tserver-2.yb-tservers.yb-dev-yugabyte-oss-new.svc.cluster.local:5432
I1109 15:49:58.098179  5313 server_base_options.cc:135] Updating master addrs to {yb-master-0.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-1.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100},{yb-master-2.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100}
I1109 15:49:58.098620  5313 mem_tracker.cc:249] MemTracker: hard memory limit is 54.375000 GB
I1109 15:49:58.098645  5313 mem_tracker.cc:251] MemTracker: soft memory limit is 46.218750 GB
I1109 15:49:58.100370  5313 thread_pool.cc:171] Starting thread pool { name: pggate_ybclient queue_limit: 10000 max_workers: 1024 }
I1109 15:49:58.102983  5320 async_initializer.cc:82] Starting to init ybclient
I1109 15:49:58.103384  5320 client-internal.cc:1936] New master addresses: [yb-master-0.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100,yb-master-1.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100,yb-master-2.yb-masters.yb-dev-yugabyte-oss-new.svc.cluster.local:7100]
I1109 15:49:58.114964  5320 async_initializer.cc:86] Successfully built ybclient
2020-11-09 15:49:58.153 UTC [5313] LOG:  connection authorized: user=ipamuser database=ipam
2020-11-09 15:49:58.275 UTC [5300] LOG:  duration: 11.720 ms
2020-11-09 15:49:58.275 UTC [5300] LOG:  duration: 0.026 ms
2020-11-09 15:49:58.275 UTC [5300] LOG:  execute <unnamed>: SET extra_float_digits = 3
2020-11-09 15:49:58.275 UTC [5300] LOG:  duration: 0.278 ms
2020-11-09 15:49:58.276 UTC [5300] LOG:  duration: 0.142 ms
2020-11-09 15:49:58.276 UTC [5300] LOG:  duration: 0.024 ms
2020-11-09 15:49:58.276 UTC [5300] LOG:  execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'
2020-11-09 15:49:58.276 UTC [5300] LOG:  duration: 0.109 ms
2020-11-09 15:49:58.547 UTC [5313] LOG:  duration: 11.660 ms
2020-11-09 15:49:58.547 UTC [5313] LOG:  duration: 0.021 ms
2020-11-09 15:49:58.547 UTC [5313] LOG:  execute <unnamed>: SET extra_float_digits = 3
2020-11-09 15:49:58.547 UTC [5313] LOG:  duration: 0.240 ms
2020-11-09 15:49:58.548 UTC [5313] LOG:  duration: 0.108 ms
2020-11-09 15:49:58.548 UTC [5313] LOG:  duration: 0.016 ms
2020-11-09 15:49:58.548 UTC [5313] LOG:  execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'
2020-11-09 15:49:58.548 UTC [5313] LOG:  duration: 0.082 ms
areysql kinbug prioritcritical

Most helpful comment

Here's the test case that replicates the problem:

  /** Test case inspired by #6317 and #6352, this caused SIGSERV crash. */
  @Test
  public void testDefaults() throws Exception {
    try (Connection conn1 = getConnectionBuilder().connect();
         Connection conn2 = getConnectionBuilder().connect();
         Statement stmt1 = conn1.createStatement();
         Statement stmt2 = conn2.createStatement()) {
      stmt1.executeUpdate("CREATE ROLE application;");

      stmt1.executeUpdate("CREATE TABLE with_default(id int PRIMARY KEY);");

      // This sequence just needs to exist, we don't even have to use it.
      stmt1.executeUpdate("CREATE SEQUENCE some_seq;");

      stmt1.executeUpdate("INSERT INTO with_default(id) VALUES (1);");
      stmt1.executeUpdate("ALTER TABLE with_default ADD COLUMN def1 int DEFAULT 10;");

      // Mixing in some "concurrent" DDLs.
      stmt2.executeUpdate("CREATE TABLE t();");
      stmt2.executeUpdate("DROP TABLE t;");

      // Either will fail.
      stmt1.executeUpdate("grant SELECT,INSERT, UPDATE, DELETE on with_default to application;");

      stmt2.executeUpdate("CREATE TABLE t();");
      stmt2.executeUpdate("DROP TABLE t;");

//      // Alternatively, this will also fail.
//      stmt1.executeUpdate("ALTER SEQUENCE some_seq OWNED BY with_default.id;");
//      stmt1.executeUpdate("DROP TABLE with_default;");
    }
  }

All 4 comments

src/postgres/src/backend/utils/mmgr/mcxt.c was modified after the above commit.

See if there is current in current 2.3.3

Thanks

Line 1452 of src/postgres/src/backend/utils/cache/relcache.c (as of commit 34cbfe2075ed7068e774157c59eb492a07f3fa6f)

            RelationCacheInsert(relation, true);

This seems to differ from the trace given above.

@tedyu: Thanks for noticing, this is expected - RelationCacheInsert is a macro which expands into RelationDestroyRelation among other things

Here's the test case that replicates the problem:

  /** Test case inspired by #6317 and #6352, this caused SIGSERV crash. */
  @Test
  public void testDefaults() throws Exception {
    try (Connection conn1 = getConnectionBuilder().connect();
         Connection conn2 = getConnectionBuilder().connect();
         Statement stmt1 = conn1.createStatement();
         Statement stmt2 = conn2.createStatement()) {
      stmt1.executeUpdate("CREATE ROLE application;");

      stmt1.executeUpdate("CREATE TABLE with_default(id int PRIMARY KEY);");

      // This sequence just needs to exist, we don't even have to use it.
      stmt1.executeUpdate("CREATE SEQUENCE some_seq;");

      stmt1.executeUpdate("INSERT INTO with_default(id) VALUES (1);");
      stmt1.executeUpdate("ALTER TABLE with_default ADD COLUMN def1 int DEFAULT 10;");

      // Mixing in some "concurrent" DDLs.
      stmt2.executeUpdate("CREATE TABLE t();");
      stmt2.executeUpdate("DROP TABLE t;");

      // Either will fail.
      stmt1.executeUpdate("grant SELECT,INSERT, UPDATE, DELETE on with_default to application;");

      stmt2.executeUpdate("CREATE TABLE t();");
      stmt2.executeUpdate("DROP TABLE t;");

//      // Alternatively, this will also fail.
//      stmt1.executeUpdate("ALTER SEQUENCE some_seq OWNED BY with_default.id;");
//      stmt1.executeUpdate("DROP TABLE with_default;");
    }
  }
Was this page helpful?
0 / 5 - 0 ratings