Timescaledb: Segfault when upgrading Postgres 10.0/10.1 to 10.2 while using same binary TimescaleDB release

Created on 9 Feb 2018 · 13 Comments · Source: timescale/timescaledb

After I issued the last command (an insert from a select), the database segfaulted. Note that one column is named "timestamp" and stores a bigint. I partly suspect that, but I have previously inserted into a similar table, so I am not sure where the problem comes from.

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
--
CREATE TABLE dbschema.new_table (LIKE dbschema.journal INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);

SELECT create_hypertable('dbschema.new_table', 'timestamp', 'aggregate_id', 4, chunk_time_interval => 1000*3600*24*7);

\d+ dbschema.new_table
                                                      Table "dbschema.new_table"
      Column      |  Type   | Collation | Nullable |                Default                | Storage  | Stats target | Description 
------------------+---------+-----------+----------+---------------------------------------+----------+--------------+-------------
 journal_sequence | bigint  |           | not null | nextval('dbschema.journal_seq'::regclass) | plain    |              | 
 aggregate_id     | text    |           | not null |                                       | extended |              | 
 aggregate_type   | text    |           | not null |                                       | extended |              | 
 commit_sequence  | bigint  |           | not null |                                       | plain    |              | 
 timestamp        | bigint  |           | not null |                                       | plain    |              | 
 tracing_id       | text    |           |          |                                       | extended |              | 
 payload          | bytea   |           | not null |                                       | extended |              | 
 is_gap           | boolean |           | not null | false                                 | plain    |              | 
Indexes:
    "new_table_aggregate_id_timestamp_idx" btree (aggregate_id, "timestamp" DESC)
    "new_table_timestamp_idx" btree ("timestamp" DESC)
Check constraints:
    "journal_commit_sequence_check" CHECK (commit_sequence > 0)
    "journal_journal_sequence_check" CHECK (journal_sequence > 0)


insert into dbschema.new_table (aggregate_id, aggregate_type, commit_sequence, "timestamp", tracing_id, payload) values ('aggid11', 'testAgg', 1, EXTRACT(EPOCH FROM now()) * 1000, 'some tracing optionally', decode('12AC', 'hex'));
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Some lines from the PostgreSQL log file (the insert query here differs from the one in the previous example):

2018-02-09 13:53:30 CET [6287]: [8-1] db=,app= LOG:  server process (PID 6310) was terminated by signal 11: Segmentation fault
2018-02-09 13:53:30 CET [6287]: [9-1] db=,app= DETAIL:  Failed process was running: insert into dbschema.new_table select * from dbschema.journal;
2018-02-09 13:53:30 CET [6287]: [10-1] db=,app= LOG:  terminating any other active server processes
2018-02-09 13:53:30 CET [6523]: [3-1] db=db_es,app=[unknown] WARNING:  terminating connection because of crash of another server process

PS: Trying to run a query like the following one also causes a segfault:

-- Note that existingSchema contains the existing table journal, as in the previous example.
-- nonexistingschema is a schema from another database (on the same Postgres server, though).
CREATE TABLE nonexistingschema.new_table (LIKE existingSchema.journal INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);

Most helpful comment

On further investigation, this issue appears related to the fact that 10.1 -> 10.2 in Postgres changed some of its internal structs (e.g., ColumnDef got a new field, see https://github.com/postgres/postgres/compare/REL_10_1...REL_10_2).

So the problem people are seeing above is actually when they take a binary version of TimescaleDB built against 10.1 and then run it against 10.2.

It’s not actually a problem with v0.8.0 itself. If you build 0.8.0 against PG 10.1 or 10.2, both work correctly. It’s just when you build against 10.1, then upgrade Postgres to 10.2 with the old version of Timescale running.

So note that people using source packages wouldn’t see this problem, nor would people who build from source (e.g., from our release tags). But the problem does exist with binary releases like deb/rpm (as our v0.8.0 deb/rpm release was built against PG 10.1).
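
To make the mechanism described above concrete, here is a minimal sketch of what happens when a binary compiled against one struct layout reads memory laid out by a newer one. It uses Python's ctypes to mimic C struct layout. The two structs and their field names are hypothetical stand-ins, not the real PostgreSQL ColumnDef definition.

```python
import ctypes


# Hypothetical stand-ins for a node struct; NOT the real PostgreSQL ColumnDef.
class NodeBuiltAgainst(ctypes.Structure):
    """Layout the extension binary was compiled with (older headers)."""
    _fields_ = [
        ("type_tag", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("location", ctypes.c_int),
    ]


class NodeAtRuntime(ctypes.Structure):
    """Layout the upgraded server uses: one field was inserted."""
    _fields_ = [
        ("type_tag", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("added_field", ctypes.c_int),  # assumed new field in the newer headers
        ("location", ctypes.c_int),
    ]


def read_location_with_old_layout(node):
    """Reinterpret the server's bytes using the layout the binary was built with."""
    stale_view = NodeBuiltAgainst.from_buffer_copy(bytes(node))
    return stale_view.location


if __name__ == "__main__":
    node = NodeAtRuntime(type_tag=1, flags=0, added_field=7, location=42)
    print("offset of location, old layout:", NodeBuiltAgainst.location.offset)
    print("offset of location, new layout:", NodeAtRuntime.location.offset)
    # The stale binary picks up added_field's value instead of location's.
    print("value read through old layout:", read_location_with_old_layout(node))
```

In this sketch the stale layout only yields a wrong integer. In C code, a field read at the wrong offset can be a pointer that then gets dereferenced, which is one plausible way such a mismatch turns into a segfault.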

We’re investigating the best way to handle this moving forward. We might have to move to separate binary releases for different minor versions, as they can contain breaking changes. We’re also reaching out to the PG mailing list to ask whether these types of breaking changes should be expected in minor releases.

All 13 comments

The problem is probably somewhere in the installation. I tried the example from the creating-hypertables page and the result is the same:

[root@vagrant-infra ~]# PAGER=less psql -U postgres
psql (10.2)
Type "help" for help.

postgres=# \c tutorial 
You are now connected to database "tutorial" as user "postgres".
tutorial=# 
tutorial=# CREATE TABLE conditions (
tutorial(#   time        TIMESTAMPTZ       NOT NULL,
tutorial(#   location    TEXT              NOT NULL,
tutorial(#   temperature DOUBLE PRECISION  NULL,
tutorial(#   humidity    DOUBLE PRECISION  NULL
tutorial(# );
CREATE TABLE
tutorial=# SELECT create_hypertable('conditions', 'time');
 create_hypertable 
-------------------

(1 row)

tutorial=# INSERT INTO conditions(time, location, temperature, humidity)
tutorial-#   VALUES (NOW(), 'office', 70.0, 50.0);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am running:

select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit

and I have installed timescaledb from the timescaledb-0.8.0-postgresql-10-0.x86_64.rpm file. I think the mismatch in minor version is the issue.

Do you also plan to build rpm files for other PostgreSQL versions?

Hi @LuboVarga, thanks for the report.

In our package name, 10-0 refers to the PostgreSQL 10 series regardless of minor version (0 is the patch number for the rpm itself). We have not tested on 10.2 yet, so we will investigate and see if something changed between 10/10.1 and 10.2 that caused this segfault (it all seems fine on 10.1).
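
As an illustration of the naming scheme just described, the following sketch splits the rpm filename into its parts. The regular expression is inferred from the single filename quoted in this thread, so treat the pattern (and the helper name parse_rpm_name) as assumptions rather than a documented format.

```python
import re

# Pattern inferred from one filename in this thread; an assumption, not a spec.
RPM_NAME = re.compile(
    r"^timescaledb-(?P<extension>\d+\.\d+\.\d+)"  # TimescaleDB release
    r"-postgresql-(?P<pg_major>\d+)"              # PostgreSQL major series
    r"-(?P<rpm_patch>\d+)"                        # patch number of the rpm itself
    r"\.(?P<arch>[^.]+)\.rpm$"                    # CPU architecture
)


def parse_rpm_name(filename):
    """Return the components of a TimescaleDB rpm filename as a dict."""
    match = RPM_NAME.match(filename)
    if match is None:
        raise ValueError("unrecognised package name: %r" % filename)
    return {
        "extension": match.group("extension"),
        "pg_major": int(match.group("pg_major")),
        "rpm_patch": int(match.group("rpm_patch")),
        "arch": match.group("arch"),
    }


if __name__ == "__main__":
    print(parse_rpm_name("timescaledb-0.8.0-postgresql-10-0.x86_64.rpm"))
```

Note that nothing in the name records the PostgreSQL minor version, which is why the filename alone could not reveal the 10.1 versus 10.2 mismatch.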

+1 here

We have confirmed that this appears to be a regression between PG 10.1 and 10.2: We can replicate the segfault with timescale v0.8.0 on 10.2, but it doesn't seem to occur on 10.1.

Also happy to report that our soon-to-be-released 0.9.0 seems to run correctly on both 10.1 and 10.2, but we are going through some more testing to identify and validate the actual problem/fix.

I have tried compiling timescaledb from source. I checked out the 0.8.0 tag and executed ./bootstrap. An error about the old version of cmake (CentOS 7) popped up. After running yum install cmake3 (plus the PostgreSQL devel package) and replacing the cmake call with a cmake3 call in https://github.com/timescale/timescaledb/blob/master/bootstrap#L33, it worked. The installation also worked, and my test inserting into a hypertable succeeded. I hope this helps.

@LuboVarga I have patched the RPM release so that it will work with 10.2 (which we have now made a requirement), would you mind installing via that method again and making sure it works? My test went well.

Same link as before:
https://timescalereleases.blob.core.windows.net/rpm/timescaledb-0.8.0-postgresql-10-0.x86_64.rpm

For users with the problem on Ubuntu -- fixed releases are coming soon/today.

_Posted on slack, but posting here so any Google searches that land here know what to do_

We have finished publishing fixed packages for 0.8.0 to channels affected by the changes in PostgreSQL 10.2, i.e., users who updated PostgreSQL from 10.0/.1 to 10.2 and are using apt, brew, yum/dnf or installing from source. If you are using Docker, you are _not_ affected and do not need to do anything. If you are installing TimescaleDB for the first time on 10.2, you do not need to do anything.

source users
If you are building the extension from source, and are upgrading from 10.0/.1 to 10.2, you need to rebuild the extension after upgrading to 10.2. This will allow our extension to work correctly with 10.2. Reinstalling TimescaleDB will _NOT_ cause you to lose any data; it is a safe operation.

brew users
Once you upgrade to PostgreSQL 10.2, you will need to uninstall and reinstall TimescaleDB. This will allow our extension to work correctly with 10.2. Reinstalling TimescaleDB will _NOT_ cause you to lose any data; it is a safe operation.

rpm package users (Fedora, CentOS, RHEL)
If you were previously using our package for PostgreSQL on 10.0/.1, you will need to re-download the rpm and reinstall it. If you are planning to start using it with PostgreSQL 10, you must be on version 10.2. If you are on 10.0/.1 and not ready to upgrade to 10.2, do _NOT_ update your timescaledb package (yum should disallow this).

deb package users (Ubuntu)
If you were previously using our package for PostgreSQL on 10.0/.1, you will need to run apt update and apt upgrade timescaledb-postgresql-10. This will also upgrade any existing PostgreSQL 10 installation to 10.2, so if you are not ready to upgrade PostgreSQL, do not upgrade timescaledb.

0.9.0 should be out soon and it will similarly depend on 10.2.
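
The guidance above comes down to comparing the PostgreSQL minor version the extension was built against with the one the server is running. Below is a minimal sketch of that comparison, assuming both are available as server_version_num-style integers (for example 100002 for 10.2, as reported by SHOW server_version_num on PostgreSQL 10 and later). The function names are hypothetical, and this is not something TimescaleDB is known to do itself.

```python
from typing import Tuple


def split_version_num(version_num: int) -> Tuple[int, int]:
    """Split a PostgreSQL 10+ server_version_num into (major, minor)."""
    if version_num < 100000:
        # Releases before 10 used a three-part scheme; out of scope here.
        raise ValueError("only PostgreSQL 10 and later are handled")
    return divmod(version_num, 10000)


def minor_version_differs(built_against: int, running: int) -> bool:
    """True when build-time and run-time versions share a major but not a minor."""
    built_major, built_minor = split_version_num(built_against)
    run_major, run_minor = split_version_num(running)
    if built_major != run_major:
        raise ValueError("different major versions; a rebuild is required anyway")
    return built_minor != run_minor


if __name__ == "__main__":
    # Package built against 10.1, server upgraded to 10.2: the case in this thread.
    print(minor_version_differs(100001, 100002))
    # Fresh install built against and running on 10.2.
    print(minor_version_differs(100002, 100002))
```

This check is deliberately conservative: a differing minor version does not always mean the binary is incompatible, but in the 10.1 to 10.2 case discussed here it did.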

Thanks for fast reaction and solution.

No problem! Going to re-open the issue for a few days for some more visibility.

Going to close now that this is resolved and should have given people enough time to see. Again thanks for the report!
