Timescaledb: Compression segment by on string type very slow

Created on 14 Mar 2020 · 1Comment · Source: timescale/timescaledb

OS: Debian GNU/Linux 10 (buster)
PostgreSQL version: 11.7
TimescaleDB version: tested with 1.6.0, 1.6.1, 1.7.0-dev
Installation method: Docker

Compression with _segment by_ on string type is very slow compared to _segment by_ on int type. Case 1 does not use _Recheck Cond_ to extract the queried id. Does _segment by_ only works with int type or is it a bug?

to reproduce, just for a single row

CREATE TABLE public.test2 (
    id bigserial NOT NULL,
    assetid varchar(100) NULL,
    asset int8 NULL,
    ...
);

case 1 (very slow)

ALTER TABLE test2 SET (timescaledb.compress, timescaledb.compress_segmentby = 'assetid', timescaledb.compress_orderby = 'timestamp DESC, id DESC');

SELECT compress_chunk('_timescaledb_internal._hyper_2_346_chunk');

explain (analyse, buffers) select * from test2 where "timestamp" >= '2019-12-14 01:00' and "timestamp" < '2019-12-15 01:00' and assetid = '99dbfcbdbd3c2b27f6c92b70';

Append  (cost=0.76..9136.03 rows=1585000 width=317) (actual time=26807.010..26807.013 rows=1 loops=1)
  Buffers: shared hit=32364963
  ->  Custom Scan (DecompressChunk) on _hyper_2_346_chunk  (cost=0.76..1211.04 rows=1585000 width=317) (actual time=26807.008..26807.011 rows=1 loops=1)
        Filter: (("timestamp" >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND ("timestamp" < '2019-12-15 01:00:00+01'::timestamp with time zone) AND ((assetid)::text = '99dbfcbdbd3c2b27f6c92b70'::text))
        Rows Removed by Filter: 7849160
        Buffers: shared hit=32364963
        ->  Seq Scan on compress_hyper_9_433_chunk  (cost=0.00..1211.04 rows=1585 width=1092) (actual time=0.029..13.264 rows=14269 loops=1)
              Filter: ((_ts_meta_max_1 >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND (_ts_meta_min_1 < '2019-12-15 01:00:00+01'::timestamp with time zone))
              Buffers: shared hit=997
Planning Time: 3.031 ms
Execution Time: 26807.167 ms

case 2 (very fast)

ALTER TABLE test2 SET (timescaledb.compress, timescaledb.compress_segmentby = 'asset', timescaledb.compress_orderby = 'timestamp DESC, id DESC');

SELECT compress_chunk('_timescaledb_internal._hyper_2_346_chunk');

explain (analyse, buffers) select * from test2 where "timestamp" >= '2019-12-14 01:00' and "timestamp" < '2019-12-15 01:00' and asset = 47114242;

Append  (cost=10.59..77.94 rows=5000 width=317) (actual time=0.082..0.087 rows=1 loops=1)
  Buffers: shared hit=1 read=2
  ->  Custom Scan (DecompressChunk) on _hyper_2_346_chunk  (cost=10.59..52.94 rows=5000 width=317) (actual time=0.081..0.085 rows=1 loops=1)
        Filter: (("timestamp" >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND ("timestamp" < '2019-12-15 01:00:00+01'::timestamp with time zone))
        Buffers: shared hit=1 read=2
        ->  Bitmap Heap Scan on compress_hyper_10_434_chunk  (cost=1.75..52.94 rows=5 width=1068) (actual time=0.060..0.062 rows=1 loops=1)
              Recheck Cond: (asset = 47114242)
              Filter: ((_ts_meta_max_1 >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND (_ts_meta_min_1 < '2019-12-15 01:00:00+01'::timestamp with time zone))
              Heap Blocks: exact=1
              Buffers: shared hit=1 read=2
              ->  Bitmap Index Scan on compress_hyper_10_434_chunk__compressed_hypertable_10_asset__ts  (cost=0.00..1.74 rows=48 width=0) (actual time=0.035..0.035 rows=1 loops=1)
                    Index Cond: (asset = 47114242)
                    Buffers: shared read=2
Planning Time: 2.878 ms
Execution Time: 0.254 ms

Compression

Source

KMUS

Most helpful comment

The problem is the types you are using. The expression for the assetid column is (assetid)::text = '99dbfcbdbd3c2b27f6c92b70'::text which has a cast on the left side and because of that will not be pushed down below decompression. As a workaround you can either use TEXT for assetid or do the CAST on the right side in your expression assetid = '99dbfcbdbd3c2b27f6c92b70'::varchar. Unless you absolutely want to have the 100 char length restriction there using TEXT over varchar is recommended anyway and even with TEXT you could add a check constraint to enforce the length limit. Under the hood TEXT and VARCHAR are stored the same way in postgres.

https://stackoverflow.com/questions/4848964/postgresql-difference-between-text-and-varchar-character-varying