Compression with _segment by_ on string type is very slow compared to _segment by_ on int type. Case 1 does not use _Recheck Cond_ to extract the queried id. Does _segment by_ only works with int type or is it a bug?
to reproduce, just for a single row
CREATE TABLE public.test2 (
id bigserial NOT NULL,
assetid varchar(100) NULL,
asset int8 NULL,
...
);
case 1 (very slow)
ALTER TABLE test2 SET (timescaledb.compress, timescaledb.compress_segmentby = 'assetid', timescaledb.compress_orderby = 'timestamp DESC, id DESC');
SELECT compress_chunk('_timescaledb_internal._hyper_2_346_chunk');
explain (analyse, buffers) select * from test2 where "timestamp" >= '2019-12-14 01:00' and "timestamp" < '2019-12-15 01:00' and assetid = '99dbfcbdbd3c2b27f6c92b70';
Append (cost=0.76..9136.03 rows=1585000 width=317) (actual time=26807.010..26807.013 rows=1 loops=1)
Buffers: shared hit=32364963
-> Custom Scan (DecompressChunk) on _hyper_2_346_chunk (cost=0.76..1211.04 rows=1585000 width=317) (actual time=26807.008..26807.011 rows=1 loops=1)
Filter: (("timestamp" >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND ("timestamp" < '2019-12-15 01:00:00+01'::timestamp with time zone) AND ((assetid)::text = '99dbfcbdbd3c2b27f6c92b70'::text))
Rows Removed by Filter: 7849160
Buffers: shared hit=32364963
-> Seq Scan on compress_hyper_9_433_chunk (cost=0.00..1211.04 rows=1585 width=1092) (actual time=0.029..13.264 rows=14269 loops=1)
Filter: ((_ts_meta_max_1 >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND (_ts_meta_min_1 < '2019-12-15 01:00:00+01'::timestamp with time zone))
Buffers: shared hit=997
Planning Time: 3.031 ms
Execution Time: 26807.167 ms
case 2 (very fast)
ALTER TABLE test2 SET (timescaledb.compress, timescaledb.compress_segmentby = 'asset', timescaledb.compress_orderby = 'timestamp DESC, id DESC');
SELECT compress_chunk('_timescaledb_internal._hyper_2_346_chunk');
explain (analyse, buffers) select * from test2 where "timestamp" >= '2019-12-14 01:00' and "timestamp" < '2019-12-15 01:00' and asset = 47114242;
Append (cost=10.59..77.94 rows=5000 width=317) (actual time=0.082..0.087 rows=1 loops=1)
Buffers: shared hit=1 read=2
-> Custom Scan (DecompressChunk) on _hyper_2_346_chunk (cost=10.59..52.94 rows=5000 width=317) (actual time=0.081..0.085 rows=1 loops=1)
Filter: (("timestamp" >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND ("timestamp" < '2019-12-15 01:00:00+01'::timestamp with time zone))
Buffers: shared hit=1 read=2
-> Bitmap Heap Scan on compress_hyper_10_434_chunk (cost=1.75..52.94 rows=5 width=1068) (actual time=0.060..0.062 rows=1 loops=1)
Recheck Cond: (asset = 47114242)
Filter: ((_ts_meta_max_1 >= '2019-12-14 01:00:00+01'::timestamp with time zone) AND (_ts_meta_min_1 < '2019-12-15 01:00:00+01'::timestamp with time zone))
Heap Blocks: exact=1
Buffers: shared hit=1 read=2
-> Bitmap Index Scan on compress_hyper_10_434_chunk__compressed_hypertable_10_asset__ts (cost=0.00..1.74 rows=48 width=0) (actual time=0.035..0.035 rows=1 loops=1)
Index Cond: (asset = 47114242)
Buffers: shared read=2
Planning Time: 2.878 ms
Execution Time: 0.254 ms
The problem is the types you are using. The expression for the assetid column is (assetid)::text = '99dbfcbdbd3c2b27f6c92b70'::text which has a cast on the left side and because of that will not be pushed down below decompression. As a workaround you can either use TEXT for assetid or do the CAST on the right side in your expression assetid = '99dbfcbdbd3c2b27f6c92b70'::varchar. Unless you absolutely want to have the 100 char length restriction there using TEXT over varchar is recommended anyway and even with TEXT you could add a check constraint to enforce the length limit. Under the hood TEXT and VARCHAR are stored the same way in postgres.
Most helpful comment
The problem is the types you are using. The expression for the assetid column is
(assetid)::text = '99dbfcbdbd3c2b27f6c92b70'::textwhich has a cast on the left side and because of that will not be pushed down below decompression. As a workaround you can either use TEXT for assetid or do the CAST on the right side in your expressionassetid = '99dbfcbdbd3c2b27f6c92b70'::varchar. Unless you absolutely want to have the 100 char length restriction there using TEXT over varchar is recommended anyway and even with TEXT you could add a check constraint to enforce the length limit. Under the hood TEXT and VARCHAR are stored the same way in postgres.https://stackoverflow.com/questions/4848964/postgresql-difference-between-text-and-varchar-character-varying