Timescaledb: manual compression of a table

Created on 1 Dec 2019  路  11Comments  路  Source: timescale/timescaledb

According to the following document, it is possible to manually compress only individual chunks (see Manual Compression section)

https://docs.timescale.com/latest/using-timescaledb/compression

It would be nice to have a function to compress all uncompressed chunks. It shall exclude last chunk, which might not be full yet.

investigate question

Most helpful comment

I have just discovered today

SELECT compress_chunk(i) from show_chunks('conditions', newer_than, older_than) i;

From: https://docs.timescale.com/latest/using-timescaledb/compression

This is what I was looking for, so closing.

All 11 comments

Hi @wrobell Can you say more about how the automated policy approach doesn't work for you? I'll have the effect of compressing all uncompressed chunks other than the last K based on your policy.

@mfreed I should be bit more specific, indeed.

My use case is as follows

  1. Create TimescaleDB database.
  2. Create table with compression enabled and chunk set to 24h.
  3. Set automatic chunk compression policy to 24h.
  4. Import few years (or months) of archival data into the table.

Now, I need to wait 24h, I assume, for the compression of chunks of archival data to happen. Sometimes more time is required, but I have not nailed it why.

After import of a large dataset into a table, it would be nice to trigger the compression of the chunks immediately.

So yes, you are correct, the limits here are around what we can do with compressed chunks (right now a compressed chunk can not be written too until it is decompressed) so... if we accept that as a current limitation, perhaps you might consider bringing the time to full for your chunks to something less than 24 hours to be able to compress faster? If you were to bring this down to 12 hours now you are cutting your wait to compress time in half? In terms of the time that it takes to complete the compression process, this may be variable based on resource consumption (do you have enough CPU etc.).

Thanks for the suggestions.

To put the beaviour of automatic chunk compression policy into perspective. I am importing about 120 million rows within 6 minutes on my laptop (and could be faster with better disk). Then I need to wait for hours to reduce 10GB dataset to 0.5GB.

If you think that the execution of automatic compression policies could be improved in the context of data import, please let me know. I will create new ticket and close this one as won't fix.

So I think getting the data into the compression process (done via policy ) faster and automatically will help, but I wonder about the resources required to do this (compression tends to be CPU hungry) and so I would temper any expectations against the available resources (just something to be mindful of as you run your test). Let us know how it goes and we can go from there!!

Hi @wrobell from your above message, just want to clarify something if there is any confusion: the 24h corresponds to timestamp time, not wall clock time.

Does it mean that if you fill first chunk and start creating 2nd one, then first chunk shall be compressed immediately?

@bboule The CPU question is interesting one. Some other systems will compress data on the fly, i.e. HDF5. Then you trade CPU usage for IO saving. With TimescaleDB, I assume, the data is saved uncompressed, then read uncompressed to save it again, compressed this time? Or is it more advanced under the hood?

Let me provide an example as well.

  1. Create database
createdb test-1
psql -f db-init.sql test-1   # see below for the db-init.sql file
  1. Run the data-load.py script. Below, I provide two versions of the script. First using PostgreSQL COPY command and second INSERT statements. To run a script
python3 data-load.py postgresql://localhost/test-1
  1. Monitor chunks compression with
select now(), hypertable_name, chunk_name, compression_status from timescaledb_information.compressed_chunk_stats;
  1. After few hours of monitoring - none of the chunks are compressed.

The db-init.sql script

create extension timescaledb;

create table data (
  time timestamptz not null,
  v1 float,
  v2 float,
  v3 float
);

select create_hypertable('data', 'time', chunk_time_interval => '1day'::interval);
alter table data set (timescaledb.compress, timescaledb.compress_orderby = 'time');
select add_compress_chunks_policy('data', '1day'::interval);

The data-load.py script (COPY version)

import asyncpg
import asyncio
import numpy as np
import sys
from datetime import datetime

async def load_data(db_uri):
    conn = await asyncpg.connect(db_uri)
    try:
        copy = conn.copy_records_to_table

        end = datetime.utcnow().timestamp()
        start = end - 48 * 3600
        tv = np.arange(start * 1e6, end * 1e6, 1e6).astype('datetime64[us]')

        n = len(tv)
        v1 = np.random.random(n)
        v2 = np.random.random(n)
        v3 = np.random.random(n)

        data = zip(tv.tolist(), v1, v2, v3)
        await copy('data', columns=('time', 'v1', 'v2', 'v3'), records=data)

    finally:
        await conn.close()

db_uri = sys.argv[1]
asyncio.run(load_data(db_uri))

The data-load.py script (INSERT version)

import asyncpg
import asyncio
import numpy as np
import sys
from datetime import datetime

SQL = "insert into data (time, v1, v2, v3) values ($1, $2, $3, $4)"

async def load_data(db_uri):
    conn = await asyncpg.connect(db_uri)
    try:
        end = datetime.utcnow().timestamp()
        start = end - 48 * 3600
        tv = np.arange(start * 1e6, end * 1e6, 1e6).astype('datetime64[us]')

        n = len(tv)
        v1 = np.random.random(n)
        v2 = np.random.random(n)
        v3 = np.random.random(n)

        data = zip(tv.tolist(), v1, v2, v3)
        await conn.executemany(SQL, data)

    finally:
        await conn.close()

db_uri = sys.argv[1]
asyncio.run(load_data(db_uri))

I have just discovered today

SELECT compress_chunk(i) from show_chunks('conditions', newer_than, older_than) i;

From: https://docs.timescale.com/latest/using-timescaledb/compression

This is what I was looking for, so closing.

Cool snippet @wrobell this almost could be in the official docs!

Was this page helpful?
0 / 5 - 0 ratings