Hello,
I am using a MergeTree table partitioned by date.
Is there a way to compress old partitions?
For example, there is data spanning two years:
| 2019 | 2020 |
| -------- | --------- |
| Compressed | Uncompressed |
The 2019 partitions would be compressed, while the newly arriving 2020 data would stay uncompressed, because next year the 2019 data will probably be read less often than the 2020 data.
Thank you.
What do you mean by "uncompressed"?
Without LZ4 or ZSTD data compression? There is no such feature; compression is configured at the server / column level.
Uncompressed data will not make the system faster: LZ4 is very lightweight on CPU and saves a lot of disk I/O, so LZ4 compression usually makes the system faster.
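A possible workaround (not a dedicated per-partition feature): declare a heavier codec such as ZSTD on the columns, then rewrite the old parts so they pick it up. A minimal sketch, assuming a hypothetical table `events` partitioned by month; the partition value and exact behavior depend on your table definition and ClickHouse version:

-- Switch the column to a stronger codec; only newly written parts use it automatically.
ALTER TABLE events MODIFY COLUMN value Int64 CODEC(ZSTD(3));

-- Rewriting an old partition (merge) should re-encode its parts with the new codec.
OPTIMIZE TABLE events PARTITION 201912 FINAL;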
@den-crane
Yes, by "uncompressed" I mean without LZ4 or ZSTD.
Thank you for your advice.
Also, could I get a more detailed benchmark comparing total read speed (decompression + disk read) between LZ4 and uncompressed data?
By default data is compressed by LZ4
https://clickhouse.yandex/docs/en/operations/server_settings/settings/#compression
https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse
Let's test with RANDOM numbers.
create table compressionTest
(A Int64, col_lz4 Int64 Codec(LZ4), col_none Int64 Codec(NONE))
Engine=MergeTree order by A;
insert into compressionTest select number, cityHash64(number), cityHash64(number)
from numbers(100000000);
SELECT
formatReadableSize(sum(column_bytes_on_disk)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
table,
column
FROM system.parts_columns
WHERE (active = 1) AND (table LIKE '%compressionTest%')
GROUP BY
table,
column
┌─compressed─┬─uncompressed─┬─table───────────┬─column───┐
│ 97.80 MiB  │ 762.94 MiB   │ compressionTest │ A        │
│ 766.51 MiB │ 762.94 MiB   │ compressionTest │ col_lz4  │
│ 763.51 MiB │ 762.94 MiB   │ compressionTest │ col_none │
└────────────┴──────────────┴─────────────────┴──────────┘
Nothing to compress.
select count() from compressionTest prewhere col_lz4 = 666;
Elapsed: 0.055 sec. Processed 100.00 million rows, 800.00 MB (1.81 billion rows/s., 14.46 GB/s.)
select count() from compressionTest prewhere col_none = 666;
Elapsed: 0.062 sec. Processed 100.00 million rows, 800.00 MB (1.61 billion rows/s., 12.87 GB/s.)
LZ4 is faster! 1.81 billion rows/s vs. 1.61 billion rows/s.
Add some duplicates.
insert into compressionTest select number, cityHash64(number), cityHash64(number)
from numbers(100000000);
insert into compressionTest select number+5, cityHash64(number), cityHash64(number)
from numbers(100000000);
insert into compressionTest select number+13, cityHash64(number), cityHash64(number)
from numbers(100000000);
optimize table compressionTest final;
┌─compressed─┬─uncompressed─┬─table───────────┬─column───┐
│ 101.33 MiB │ 2.98 GiB     │ compressionTest │ A        │
│ 1.59 GiB   │ 2.98 GiB     │ compressionTest │ col_lz4  │
│ 2.98 GiB   │ 2.98 GiB     │ compressionTest │ col_none │
└────────────┴──────────────┴─────────────────┴──────────┘
select count() from compressionTest prewhere col_lz4 = 666;
Elapsed: 0.139 sec. Processed 400.00 million rows, 3.20 GB (2.88 billion rows/s., 23.07 GB/s.)
select count() from compressionTest prewhere col_none = 666;
Elapsed: 0.239 sec. Processed 400.00 million rows, 3.20 GB (1.67 billion rows/s., 13.37 GB/s.)
LZ4 is significantly faster! 2.88 billion rows/s vs. 1.67 billion rows/s.
@achimbab, do you have any questions left about the compressed/uncompressed scenario? If everything is clear, please close the issue.
@den-crane
Thank you very much.
@qoega
I will close this issue.
@den-crane
FYI, I am sharing my test results.
CPU vs. disk: the slower the disk, the more compression helps.
Disk space: compression gives significant savings.
But the default (LZ4) is the best in my case. (Per-column compression ratios can be checked with the query sketched after the table.)
| compression method | size on disk | rows/s |
| ------------------ | ------------ | ------ |
| default | 766.34 MiB | 3.89 million |
| lz4 | 2.98 GiB | 2.26 billion |
| none | 2.98 GiB | 2.34 billion |
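For reference, the per-column sizes come from system.parts_columns; a small variation of the query used earlier in this thread also shows the compression ratio directly (just a sketch over the same system table):

SELECT
    column,
    formatReadableSize(sum(column_bytes_on_disk)) AS on_disk,
    formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
    round(sum(column_data_uncompressed_bytes) / sum(column_bytes_on_disk), 2) AS ratio
FROM system.parts_columns
WHERE active AND (table = 'compressionTest')
GROUP BY column;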
ClickHouse server version 19.19.1 revision 54430
Intel(R) Xeon(R) Silver 4110
256GB memory
NVMe SSD
achimbab.host :) create table compressionTest
:-] (A Int64, col_lz4 Int64 Codec(LZ4), col_none Int64 Codec(NONE))
:-] Engine=MergeTree order by A;
CREATE TABLE compressionTest
(
`A` Int64,
`col_lz4` Int64 CODEC(LZ4),
`col_none` Int64 CODEC(NONE)
)
ENGINE = MergeTree
ORDER BY A
Ok.
0 rows in set. Elapsed: 0.015 sec.
achimbab.host :) insert into compressionTest select number, cityHash64(number), cityHash64(number)
:-] from numbers(100000000);
INSERT INTO compressionTest SELECT
number,
cityHash64(number),
cityHash64(number)
FROM numbers(100000000)
Ok.
0 rows in set. Elapsed: 6.661 sec. Processed 100.01 million rows, 800.06 MB (15.01 million rows/s., 120.11 MB/s.)
achimbab.host :) SELECT
:-] formatReadableSize(sum(column_bytes_on_disk)) AS compressed,
:-] formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
:-] table,
:-] column
:-] FROM system.parts_columns
:-] WHERE (active = 1) AND (table LIKE '%compressionTest%')
:-] GROUP BY
:-] table,
:-] column
SELECT
formatReadableSize(sum(column_bytes_on_disk)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
table,
column
FROM system.parts_columns
WHERE (active = 1) AND (table LIKE '%compressionTest%')
GROUP BY
table,
column
┌─compressed─┬─uncompressed─┬─table───────────┬─column───┐
│ 382.19 MiB │ 762.94 MiB   │ compressionTest │ A        │
│ 766.51 MiB │ 762.94 MiB   │ compressionTest │ col_lz4  │
│ 763.51 MiB │ 762.94 MiB   │ compressionTest │ col_none │
└────────────┴──────────────┴─────────────────┴──────────┘
3 rows in set. Elapsed: 0.003 sec.
achimbab.host :) select count() from compressionTest prewhere col_lz4 = 666;
SELECT count()
FROM compressionTest
PREWHERE col_lz4 = 666
┌─count()─┐
│       0 │
└─────────┘
1 rows in set. Elapsed: 0.049 sec. Processed 100.00 million rows, 800.00 MB (2.05 billion rows/s., 16.43 GB/s.)
achimbab.host :) select count() from compressionTest prewhere col_none = 666;
SELECT count()
FROM compressionTest
PREWHERE col_none = 666
┌─count()─┐
│       0 │
└─────────┘
1 rows in set. Elapsed: 0.055 sec. Processed 100.00 million rows, 800.00 MB (1.80 billion rows/s., 14.44 GB/s.)
achimbab.host :) insert into compressionTest select number, cityHash64(number), cityHash64(number)
:-] from numbers(100000000);
INSERT INTO compressionTest SELECT
number,
cityHash64(number),
cityHash64(number)
FROM numbers(100000000)
Ok.
0 rows in set. Elapsed: 6.763 sec. Processed 100.01 million rows, 800.06 MB (14.79 million rows/s., 118.30 MB/s.)
achimbab.host :) insert into compressionTest select number+5, cityHash64(number), cityHash64(number)
:-] from numbers(100000000);
INSERT INTO compressionTest SELECT
number + 5,
cityHash64(number),
cityHash64(number)
FROM numbers(100000000)
Ok.
0 rows in set. Elapsed: 6.732 sec. Processed 100.01 million rows, 800.06 MB (14.86 million rows/s., 118.84 MB/s.)
achimbab.host :) insert into compressionTest select number+13, cityHash64(number), cityHash64(number)
:-] from numbers(100000000);
INSERT INTO compressionTest SELECT
number + 13,
cityHash64(number),
cityHash64(number)
FROM numbers(100000000)
Ok.
0 rows in set. Elapsed: 6.760 sec. Processed 100.01 million rows, 800.06 MB (14.79 million rows/s., 118.36 MB/s.)
achimbab.host :) optimize table compressionTest final;
OPTIMIZE TABLE compressionTest FINAL
Ok.
0 rows in set. Elapsed: 43.683 sec.
achimbab.host :) SELECT
:-] formatReadableSize(sum(column_bytes_on_disk)) AS compressed,
:-] formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
:-] table,
:-] column
:-] FROM system.parts_columns
:-] WHERE (active = 1) AND (table LIKE '%compressionTest%')
:-] GROUP BY
:-] table,
:-] column
SELECT
formatReadableSize(sum(column_bytes_on_disk)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
table,
column
FROM system.parts_columns
WHERE (active = 1) AND (table LIKE '%compressionTest%')
GROUP BY
table,
column
┌─compressed─┬─uncompressed─┬─table───────────┬─column───┐
│ 766.34 MiB │ 2.98 GiB     │ compressionTest │ A        │
│ 1.59 GiB   │ 2.98 GiB     │ compressionTest │ col_lz4  │
│ 2.98 GiB   │ 2.98 GiB     │ compressionTest │ col_none │
└────────────┴──────────────┴─────────────────┴──────────┘
3 rows in set. Elapsed: 0.004 sec.
achimbab.host :) select count() from compressionTest prewhere col_lz4 = 666;
SELECT count()
FROM compressionTest
PREWHERE col_lz4 = 666
┌─count()─┐
│       0 │
└─────────┘
1 rows in set. Elapsed: 0.177 sec. Processed 400.00 million rows, 3.20 GB (2.26 billion rows/s., 18.05 GB/s.)
achimbab.host :) select count() from compressionTest prewhere col_none = 666;
SELECT count()
FROM compressionTest
PREWHERE col_none = 666
┌─count()─┐
│       0 │
└─────────┘
1 rows in set. Elapsed: 0.171 sec. Processed 400.00 million rows, 3.20 GB (2.34 billion rows/s., 18.69 GB/s.)