ClickHouse: Smoosh column .bin/.mrk files in a part into a single larger file to reduce inode count

Created on 1 Aug 2019 · 7 Comments · Source: ClickHouse/ClickHouse

Use case
Too many small files in the part directories eventually exhaust inodes, and may also cause 'too many open files' errors.
When there are 300 tables, the partition expression is of the yyyyMMdd type, each table has a 1-year TTL, and each table has 100 columns, we end up with 300 * 365 * 100 * 2 = 21,900,000 files.
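The arithmetic above can be sketched as a quick query (the figures are the illustrative ones from this scenario, not measurements):

```sql
-- Illustrative arithmetic: 300 tables * 365 daily partitions * 100 columns
-- * 2 files (.bin + .mrk) per column, versus one smooshed data file per part.
SELECT
    300 * 365 * 100 * 2 AS files_before_smooshing,  -- 21,900,000
    300 * 365           AS files_after_smooshing    -- 21,900
```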

Changing the partition expression to yyyyMM may help, but yyyyMM does not allow replacing one day of data atomically (re-importing a Hive day partition into ClickHouse when the Hive partition data is updated).

I found that issues #4617 and #5166 are related. PR #5171 adds a max_parts_in_total limit, but I don't think that is the best approach.

Describe the solution you'd like
Smoosh the column .bin and .mrk files into a single larger file to reduce the file count in the part directory when the part is unlikely to change (the data was generated many days ago, and inserts/updates/deletes are unlikely to happen). The Druid approach (https://druid.apache.org/docs/latest/design/segments.html) can serve as an example: small files are smooshed into meta.smoosh and 00000.smoosh (or an additional 00001.smoosh if 00000.smoosh grows larger than 2 GB).
This way, the file count is reduced from 21,900,000 to 21,900.

If it is difficult to decide automatically when the best time is to do the smooshing, I suggest that the OPTIMIZE ... FINAL query (https://clickhouse.yandex/docs/en/query_language/misc/#misc_operations-optimize) could do it.

Labels: feature, st-fixed

All 7 comments

This issue will be addressed by the development of "polymorphic parts", which is currently in progress by @CurtizJ.

@alexey-milovidov @CurtizJ

Is there any more detail about polymorphic parts? Thanks.

@hustnn https://github.com/ClickHouse/ClickHouse/blob/master/docs/ru/extended_roadmap.md#16-полиморфные-куски-данных, it doesn't seem to be available in English yet.

@nvartolomei Thanks. Let me use Google Translate first to take a look at the basic idea. I am also facing this issue now.

It is proposed to allow data parts of MergeTree-family tables to store data in different formats, namely: in RAM; on disk with all columns in one file; on disk with each column in a separate file.

Does one table have only one format, or can one table have different formats (some parts in memory, some on disk)?

One table can have parts in multiple different formats, e.g. small parts in a compact (write-optimized) format and larger parts in a wide (read-optimized) format.

This feature is implemented and available in version 20.3. Synopsis:

CREATE TABLE test.hits_compact AS test.hits
    ENGINE = MergeTree ORDER BY ...
    SETTINGS min_bytes_for_wide_part = '10M'

Next step: enable it with reasonable threshold by default.
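A sketch for checking which format each part actually uses, assuming ClickHouse 20.3 or later, where `system.parts` exposes a `part_type` column (values such as `Compact` and `Wide`); the database and table names come from the synopsis above:

```sql
-- Sketch: list the active parts of test.hits_compact and their storage format.
-- part_type is 'Compact' (all columns in one data file) or 'Wide'
-- (a .bin/.mrk pair per column).
SELECT name, part_type, rows, bytes_on_disk
FROM system.parts
WHERE database = 'test' AND table = 'hits_compact' AND active
ORDER BY name
```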
