ClickHouse: What is the reasoning behind the getMaxSourcePartsSize implementation?

Created on 19 Jan 2019 · 4 comments · Source: ClickHouse/ClickHouse

https://github.com/yandex/ClickHouse/blob/a0d8743c4c1249f1e2394c6eb47bbbfcc83c502d/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp#L137-L154

Hi, I recently stumbled on a lagging replica failure caused by a "live lock" between threads trying to find suitable merges. I'm still collecting data to be able to replicate the issue.

At the same time, I'm trying to build a mental model of how merge selection (work assignment) by the master and merge execution by the replicas fit together.

What I'm trying to understand at the moment is why getMaxSourcePartsSize is implemented the way it is.

Why are bigger merges preferred when the thread pool is empty and smaller ones when the thread pool is busy?

Labels: question, question-answered

All 4 comments

I would also like this to be better documented. We're struggling to keep our active part count under 200 for our raw data tables. Each insert is a batch of 60k rows at ~800 B/row uncompressed (text data). Our volume is ~150k rows/s spread across 9 nodes.

@abraithwaite Are you using default server settings? 60k sounds like a small batch size; I've had great experience with batch sizes > 1M.

What I'm trying to understand at the moment is why getMaxSourcePartsSize is implemented the way it is.

Why are bigger merges preferred when the thread pool is empty and smaller ones when the thread pool is busy?

We have a fixed number of slots in the background processing pool, and we don't want long-running merges to occupy all of them. We want to allow smaller merges to run; otherwise the number of small (just-inserted) parts will grow quickly.

For this purpose, we implement a simple heuristic: when more than half of the slots in the background pool are occupied, we lower the limit on the maximum source part size, dividing it by some factor for each additional slot that becomes occupied.
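To make the heuristic concrete, here is a minimal, self-contained sketch of the idea (not the actual ClickHouse code; the constant names and values are illustrative stand-ins for MergeTree settings such as max_bytes_to_merge_at_max_space_in_pool): while enough slots are free the full limit applies, and below a threshold the limit shrinks exponentially, so each additional busy slot divides it by a constant factor.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Illustrative thresholds (assumptions, not ClickHouse's defaults).
constexpr uint64_t max_bytes_at_max_space = 100ULL << 30; // allowed when the pool is mostly free
constexpr uint64_t max_bytes_at_min_space = 1ULL << 20;   // allowed when the pool is nearly full
constexpr size_t   free_entries_to_lower_limit = 8;       // start shrinking below this many free slots

// Returns the maximum total size of source parts that a merge may take,
// given the pool size and the number of currently busy slots.
uint64_t getMaxSourcePartsSizeSketch(size_t pool_size, size_t pool_used)
{
    size_t free_entries = pool_size - std::min(pool_used, pool_size);

    if (free_entries >= free_entries_to_lower_limit)
        return max_bytes_at_max_space;

    // Exponential interpolation between the min and max limits:
    // every occupied slot beyond the threshold divides the limit by a constant factor.
    double ratio   = static_cast<double>(free_entries) / free_entries_to_lower_limit;
    double log_min = std::log(static_cast<double>(max_bytes_at_min_space));
    double log_max = std::log(static_cast<double>(max_bytes_at_max_space));
    return static_cast<uint64_t>(std::exp(log_min + ratio * (log_max - log_min)));
}

int main()
{
    const size_t pool_size = 16; // e.g. the background pool size
    for (size_t busy = 0; busy <= pool_size; ++busy)
        std::printf("busy slots: %2zu -> max source parts size: %llu bytes\n",
                    busy, static_cast<unsigned long long>(getMaxSourcePartsSizeSketch(pool_size, busy)));
}
```

The effect is that an idle pool may pick a very large merge, while a nearly saturated pool only accepts merges of small, freshly inserted parts, keeping some slots cheap to recycle.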

FWIW, for those coming to this issue: we also increased our insertion batch size to 150k events and moved from inserting into a distributed table to inserting directly into the shards. This relieves a lot of the load, since inserting into a distributed table re-shards the data across all shards according to the sharding key (a rough illustration of that routing follows below).
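As a side note on why direct-to-shard inserts help: a Distributed table evaluates the sharding key for every inserted row and routes it to a shard, so each insert into the distributed table fans out across the cluster again. A rough sketch of the routing idea (hypothetical code; not ClickHouse's actual hash or weighting logic):

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical routing: hash the sharding key value and take it modulo the shard count.
// ClickHouse's real scheme also honours per-shard weights; this only shows the principle.
size_t pickShard(const std::string & sharding_key_value, size_t num_shards)
{
    uint64_t key_hash = std::hash<std::string>{}(sharding_key_value);
    return key_hash % num_shards;
}

int main()
{
    const size_t num_shards = 9;
    for (const char * key : {"alice", "bob", "carol"})
        std::printf("%s -> shard %zu\n", key, pickShard(key, num_shards));
}
```

Inserting directly into the local shard tables skips this routing step, at the cost of the client having to choose the target shard itself.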
