Clickhouse: [Question] Why rows are reducing

Created on 12 Aug 2018  Β·  3Comments  Β·  Source: ClickHouse/ClickHouse

We are writing millions of events to clickhouse every day. All the table use MergeTree engine. We are using following command to view number of rows

db-server :) select sum(rows) as rows from system.parts

SELECT sum(rows) AS rows
FROM system.parts

β”Œβ”€β”€β”€β”€β”€β”€rows─┐
β”‚ 681585100    
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

After few minutes, I am issuing the same command

db-server :) select sum(rows) as rows from system.parts

SELECT sum(rows) AS rows
FROM system.parts

β”Œβ”€β”€β”€β”€β”€β”€rows─┐
β”‚ 679236922  
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

I saw that number of records has been reduced. Why does it happen?

question

Most helpful comment

It happens in background automatically: basically ClickHouse chooses some active parts, merges the data from them into one larger part, then atomically replaces old parts with the new one. New one becomes active, while old ones become inactive to be deleted later (their data is also in the new larger part, so they are not needed anymore). So this process temporary duplicates rows across all parts (active+inactive), but not the visible rows (from active parts).

Another possible reason for number of rows going down without explicit deletes is using table engines that not just merge parts in background, but also process them in some way, like CollapsingMergeTree or AggregatingMergeTree.

All 3 comments

Likely it was just some inactive parts being deleted in background, try adding ...where active to that query.

@blinkov I see following definitions in the doc

active (UInt8) – Indicates whether the part is active. If a part is active, it is used in a table; 
                           otherwise, it will be deleted. Inactive data parts remain after merging.

When will the part become active or inactive?
In our end, we don't delete any parts.

It happens in background automatically: basically ClickHouse chooses some active parts, merges the data from them into one larger part, then atomically replaces old parts with the new one. New one becomes active, while old ones become inactive to be deleted later (their data is also in the new larger part, so they are not needed anymore). So this process temporary duplicates rows across all parts (active+inactive), but not the visible rows (from active parts).

Another possible reason for number of rows going down without explicit deletes is using table engines that not just merge parts in background, but also process them in some way, like CollapsingMergeTree or AggregatingMergeTree.

Was this page helpful?
0 / 5 - 0 ratings