We are writing millions of events to clickhouse every day. All the table use MergeTree engine. We are using following command to view number of rows
db-server :) select sum(rows) as rows from system.parts
SELECT sum(rows) AS rows
FROM system.parts
βββββββrowsββ
β 681585100
βββββββββββββ
After few minutes, I am issuing the same command
db-server :) select sum(rows) as rows from system.parts
SELECT sum(rows) AS rows
FROM system.parts
βββββββrowsββ
β 679236922
βββββββββββββ
I saw that number of records has been reduced. Why does it happen?
Likely it was just some inactive parts being deleted in background, try adding ...where active to that query.
@blinkov I see following definitions in the doc
active (UInt8) β Indicates whether the part is active. If a part is active, it is used in a table;
otherwise, it will be deleted. Inactive data parts remain after merging.
When will the part become active or inactive?
In our end, we don't delete any parts.
It happens in background automatically: basically ClickHouse chooses some active parts, merges the data from them into one larger part, then atomically replaces old parts with the new one. New one becomes active, while old ones become inactive to be deleted later (their data is also in the new larger part, so they are not needed anymore). So this process temporary duplicates rows across all parts (active+inactive), but not the visible rows (from active parts).
Another possible reason for number of rows going down without explicit deletes is using table engines that not just merge parts in background, but also process them in some way, like CollapsingMergeTree or AggregatingMergeTree.
Most helpful comment
It happens in background automatically: basically ClickHouse chooses some active parts, merges the data from them into one larger part, then atomically replaces old parts with the new one. New one becomes active, while old ones become inactive to be deleted later (their data is also in the new larger part, so they are not needed anymore). So this process temporary duplicates rows across all parts (active+inactive), but not the visible rows (from active parts).
Another possible reason for number of rows going down without explicit deletes is using table engines that not just merge parts in background, but also process them in some way, like CollapsingMergeTree or AggregatingMergeTree.