ClickHouse: Replicating Replacing Merge - Duplicates/Final

Created on 17 Jan 2018 · 16 comments · Source: ClickHouse/ClickHouse

Hello,

I am seeing some strange behaviour with a ReplicatedReplacingMergeTree. I need to store time-series data and cannot have duplicates. I've tried a simple case of loading the exact same dataset of ~1.5 million rows twice to test how well it rejects duplicates.

Result: SELECT count() FROM table returns ~2.3 million rows, so lots of duplicates. SELECT count() FROM table FINAL returns exactly the right number.
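A minimal sketch of the kind of setup being described, with hypothetical table and column names, shown here as a plain (non-replicated) ReplacingMergeTree for brevity:

-- Hypothetical schema; names are illustrative only.
CREATE TABLE ts_local
(
    ts     DateTime,
    metric String,
    value  Float64
)
ENGINE = ReplacingMergeTree
ORDER BY (metric, ts);

-- Load the same dataset twice (source_data is a placeholder).
INSERT INTO ts_local SELECT * FROM source_data;
INSERT INTO ts_local SELECT * FROM source_data;

SELECT count() FROM ts_local;        -- shows duplicates until parts are merged
SELECT count() FROM ts_local FINAL;  -- collapses duplicates at query time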

So duplicates are being correctly flagged, but how do I optimize the database so that they no longer exist, without using FINAL? I have tried OPTIMIZE TABLE and it immediately returns OK but with no change to the count; I have also tried OPTIMIZE TABLE ... DEDUPLICATE with the same results.

I understand that OPTIMIZE is meant to be an intensive re-merge that sorts this out. I'm happy with that, as it could be run occasionally, since duplicates won't occur in normal service.

How do I force optimize?

Thanks for such an impressive system!

question

Most helpful comment

After a successful run of

set optimize_throw_if_noop = 1;
OPTIMIZE TABLE tbl FINAL;

executed after replication has completed, I would expect it to be guaranteed that no duplicates remain.

All 16 comments

Further testing: using ReplacingMergeTree on a single non-replicated machine, I found that OPTIMIZE TABLE works, but only if run a number of times in succession. Rapidly calling it ~10 times from Tabix will see it succeed.

The logs don't suggest any reason for this. Each time I run it I see it do some tidying, so I'm not sure what is going on. Running it once and leaving it alone doesn't seem to work; 2 or 3 times will see the count(*) reducing but still not fully deduplicated; 10+ times seems to deduplicate this dataset completely.
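One possible explanation, sketched against the hypothetical table above: a plain OPTIMIZE initiates a single merge of some of the existing parts, so several runs may be needed before all parts holding duplicate keys have been merged together, whereas OPTIMIZE ... FINAL merges everything in each partition into one part in a single, heavier pass:

OPTIMIZE TABLE ts_local;        -- merges some parts; may need repeating
OPTIMIZE TABLE ts_local FINAL;  -- merges all parts of each partition into one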

Further: trying again on a ReplicatedReplacingMergeTree across two servers. OPTIMIZE returns OK on one machine but doesn't do anything, and on the second machine it complains that the user password is not correct (both use the same users.xml, all DDL ON CLUSTER statements work fine, and the data has replicated, so I'm not sure why).

Either way, I can't optimize a ReplicatedReplacingMergeTree, and I can't get rid of the duplicates.

Temporary solution:
SELECT
    partition,
    partCount
FROM
(
    SELECT
        partition,
        count(*) AS partCount
    FROM system.parts
    WHERE active AND (table = 'TABLE')
    GROUP BY partition
)
WHERE partCount > 1

And run OPTIMIZE TABLE <table> PARTITION <partition> FINAL for each of those partitions. This runs rapidly and seems to force a consolidation of each in turn. It runs in under 2 seconds on 3 million duplicates, and it works on a ReplicatedReplacingMergeTree on two servers.
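For example, using the hypothetical table from above (the partition value is illustrative and depends on the table's PARTITION BY expression):

-- Use the partition values returned by the system.parts query above.
OPTIMIZE TABLE ts_local PARTITION 201801 FINAL;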

ReplacingMergeTree is mostly intended to save disk storage in the long term; you still need to write queries assuming there will be duplicates in the raw data.
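For instance, a query-time deduplication over the hypothetical table above could look like this (exact duplicates collapse under GROUP BY; FINAL is the simpler but heavier alternative):

-- Collapse exact duplicates at query time.
SELECT metric, ts, any(value) AS value
FROM ts_local
GROUP BY metric, ts;

-- Or let ClickHouse apply the ReplacingMergeTree logic at read time.
SELECT count() FROM ts_local FINAL;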

@ben-motiv
Have you tried OPTIMIZE TABLE ... FINAL?

https://github.com/yandex/ClickHouse/issues/2086

Further: trying again on a ReplicatedReplacingMergeTree across two servers. OPTIMIZE returns OK on one machine but doesn't do anything, and on the second machine it complains that the user password is not correct (both use the same users.xml, all DDL ON CLUSTER statements work fine, and the data has replicated, so I'm not sure why).

Either way, I can't optimize a ReplicatedReplacingMergeTree, and I can't get rid of the duplicates.

Does this PR fix the OPTIMIZE error on the second box? https://github.com/yandex/ClickHouse/pull/3742

@savelov

3742 will work for ReplicatedMergeTree, but I'm not sure it will also work for ReplicatedReplacingMergeTree.

I found the same issue here, though in my case I am finding duplicates in a ReplicatedReplacingMergeTree even when running OPTIMIZE TABLE tbl FINAL.
Optimization returns OK but no changes are made to the duplicates; the count is still the same. Is this because it is a Replicated* table?
Connected to ClickHouse server version 19.13.3 revision 54425.

@xmariachi try

in clickhouse-client:

set optimize_throw_if_noop = 1;
OPTIMIZE TABLE tbl FINAL;

In this case you'll get a message with the reason why the merge could not start.
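It can also help to look at how many active parts each partition still has, for example (the table name is a placeholder):

SELECT partition, name, rows
FROM system.parts
WHERE table = 'tbl' AND active
ORDER BY partition, name;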

Thanks.
I've found this issue in the logs - could it be the cause? I replied at this permalink, where you can see the problem:
https://github.com/yandex/ClickHouse/issues/6472#issuecomment-527967770

I upgraded the servers to match the same version. I no longer see those errors.
However, the behaviour is still the same: the data doesn't get compacted.
The query finishes fast:

OPTIMIZE TABLE tbl FINAL
Ok.

0 rows in set. Elapsed: 2.502 sec. 

However, the table does not compact the data, or at least not the particular set of data where the duplicates are. With the optimize_throw_if_noop flag set, nothing gets printed in the logs.

Table is a ReplicatedReplacingMergeTree.

@xmariachi are you sure that the primary / ORDER BY keys are set correctly and that you are checking uniqueness against them?

Hi @NanoBjorn, yes, I think so. It works fine with duplicates on smaller table sizes.
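For context, ReplacingMergeTree collapses rows only when their entire ORDER BY key matches, so a sorting key that is wider than the intended uniqueness key will leave apparent duplicates behind. A hypothetical illustration:

CREATE TABLE events
(
    id      UInt64,
    ts      DateTime,
    payload String
)
ENGINE = ReplacingMergeTree
ORDER BY (id, ts);
-- Two rows with the same id but different ts are NOT duplicates here;
-- deduplicating by id alone would require ORDER BY id.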

@ben-motiv, do you have any further questions?

We also have this problem on about 200 million rows: OPTIMIZE in any shape or form does nothing and old rows are still duplicated. SELECT ... FINAL does not help either.
ClickHouse client version 19.17.2.4 (official build).

@blinkov
Am I right to assume that, in the end, there is no strong guarantee about ReplicatedReplacingMergeTree's deduplication properties and that we should always write queries in a way that expects duplicated samples? Because the current documentation does not give that impression.

After a successful run of

set optimize_throw_if_noop = 1;
OPTIMIZE TABLE tbl FINAL;

executed after replication has completed, I would expect it to be guaranteed that no duplicates remain.
