Clickhouse: How to guarantee data deduplication by clickhouse?

Created on 31 Aug 2017  路  3Comments  路  Source: ClickHouse/ClickHouse

At first锛孖 thought ReplacingMergeTree can do this, after i tried serveral times (insert a set of data by file with version 1, than insert the same data set with version 2), i find this method can't realize data deduplication, even if i create a materialized view by select with final keyword, or group by max(ver).

I also read it from the documents "https://clickhouse.yandex/docs/en/table_engines/replacingmergetree.html" that guarantee of data deduplication can't be done by ReplacingMergeTree .

So how do i guarantee data deduplication when some cases i have to insert a set of data many times(load data from file to table is terminated by unexpected exceptions)?

st-need-info

Most helpful comment

I can't remember which version introduced the DEDUPLICATE to the OPTIMIZE query but it was effective for me in removing dupes. It does appear to "rewrite" the whole table. Also, I can't remember if I needed to use FINAL.

OPTIMIZE TABLE tablename [FINAL] DEDUPLICATE;

All 3 comments

If you use Replicated tables, they will deduplicate inserted blocks of data:
-- if exactly identical block of data (same rows in same order) was inserted twice, it will be effectively inserted once.

See:
https://clickhouse.yandex/docs/en/table_engines/replication.html?highlight=deduplicated

(after "Blocks of data are deduplicated.")

Non replicated tables doesn't have this feature.

I can't remember which version introduced the DEDUPLICATE to the OPTIMIZE query but it was effective for me in removing dupes. It does appear to "rewrite" the whole table. Also, I can't remember if I needed to use FINAL.

OPTIMIZE TABLE tablename [FINAL] DEDUPLICATE;

@lttPo do you have any further questions?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jangorecki picture jangorecki  路  3Comments

zhicwu picture zhicwu  路  3Comments

goranc picture goranc  路  3Comments

vixa2012 picture vixa2012  路  3Comments

opavader picture opavader  路  3Comments