Clickhouse: Materialized View misses some data

Created on 10 Jan 2019 · 12Comments · Source: ClickHouse/ClickHouse

Hello. First of all thx for a great product. Very fast and flexible.

Recently we started using MaterializedView and noticed one issue (or not an issue but we did something wrong)

We created MVs based on local Metrics tables (they use ReplicatedMergeTree() engine and we constantly writes to those local tables (each second)) like:

CREATE MATERIALIZED VIEW default.MetricStats ON CLUSTER metrics 
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/metrics/{shard}/MetricStats', '{replica}')
PARTITION BY floor(time / 86400)
ORDER BY (time, appId) POPULATE AS
SELECT 
    uniqExactState(host) AS `os.host`, 
    uniqExactState(container) AS `container.id`, 
    uniqExactState(containerHostname) AS `container.host.hostname`, 
    appId, 
    floor(timestamp / 60) * 60 AS time
FROM default.Metrics 
WHERE timestamp > now()-300
GROUP BY 
    time, 
    appId

Then we created distributed table based on MVs

CREATE TABLE IF NOT EXISTS  default.MetricStatsDistributed ON CLUSTER metrics AS default.MetricStats
ENGINE = Distributed(metrics, default, MetricStats, rand());

After all we select data from Distributed table (which based on MVs)

SELECT 
    uniqExactMerge(`os.host`) AS `os.host`, 
    uniqExactMerge(`container.id`) AS `container.id`, 
    uniqExactMerge(`container.host.hostname`) AS `container.host.hostname`, 
    appId, 
    time
FROM MetricStatsDistributed 
WHERE appId = XXX
GROUP BY 
    appId, 
    time
ORDER BY time ASC

And got following

┌─os.host─┬─container.id─┬─container.host.hostname─┬─appId─┬──────────time─┐
│      35 │          486 │                       1 │ XXX   │ 1546932480000 │
│      35 │          486 │                       1 │ XXX   │ 1546932540000 │
│      35 │          486 │                       1 │ XXX   │ 1546932600000 │
│      35 │          486 │                       1 │ XXX   │ 1546932660000 │
│      35 │          486 │                       1 │ XXX   │ 1546932720000 │
│      35 │          480 │                       1 │ XXX   │ 1546932780000 │ >>> at that time we created MVs
│      29 │          287 │                       1 │ XXX   │ 1546932840000 │
│      27 │          376 │                       1 │ XXX   │ 1546932900000 │
│      27 │          330 │                       1 │ XXX   │ 1546932960000 │
│      21 │          309 │                       1 │ XXX   │ 1546933020000 │
└─────────┴──────────────┴─────────────────────────┴───────┴───────────────┘

35 are right values. 29, 27, 21 values are wrong and it means that we missed some unique values.
As you can see we see wrong values only after MVs were created, so during POPULATE we agg data right, But then during realtime insertion/aggregating we started to see issue.

We have a feeling that maybe not everything was added to MVs and aggregated with uniqExactState, or maybe aggregation into MVs triggers only once for (time, appId) pair (order key) and it something was added to "main" table later will be missed?

Thx for future answer&investigation.

question

Source

wawanawna

👍1

All 12 comments

WHERE timestamp > now()-300

Your MV missing everything with timestamp older than 5 min. during insert.

den-crane on 11 Jan 2019

Yes, we know, this is not a case here. We insert only last minute data usually
I even tried

WHERE timestamp > now()-1800

No effect, so cause is somewhere else

wawanawna on 11 Jan 2019

No effect, so cause is somewhere else

Just to be sure - can you try to remove that condition? Anyway you "insert only last minute data".

Why: maybe now() is beeing calculated once - at the moment of MV initialization?

filimonov on 11 Jan 2019

Try to calculate MV countState or sum(1) to locate the issue (missing rows or uniqExact)

den-crane on 11 Jan 2019

No effect, so cause is somewhere else

Just to be sure - can you try to remove that condition? Anyway you "insert only last minute data".

Why: maybe now() is beeing calculated once - at the moment of MV initialization?

Hello, thx for suggestion, I've tried create MV without any conditions, didn't help neither

wawanawna on 15 Jan 2019

Try to calculate MV countState or sum(1) to locate the issue (missing rows or uniqExact)

Thx for suggestion, I've tried countState and got this

│ 13629 │         78100 │              78100 │                         78100 │ 1547548020000 │
│ 13629 │         78014 │              78014 │                         78014 │ 1547548080000 │
│ 13629 │         77851 │              77851 │                         77851 │ 1547548140000 │
│ 13629 │         78054 │              78054 │                         78054 │ 1547548200000 │
│ 13629 │         50058 │              50058 │                         50058 │ 1547548260000 │  >>>> HERE MV WAS CREATED
│ 13629 │         39404 │              39404 │                         39404 │ 1547548320000 │
│ 13629 │         34681 │              34681 │                         34681 │ 1547548380000 │
│ 13629 │         26377 │              26377 │                         26377 │ 1547548440000 │
│ 13629 │         34462 │              34462 │                         34462 │ 1547548500000 │
│ 13629 │         33806 │              33806 │                         33806 │ 1547548560000 │
│ 13629 │         11035 │              11035 │                         11035 │ 1547548620000 │
│ 13629 │         28857 │              28857 │                         28857 │ 1547548680000 │

As you can see count also showed incorrect values. So problem is in missing rows it seems.

wawanawna on 15 Jan 2019

@wawanawna Did you find the reason?

filimonov on 6 May 2019

No, we didn't find a reason @filimonov
We stopped using MV, found workaround.

wawanawna on 7 May 2019

Hi @wawanawna,
What is the workaround you found ?

nupursinghbaghel on 16 May 2019

@nupursinghbaghel it could be https://github.com/yandex/ClickHouse/issues/5274
Then workaround is to add some column to MV to break de-duplication
as select , now() as workaround -- or generateUUIDv4()
and collapse it using any

den-crane on 16 May 2019

Hi @wawanawna,
What is the workaround you found ?

We just stopped using MVs @nupursinghbaghel

wawanawna on 20 May 2019

👎2

We using MV that selecs data from replicated table:

CREATE TABLE db.`replicated_table` (…) ENGINE = ReplicatedMergeTree(…, '{replica}') ORDER BY (…) PARTITION BY … SAMPLE BY CAST(…);

Table is replicated across 3 machines, only 1 machine handles INSERT queries.

We've created MV on "slave" replica, and a lot of documents are missing on it.

CREATE MATERIALIZED VIEW db2.`mv` ENGINE = MergeTree() ORDER BY (…) PARTITION BY … POPULATE AS SELECT … FROM db.`replicated_table`;

or same behaviour with ENGINE=ReplicatedMergeTree

CREATE MATERIALIZED VIEW db2.`mv` ENGINE = ReplicatedMergeTree(…, '{replica}') ORDER BY (…) PARTITION BY … POPULATE AS SELECT … FROM db.`replicated_table`;

Fixed by creating MV with ENGINE=ReplicatedMergeTree on each insertable replica. Now all works fine.