CREATE TABLE IF NOT EXISTS insert_view(...) ENGINE = Null;
CREATE MATERIALIZED VIEW data_local ENGINE = AggregatingMergeTree(..., sumState(num1) as num1,sumState(num2) as num2,sumState(num3) as num3,minState(num4) as num4,maxState(num5) as num5,sumState(num6) as num6 FROM insert_view GROUP BY xxxx;
CREATE TABLE data as data_local ENGINE = Distributed(perftest_2shards_1replicas, default, data_local, rand());
But all record insert in a shard?
so, how to use Distributed with MaterializedView
1) Suppose that you want to store pairs of (user, user_visit_time) in your distributed db
2) You have many local (or replicated) tables on each server with such data
3) The data is updated periodically, you insert several thousands pairs each n minutes
4) In addition to your data stored in many local tables, you want to store some metadata (or statistics) for each portion of data inserted in local tables. For example, you want to store number of uniq users for each minute.
5) To do so, on each server you create local_stat tables that are MaterializedView tables: (approx.) CREATE MATERIALIZED VIEW local_stat (createDate DateTime, uniq_users AggregateFunction(uniq)) AS SELECT now() AS createDate, uniqState(user) AS uniq_users FROM local GROUP BY toRelativeMinuteNum(user_visit_time)
6) Now on each server you have local tables with main data and local_stat MaterializedView tables with auxiliary statistics
7) Each INSERT into local makes corresponding INSERT SELECT into local_stat
8) Now you want to wrap many local tables (local and local_stat) into convenient Distributed tables: local_all and local_stat_all (and create such wrapper on each node).
9) Now each INSERT into local_all is transformed in many local INSERTs into local tables. Each local INSERT activate INSERT SELECT for each "attached" MaterializedView table (i.e. for local_stat).
10) After these chain of INSERTs are finished, you could SELECT results via Distrubuted tables local and local_stat
So, in your case you should create addition Distributed table for insert_view and send INSERTs into it.
@ludv1x but it's not work...
What doesn't precisely work?
Could you provide your configuration and queries?
config
<yandex>
<clickhouse_remote_servers>
<perftest_2shards_1replicas>
<shard>
<replica>
<host>localtest.clickhouse.shard1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>localtest.clickhouse.shard2</host>
<port>9000</port>
</replica>
</shard>
</perftest_2shards_1replicas>
</clickhouse_remote_servers>
<zookeeper-servers>
<node>
<host>10.1.1.153</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<replica>10.1.1.154</replica>
<shard>01</shard>
</macros>
</yandex>
and sql
CREATE TABLE IF NOT EXISTS insert_view_local(metricId Int64, applicationId Int64, agentRunId Int64, num1 Float64, num2 Float64, tc_startDate Date, tc_startTime UInt64) ENGINE = Null;
CREATE TABLE insert_view as insert_view_local ENGINE = Distributed(perftest_2shards_1replicas, default, insert_view_local, rand());
CREATE MATERIALIZED VIEW metric_data_entity_pt1h ENGINE = AggregatingMergeTree(tc_startDate,(tc_startTime, applicationId, metricId, agentRunId), 8192) AS SELECT tc_startDate, tc_startTime, applicationId, metricId, agentRunId, sumState(num1) as num1,sumState(num2) as num2 FROM insert_view GROUP BY tc_startDate,tc_startTime,applicationId, metricId, agentRunId;
i use insert into insert_view values(1, 10, 0, 0.4, 0.7, toDate('2017-02-27'), 1488178550000)
all the data is on the machine where the insertion statement is executed...
Materialized View over Distributed table don't distribute insertions among the cluster.
Only insertions into default.insert_view_local will be distributed.
You need create Materialized View over insert_view_local (not over insert_view) on each server.
so, The final sql statement is as follows:
CREATE TABLE IF NOT EXISTS insert_view_local(metricId Int64, applicationId Int64, agentRunId Int64, num1 Float64, num2 Float64, tc_startDate Date, tc_startTime UInt64) ENGINE = Null;
CREATE TABLE insert_view as insert_view_local ENGINE = Distributed(perftest_2shards_1replicas, default, insert_view_local, rand());
CREATE MATERIALIZED VIEW metric_data_entity_pt1h_local ENGINE = AggregatingMergeTree(tc_startDate,(tc_startTime, applicationId, metricId, agentRunId), 8192) AS SELECT tc_startDate, tc_startTime, applicationId, metricId, agentRunId, sumState(num1) as num1,sumState(num2) as num2 FROM insert_view_local GROUP BY tc_startDate,tc_startTime,applicationId, metricId, agentRunId;
CREATE TABLE metric_data_entity_pt1h as metric_data_entity_pt1h_local ENGINE = Distributed(perftest_2shards_1replicas, default, metric_data_entity_pt1h_local, rand());
insert into insert_view and query select some_column from metric_data_entity_pt1h ?
it works.
I highly appreciate your help, thanks.
What if many replicas in one shard? Can I change the ENGINE of materialized view to ReplicatedMergeTree?
asdad
Most helpful comment
Materialized View over Distributed table don't distribute insertions among the cluster.
Only insertions into
default.insert_view_localwill be distributed.You need create Materialized View over
insert_view_local(not overinsert_view) on each server.