Hello,
I'm not sure if this is a bug or not, but the documentation says: "INSERT and ALTER are replicated".
With this in mind, I expect the following scenario to work:
CREATE TABLE db_1_testing.collection_four_src(`ck.partition` Date,`ck.created_at` DateTime,`ck.timestamp` DateTime, `transaction_id` Nullable(String),`price` Nullable(Float64),`name` Nullable(String),`array` Nullable(Array(String))) ENGINE = ReplicatedMergeTree('/var/lib/clickhouse/tables/{shard}/collection_four_src','{replica}',`ck.partition`,`ck.timestamp`,8192);
CREATE TABLE db_1_testing.collection_four AS db_1_testing.collection_four_src ENGINE = Distributed(cluster_test,db_1_testing,collection_four_src,rand());
Then when I execute:
ALTER TABLE db_1_testing.collection_four MODIFY COLUMN `array` Nullable(Array(Int64));
I receive 200 OK, but nothing happens. The column is still Nullable(Array(String)), not Nullable(Array(Int64)).
If I execute the same ALTER directly on the collection_four_src table on each node, it works.
Silviu
The behavior is pretty odd...
Basically, I create a table with a column of type Nullable(String), then I alter the table to change the type of that column to Nullable(Int64) (look at validation_date in the following example):
CREATE TABLE project_1_1.ivr_src(`ck.partition` Date,`ck.created_at` DateTime,`app_id` Nullable(Int64),`charge_cost` Nullable(Float64),`charge_rate` Nullable(Float64),`ck.timestamp` DateTime,`country_code` Nullable(String),`delivery_cost` Nullable(Float64),`delivery_rate` Nullable(Float64),`duration_billed` Nullable(Int64),`from` Nullable(String),`provider_id` Nullable(Int64),`repeated` Nullable(UInt8),`to` Nullable(String),`user_id` Nullable(Int64),`validated` Nullable(UInt8),`validation_date` Nullable(String)) ENGINE = ReplicatedMergeTree('/var/lib/clickhouse/tables/{shard}/ivr_src','{replica}',`ck.partition`,`ck.timestamp`,8192);
CREATE TABLE project_1_1.ivr AS project_1_1.ivr_src ENGINE = Distributed(cluster_test,project_1_1,ivr_src,rand())
ALTER TABLE project_1_1.ivr_src MODIFY COLUMN `validation_date` Nullable(Int64)
If I insert values via the distributed table, it does not work: (Code: 53, e.displayText() = DB::Exception: Type mismatch in IN or VALUES section. Expected: String. Got: UInt64, e.what() = DB::Exception\n)
If I do it via the local table, it works.
Working:
INSERT INTO project_1_1.ivr_src(`validation_date`,`validated`,`user_id`,`to`,`repeated`,`provider_id`,`from`,`duration_billed`,`delivery_rate`,`delivery_cost`,`country_code`,`ck.timestamp`,`charge_rate`,`charge_cost`,`app_id`,`ck.created_at`,`ck.partition`) VALUES (1491000849,1,14226201,'557799738',1,675,'157783225',1,0.262,0.262,'LB',1491000845,0.3406,0.3406,14226201,1498769440,17256);
Not working:
INSERT INTO project_1_1.ivr(`validation_date`,`validated`,`user_id`,`to`,`repeated`,`provider_id`,`from`,`duration_billed`,`delivery_rate`,`delivery_cost`,`country_code`,`ck.timestamp`,`charge_rate`,`charge_cost`,`app_id`,`ck.created_at`,`ck.partition`) VALUES (1491000849,1,14226201,'557799738',1,675,'157783225',1,0.262,0.262,'LB',1491000845,0.3406,0.3406,14226201,1498769440,17256);
Yes, it is the designed behavior.
I'm not sure if this a bug or not but based on documentation: "INSERT and ALTER are replicated"
It is related to Replicated* tables, not to Distributed.
From the doc:
The Distributed engine by itself does not store data, but allows distributed query processing on multiple servers.
A Distributed table just proxies INSERT and SELECT queries.
Since it does not store data, it logically cannot modify the underlying data.
Moreover, it can proxy different kinds of tables simultaneously (even other proxy tables such as Merge and Distributed).
That makes cascading ALTERs of Distributed tables too complicated.
But to make ALTERs on a cluster easier, you could try the new feature called "distributed DDL".
With it, you can write ALTER queries like
ALTER TABLE merge ON CLUSTER cluster ADD COLUMN s DEFAULT toString(i)
and it will eventually be executed on each instance (even if some instance is not available right now).
Currently, distributed DDL doesn't allow ALTERs of Replicated tables, but you can work around this using the distributed_ddl_allow_replicated_alter=1 setting (there are more technical details on why this happens).
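A minimal sketch of the distributed DDL workaround described above, reusing the table and cluster names from the first comment (whether the setting is needed depends on your server version; it cannot run without a live cluster):

```sql
-- Allow distributed DDL to alter Replicated* tables (setting mentioned above).
SET distributed_ddl_allow_replicated_alter = 1;

-- The ALTER is queued in ZooKeeper and eventually executed on every
-- instance of the cluster, even instances that are unavailable right now.
ALTER TABLE db_1_testing.collection_four_src ON CLUSTER cluster_test
    MODIFY COLUMN `array` Nullable(Array(Int64));
```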
Hello, thanks,
Very useful comment. I'll try the new "distributed DDL" feature, but you are saying that "a Distributed table just proxies INSERT and SELECT queries." If that is the case, then why, in order to alter a replicated table, do I need to:
See my second comment from a few days ago. If the distributed table were only a view, I would expect to be able to query the new column once I changed the replicated tables on all nodes.
Maybe something is not clear enough for me.
Silviu
A Distributed table is just a view, but it has its own table structure (list of columns),
which must match the table structure of the remote tables, or at least be a subset of their columns.
This looks redundant, but it is much simpler in implementation: every table has its own list of columns.
That's why you have to alter both the remote tables and the Distributed table.
@alexey-milovidov so that means that in order to alter a table you need to:
Am I correct? It seems a bit redundant..
This is correct.
It's typical to have a Distributed table on every node of the cluster.
In that case, you just run the second ALTER TABLE y ON CLUSTER as well.
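The two-step flow described above, sketched with hypothetical names (y for the local Replicated* table, y_d for the Distributed table over it; note that later comments in this thread report the second step failing with error 371 on some versions):

```sql
-- Step 1: alter the underlying Replicated* table on every node.
ALTER TABLE db.y ON CLUSTER mycluster ADD COLUMN s String;

-- Step 2: alter the Distributed table too, so that its own
-- column list matches the remote tables.
ALTER TABLE db.y_d ON CLUSTER mycluster ADD COLUMN s String;
```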
Ah ok, got it. It might be better to add this to the documentation as well :) I'm sure it's confusing for other people too.
↘ Progress: 538.05 million rows, 4.84 GB (2.46 million rows/s., 22.15 MB/s.) Received exception from server:
Code: 240. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Allocator: Cannot mremap., errno: 12, strerror: Cannot allocate memory.
Can anyone help me resolve this? It seems creepy..
@alexey-milovidov
I have an error when I try to alter distributed table that points to ReplicatedMergeTree on cluster
CREATE TABLE log.test ON CLUSTER 'my-cluster' (date Date, value1 String) ENGINE = ReplicatedMergeTree ('/clickhouse/tables/{shard}/log/test','{replica}', date, (date), 8192)
CREATE TABLE log.test_d ON CLUSTER 'my-cluster' AS log.test ENGINE = Distributed('my-cluster', log, test, rand())
ALTER TABLE log.test ON CLUSTER 'my-cluster' ADD COLUMN value2 String
ALTER TABLE log.test_d ON CLUSTER 'my-cluster' ADD COLUMN value2 String
The last ALTER throws this error for each node: An error occured before execution: Code: 371, e.displayText() = DB::Exception: Table test_d isn\'t replicated, but shard #2 is replicated according to its cluster definition, e.what() = DB::Exception
Am I missing something?
@dolbyzerr I've got the same error; one guy in the ClickHouse Telegram channel told me that ALTER TABLE ... ON CLUSTER is not yet supported for Distributed tables. It looks to me like recreating the Distributed table is the easiest way when you need to alter it.
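The recreate-instead-of-alter workaround can be sketched like this, reusing the names from @dolbyzerr's example above (a sketch only; assumes the local table log.test has already been altered on every node):

```sql
-- The Distributed table holds no data of its own, so dropping it is safe.
DROP TABLE log.test_d ON CLUSTER 'my-cluster';

-- Recreate it from the already-altered local table;
-- it picks up the new column list automatically.
CREATE TABLE log.test_d ON CLUSTER 'my-cluster' AS log.test
    ENGINE = Distributed('my-cluster', log, test, rand());
```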
I came across this problem in ClickHouse version 18.1.6. I see the query in the query log on every node of the cluster when I ALTER a distributed table 'test'. For example, DDLWorker: Processing task query-0000000005 (ALTER TABLE default.test ON CLUSTER replicated_stock ADD COLUMN nc String), and it can be executed on every node successfully, but it fails with the exception "Table test isn't replicated, but shard #3 is replicated according to its cluster definition, e.what() = DB::Exception". It almost succeeds from my point of view, and the code
if (!execute_once_on_replica && config_is_replicated_shard)
{
    throw Exception("Table " + ast_alter->table + " isn't replicated, but shard #" + toString(task.host_shard_num + 1) +
        " is replicated according to its cluster definition", ErrorCodes::INCONSISTENT_CLUSTER_DEFINITION);
}
makes me confused. If we can run the ALTER manually on each node, why doesn't the server just give it a try?
Correct me if I am wrong; based on the above comments, I summarize the current status as of 2019-12-18:
1) [x] 3-node zookeeper + 3-node cluster
config = 2 replicas, 0 shards
2) [x] CREATE/DROP ReplicatedMergeTree ON CLUSTER mycluster
working ok
3) [ ] ALTER TABLE ON CLUSTER mycluster
fails with the following error (by design; cannot be fixed):
DB::Exception: There was an error on [clickhouse-2.clickhouse-headless.clickhouse.svc.cluster.local:9000]: An error occured before execution: Code: 371, e.displayText() = DB::Exception: Table '[REDACTED]' is replicated, but shard #3 isn't replicated according to its cluster definition. Possibly