ClickHouse: Data loss in materialized views with ZooKeeper errors

Created on 2 Nov 2018 · 14 Comments · Source: ClickHouse/ClickHouse

It looks like the "ZK Session expired" fixes in #2939, #2949, and #2964 are not enough.

I found a discrepancy between the original table and the view. The error logs for this view on both replicas, for the affected period, are below.

replica1

2018.11.02 01:20:36.168545 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.168627 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.168805 [ 32 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.168844 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.168864 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.332660 [ 289 ] {b0eb3d92-4cc9-4822-b429-059b154b4baf} <Error> executeQuery: Code: 999, e.displayText() = DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp/abandonable_lock-: while pushing to view SrcViews.TTViewAdv, e.what() = DB::Exception (from 10.247.128.36:39528) (in query: INSERT INTO SrcData.TTStatBase (impression_id, impression_id_sequence, ts, request_uri, referer, ip, useragent, publisher_id, site_id, section_id, size, media_type, bid_response_time) FORMAT TabSeparated), Stack trace:
2018.11.02 01:37:38.884870 [ 37 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:37:38.885075 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:37:38.885269 [ 27 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:37:38.885325 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:43.419279 [ 25 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:43.419841 [ 36 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:43.419904 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:43.420603 [ 23 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:48:26.281106 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:48:26.281121 [ 32 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:48:26.281824 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:48:26.282121 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:48:26.282173 [ 23 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:

replica2

2018.11.02 01:04:32.720183 [ 23 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:32.720851 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:32.720872 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:32.720901 [ 27 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:48.797412 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:48.798043 [ 34 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:48.798365 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:48.799071 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:48.836748 [ 35 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/log, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.779100 [ 23254 ] {} <Error> void Coordination::ZooKeeper::receiveThread(): Code: 999, e.displayText() = Coordination::Exception: Operation timeout (deadline already expired) for path: /clickhouse/tables/3/TTViewAdv/replicas/ch3r1.local/parts/20181102_20181102_20732_20732_0/columns (Operation timeout), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.779577 [ 7 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/replicas/ch3r1.local/parts/20181102_20181102_20732_20732_0/columns, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.779627 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.779642 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/log, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.780487 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.780629 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:38.780708 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:42:45.098473 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:42:45.098812 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:42:45.098986 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:42:45.099182 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:59:36.288489 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:59:36.289070 [ 37 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:59:36.289208 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:59:36.289863 [ 32 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:

Second shard with data loss (coincidentally for the same hour):

replica1

2018.11.02 01:00:48.294953 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:00:48.295402 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:00:48.295411 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:00:48.295465 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:27.744315 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:27.770959 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:27.771003 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:04:27.771010 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:51.847150 [ 29 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:51.847198 [ 27 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:51.847454 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:40:51.847727 [ 23 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:

Replica2

2018.11.02 01:03:32.655874 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:03:32.656314 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:03:32.656367 [ 34 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:03:32.656724 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:20:36.355478 [ 54 ] {} <Error> TTStatBase.Distributed.DirectoryMonitor: Code: 999, e.displayText() = DB::Exception: Received from ch3r1.local:9000, 10.247.129.212. DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss, path: /clickhouse/tables/3/TTViewAdv/temp/abandonable_lock-: while pushing to view SrcViews.TTViewAdv. Stack trace:
2018.11.02 01:23:26.223906 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:26.223974 [ 9 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/replicas, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:26.224788 [ 25 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:26.224846 [ 23 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:23:26.224851 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:59.472526 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/6/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:59.538437 [ 36 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:59.538655 [ 21 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:59.538671 [ 27 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 01:41:59.539001 [ 30 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:

Shard 3, for another hour with data loss:

Replica 1

2018.11.02 09:15:29.133107 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/5/TTViewAdv/log, e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:15:29.135605 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:15:29.135623 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:15:29.135661 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:

Replica 2

2018.11.02 09:03:37.626979 [ 26 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/5/TTViewAdv/mutations, e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:03:37.627142 [ 34 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/5/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:03:37.627381 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:03:37.627643 [ 36 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:03:37.627681 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:03:37.628210 [ 31 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:35:40.984643 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/5/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:35:40.985395 [ 34 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:35:40.985465 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv (ReplicatedMergeTreeAlterThread): void DB::ReplicatedMergeTreeAlterThread::run(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:35:40.985557 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
2018.11.02 09:35:40.985608 [ 22 ] {} <Error> SrcViews..inner.TTViewAdv (StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mutationsUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), e.what() = Coordination::Exception, Stack trace:
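
To line the errors above up against the hours with missing data, it can help to count them per hour and per error kind. The following is a minimal sketch, not part of the original report: the function name `count_zk_errors_by_hour` is hypothetical, and the regular expression only assumes the log line layout visible in the excerpts above. The sample lines are copied from those excerpts.

```python
import re
from collections import Counter

# Matches server log lines of the shape quoted in this issue:
# "<date> <time> [ thread ] {query_id} <Error> ... Coordination::Exception: <kind> ..."
LINE_RE = re.compile(
    r"^(?P<date>\d{4}\.\d{2}\.\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d+ .*?"
    r"Coordination::Exception: "
    r"(?P<kind>Session expired|Connection loss|Operation timeout)"
)


def count_zk_errors_by_hour(lines):
    """Return a Counter keyed by (date, hour, error kind)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["date"], match["hour"], match["kind"])] += 1
    return counts


# Three lines copied verbatim from the logs in this issue.
SAMPLE = [
    "2018.11.02 01:20:36.168545 [ 24 ] {} <Error> SrcViews..inner.TTViewAdv "
    "(ReplicatedMergeTreeCleanupThread): void DB::ReplicatedMergeTreeCleanupThread::run(): "
    "Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), "
    "e.what() = Coordination::Exception, Stack trace:",
    "2018.11.02 01:20:36.168627 [ 33 ] {} <Error> SrcViews..inner.TTViewAdv "
    "(StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): "
    "Code: 999, e.displayText() = Coordination::Exception: Connection loss, "
    "path: /clickhouse/tables/3/TTViewAdv/temp, e.what() = Coordination::Exception, Stack trace:",
    "2018.11.02 09:15:29.133107 [ 28 ] {} <Error> SrcViews..inner.TTViewAdv "
    "(StorageReplicatedMergeTree): void DB::StorageReplicatedMergeTree::mergeSelectingTask(): "
    "Code: 999, e.displayText() = Coordination::Exception: Connection loss, "
    "path: /clickhouse/tables/5/TTViewAdv/log, e.what() = Coordination::Exception, Stack trace:",
]

for key, count in sorted(count_zk_errors_by_hour(SAMPLE).items()):
    print(key, count)
```

In practice the function would be fed the lines of the server error log on each replica rather than a hard-coded list.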
comp-replication comp-zookeeper obsolete-version stale

All 14 comments

ClickHouse server version 18.14.11 revision 54409.

Seeing something similar on v18.14.15 with ZooKeeper 3.4.13, 4 nodes, and 4 inserters in round robin:

Code: 999. DB::Exception: Received from (hostname):9000, 151.139.73.213. DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss, path: /clickhouse/tables/4/(database)/block_numbers/1538352000/block-.

I need to add some details.

4 shards, 2 replicas, inserts into a Distributed table on round-robin servers, ReplicatedMergeTree as the base storage, and about 6 materialized views using ReplicatedAggregatingMergeTree.

Data is sharded about equally per shard. There are about 15 original inserts per minute into the cluster, which turn into significantly more real block inserts because each insert is distributed across the shards and fanned out to the views.
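
As a rough illustration of that fan-out, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: every original insert is split across all 4 shards, and on each shard it produces one block for the base table plus one block per materialized view. Replication traffic is not counted, and the function name is hypothetical.

```python
def block_inserts_per_minute(inserts_per_minute, shards, views):
    """Estimate block inserts per minute across the cluster.

    Assumption (not confirmed in this thread): each original insert reaches
    every shard, and each shard writes 1 base-table block + 1 block per view.
    """
    blocks_per_shard_insert = 1 + views
    return inserts_per_minute * shards * blocks_per_shard_insert


estimate = block_inserts_per_minute(inserts_per_minute=15, shards=4, views=6)
print(estimate)  # 15 * 4 * (1 + 6) = 420
```

Under these assumptions, 15 original inserts per minute become a few hundred block inserts per minute, and the "Cannot allocate block number in ZooKeeper" errors in the logs above show that block inserts into replicated tables go through ZooKeeper.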

Our team is currently experiencing the same data loss problem after ZooKeeper errors while loading data into ClickHouse via Kafka.

I have the same problem with missing data in the materialized view table on each node.
I am not sure about ZooKeeper errors, but in the end data is missing in the replicated MV tables.
CH version 18.14.19.

Has anybody tried the setting _insert_distributed_sync = 1_ to see its impact on this issue with materialized views?

Since MVs work as triggers (processing asynchronously after the initial query), I think it will make no difference.

In my case, all the issues went away after tuning ZK to avoid huge ZK session timeouts.

I have a simple structure very similar to yours: 4 shards, 2 replicas, a cluster with 4 + 4 nodes (for replicas).
ReplicatedMergeTree is used for the master table and for the underlying tables of the materialized views.
The materialized tables exist only to provide different indexes (ORDER BY) and should hold the same data as the master.
All of these tables have Distributed tables on top, so that clustered data can be read.

After inserting data into the master table, some of the data is not inserted into the materialized index tables, which makes this solution unusable.

Any hints on where to look, or how to avoid this issue?
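
When the view tables are supposed to hold the same rows as the master table, one way to narrow down the affected periods is to compare per-hour row counts from both sides. The sketch below is hypothetical and not taken from this thread: it assumes the counts were already fetched (for example with a `count()` grouped by `toStartOfHour(ts)` on each table), and the numbers in the example are made up for illustration only.

```python
def find_missing_hours(source_counts, view_counts):
    """Return {hour: rows_missing} for hours where the view has fewer rows.

    Both arguments map an hour label to a row count. This comparison only
    makes sense when the view is expected to contain every source row.
    """
    missing = {}
    for hour, expected in sorted(source_counts.items()):
        actual = view_counts.get(hour, 0)
        if actual < expected:
            missing[hour] = expected - actual
    return missing


# Made-up counts, for illustration only.
source = {"2018-11-02 00:00": 1000, "2018-11-02 01:00": 1000}
view = {"2018-11-02 00:00": 1000, "2018-11-02 01:00": 950}
print(find_missing_hours(source, view))  # {'2018-11-02 01:00': 50}
```

The hours reported here can then be checked against the error log of the MV inner table for the same period.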

Do the MVs use a Replicated*MergeTree engine? If not, they won't work properly, since an MV is applied only to data that comes from direct inserts, not to replicated data.
If yes, check the error log for errors in the MV inner table. I expect they will be about ZooKeeper, and the next step should be tuning ZK according to the recommendations in the CH docs: https://clickhouse.yandex/docs/ru/operations/tips/

Thanks for the feedback.

The master table is sharded across 4 nodes and replicated to another 4 nodes.
The MVs are created on top of tables created in advance, using the syntax CREATE MATERIALIZED VIEW xx TO yy.
On each node, the underlying tables for the MVs are replicated to the counterpart node on the replica server.

So this is a standard replicated model on a cluster with 8 nodes: 4 primaries and 4 replicas.

Inserting data on the primary nodes through the Distributed table for the master table did not insert the data correctly into the MVs, even on the primary nodes. So the replicas are affected too (there is nothing to replicate).

I will try without replication for the MV tables, but this is a really bad thing.
I have no problems with replication for normal tables without MVs.
I will also set up a new ZooKeeper cluster according to the recommended settings, to rule out misconfiguration as a possible cause.

ZooKeeper is behaving badly. Maybe it's installed on the same machines as ClickHouse? (That is bad practice.)

> Since MVs work as triggers (processing asynchronously after the initial query), I think it will make no difference.

The MV is executed synchronously; it is the insert from the Distributed table into ReplicatedMergeTree that works asynchronously. The insert_distributed_sync setting changes that behaviour.
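
For anyone who wants to try the setting, one option is to pass it per request over the HTTP interface, which accepts settings as URL parameters. This is a minimal sketch, not something tested in this thread: the host name `clickhouse.example` is a placeholder, `insert_url` is a hypothetical helper, and the query text is shortened from the INSERT shown in the logs above.

```python
from urllib.parse import urlencode


def insert_url(base_url, query, sync=True):
    """Build an HTTP interface URL that sets insert_distributed_sync per request.

    The rows themselves would be sent in the POST body of the request.
    """
    params = {"query": query, "insert_distributed_sync": 1 if sync else 0}
    return f"{base_url}/?{urlencode(params)}"


url = insert_url(
    "http://clickhouse.example:8123",  # placeholder host, default HTTP port
    "INSERT INTO SrcData.TTStatBase FORMAT TabSeparated",
)
print(url)
```

With the setting enabled, an insert into the Distributed table only returns after the data has been written on the shards, so an error raised while pushing to a view would be expected to surface to the inserting client instead of only appearing in the Distributed table's background log.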

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hello. We have started seeing the same exception:

`DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss (version 19.11.11.57 (official build))

at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:824) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:817) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:335) ~[clickhouse-jdbc-0.2.jar:?]
at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.executeBatch(ClickHousePreparedStatementImpl.java:320) ~[clickhouse-jdbc-0.2.jar:?]`

Are there any plans to fix this?

PS: Thanks for a great DB; we have been using it in production for the last 2 years.

@wawanawna This exception means that connection to ZooKeeper was lost and ClickHouse will reconnect to ZooKeeper. This behaviour is normal and doesn't lead to data loss.
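
Since the exception does reach the inserting client, as in the JDBC stack trace above, a client can resend the failed batch once the connection has recovered. The following is only a sketch under assumptions: `send_batch` stands for whatever function performs the insert in your client, the helper name and retry parameters are hypothetical, and this thread does not establish whether a resent batch is deduplicated in materialized views.

```python
import time


def is_zk_error(exc):
    """Treat errors that mention a ZooKeeper coordination failure as retryable."""
    return "Coordination::Exception" in str(exc)


def insert_with_retry(send_batch, batch, attempts=5, base_delay=0.5,
                      is_retryable=is_zk_error, sleep=time.sleep):
    """Call send_batch(batch), resending the same batch on retryable errors.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once the attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send_batch(batch)
        except Exception as exc:
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real waiting.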
