Describe the bug
Some tables get stuck in read-only mode and cannot recover from that state.
Detaching the read-only table times out every time.
Restarting the ClickHouse node helps, but it is very inconvenient.
How to reproduce
Expected behavior
The table can recover from read-only mode by itself, or I can reload the read-only table manually with a detach/attach operation.
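For reference, the manual reload would look like this (a sketch using the database and table names from the report below; in the stuck state reported here, the DETACH itself times out):

DETACH TABLE adshonor.locked_request_201904260200;
ATTACH TABLE adshonor.locked_request_201904260200;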
Error message and/or stacktrace
The ZooKeeper session has also expired; I suspect that is what put the table into read-only mode.
database: adshonor
table: locked_request_201904260200
is_leader: 0
is_readonly: 1
is_session_expired: 1
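(A status like the one above can be read from the system.replicas system table; a minimal query over its standard columns:)

SELECT database, table, is_leader, is_readonly, is_session_expired
FROM system.replicas
WHERE is_readonly OR is_session_expired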
ClickHouse server error log:
void DB::AsynchronousMetrics::update(): Cannot get replica delay for table: adshonor.locked_request_201904260200: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode, Stack trace:
Additional context
I am not a ClickHouse expert (just a recent user), but I have seen a scenario like this before.
At that time, "XID overflow" was the direct cause of this issue.
(Restarting the server resolves the error, and that is what I did at the time, but some auto-recovery mechanism should be provided for this situation.)
Can @github1youlc check if there is an error message like "XID overflow" above that error?
Thank you for this clue; I found the occurrence in the clickhouse-server error log.
2019.04.10 01:12:49.884863 [ 35 ] {}
2019.04.10 01:12:49.954709 [ 43 ] {}
I figured out the problem in my case, and I made a fix here: 9527
ClickHouse$ git diff
diff --git a/contrib/ssl b/contrib/ssl
--- a/contrib/ssl
+++ b/contrib/ssl
@@ -1 +1 @@
-Subproject commit ba8de796195ff9d8bb0249ce289b83226b848b77
+Subproject commit ba8de796195ff9d8bb0249ce289b83226b848b77-dirty
diff --git a/dbms/src/Common/ZooKeeper/ZooKeeperImpl.cpp b/dbms/src/Common/ZooKeeper/ZooKeeperImpl.cpp
index 4abb97f..41e18b9 100644
--- a/dbms/src/Common/ZooKeeper/ZooKeeperImpl.cpp
+++ b/dbms/src/Common/ZooKeeper/ZooKeeperImpl.cpp
@@ -1430,6 +1430,8 @@ void ZooKeeper::pushRequest(RequestInfo && info)
if (!info.request->xid)
{
info.request->xid = next_xid.fetch_add(1);
+ if (info.request->xid == close_xid)
+ throw Exception("xid equal to close_xid", ZSESSIONEXPIRED);
if (info.request->xid < 0)
throw Exception("XID overflow", ZSESSIONEXPIRED);
}
diff --git a/dbms/src/Common/ZooKeeper/ZooKeeperImpl.h b/dbms/src/Common/ZooKeeper/ZooKeeperImpl.h
index 2486857..bfeac5e 100644
--- a/dbms/src/Common/ZooKeeper/ZooKeeperImpl.h
+++ b/dbms/src/Common/ZooKeeper/ZooKeeperImpl.h
@@ -180,7 +180,7 @@ private:
int64_t session_id = 0;
- std::atomic<XID> next_xid {1};
+ std::atomic<XID> next_xid {((1 << 30) - 128) << 1};
std::atomic<bool> expired {false};
std::mutex push_request_mutex;
diff --git a/debian/changelog b/debian/changelog
index 06ae50f..a1fd2e7 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,5 +1,5 @@
-clickhouse (19.5.3.1) unstable; urgency=low
+clickhouse (19.5.3) unstable; urgency=low
* Modified source code
- -- clickhouse-release <[email protected]> Mon, 15 Apr 2019 21:51:50 +0300
+ -- XXX <XXX@XXX> Wed, 22 May 2019 11:55:05 +0000
I modified the initial value of next_xid to make XID overflow easier to reproduce.
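(For context: ((1 << 30) - 128) << 1 = 2^31 - 256, i.e. 256 increments below the signed 32-bit overflow point, so next_xid turns negative after roughly 256 requests. A compile-time check of that arithmetic:)

#include <cstdint>

// The seed sits 256 below INT32_MAX + 1, so the "XID overflow" branch
// in pushRequest fires after ~256 allocations.
static_assert((((1 << 30) - 128) << 1) == INT32_MAX - 255, "seed is 2^31 - 256");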
I sent INSERTs into a replicated table about 180 times and got the following variations of errors:
2 <Response [500]> Code: 225, e.displayText() = DB::Exception: ZooKeeper session has been expired. (version 19.5.3.1)
3 <Response [500]> Code: 225, e.displayText() = DB::Exception: ZooKeeper session has been expired. (version 19.5.3.1)
... (NOTE: this part is caused by XID overflow in the 2nd request; it could happen because I also sent some other INSERTs to another table)
38 <Response [500]> Code: 225, e.displayText() = DB::Exception: ZooKeeper session has been expired. (version 19.5.3.1)
39 <Response [500]> Code: 225, e.displayText() = DB::Exception: ZooKeeper session has been expired. (version 19.5.3.1)
40 <Response [200]>
41 <Response [200]>
... (NOTE: 42 to 50 are all 200; likewise, all numbers omitted below are 200)
51 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 13 with ID 'all_8286278104605411708_8426542356555497121': Session expired (version 19.5.3.1)
52 <Response [200]>
53 <Response [200]>
54 <Response [200]>
55 <Response [200]>
75 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 37 with ID 'all_12886400267097872502_4893161622399995132': Session expired (version 19.5.3.1)
76 <Response [200]>
77 <Response [200]>
78 <Response [200]>
79 <Response [200]>
88 <Response [500]> Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (version 19.5.3.1)
89 <Response [200]>
90 <Response [200]>
91 <Response [200]>
92 <Response [200]>
100 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 61 with ID 'all_4654505216073472627_8127550766113632338': Session expired (version 19.5.3.1)
101 <Response [200]>
102 <Response [200]>
103 <Response [200]>
104 <Response [200]>
124 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 85 with ID 'all_15103531304195667749_8570370559661181307': Session expired (version 19.5.3.1)
125 <Response [200]>
126 <Response [200]>
127 <Response [200]>
128 <Response [200]>
148 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 109 with ID 'all_16942967266523965882_6650752932106655323': Session expired (version 19.5.3.1)
149 <Response [200]>
150 <Response [200]>
151 <Response [200]>
152 <Response [200]>
161 <Response [500]> Code: 242, e.displayText() = DB::Exception: Table is in readonly mode (version 19.5.3.1)
162 <Response [200]>
163 <Response [200]>
164 <Response [200]>
165 <Response [200]>
173 <Response [500]> Code: 244, e.displayText() = DB::Exception: Unrecoverable network error while adding block 133 with ID 'all_8196154836730591930_7365956032707223292': Session expired (version 19.5.3.1)
174 <Response [200]>
175 <Response [200]>
176 <Response [200]>
177 <Response [200]>
Note that the behavior of 'Table is in readonly mode' changes from "dead and never recovers" to "dies momentarily but revives soon".
I have some questions about this issue:
1. When an INSERT fails partially (i.e. on one replica/shard), how can we maintain data consistency? (Especially when we are using both sharding and replication?)
2. Is there a way to monitor the next_xid value of clickhouse-server (perhaps via metrics)? (My intention is to know about it and restart ClickHouse before the XID really overflows.)
3. Shouldn't the error be ZXIDOVERFLOW (or something like this) instead of ZSESSIONEXPIRED?
Any news? I have the same problem
@243f6a8885a308d313198a2e037
The fix that @github1youlc made is only for the deadlock. Now, when the xid overflows, the ZooKeeper session is forcefully expired and ClickHouse establishes a new ZooKeeper session; everything continues to work normally, just as after a network error.
You made changes to check what happens if the xid overflows very quickly. In that case, the ZooKeeper session is re-established all the time and the system cannot function normally.
We can avoid session expiration on xid overflow by simply allowing the xid to overflow but skipping the reserved values... you can use an atomic compare-and-swap instead of an atomic increment for this purpose. In normal circumstances, the xid cannot overflow more frequently than the operation timeout, so the overflow won't do harm.
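A minimal sketch of that idea, assuming XID is a signed 32-bit integer and that close_xid is the reserved session-close value from ZooKeeperImpl.h (the exact constant and the surrounding code are assumptions of this sketch, not the actual patch):

#include <atomic>
#include <cstdint>

using XID = int32_t;

// Reserved xid used for the session-close request (assumed to be
// 0x7FFFFFFF here, as in the ZooKeeperImpl versions discussed above).
constexpr XID close_xid = 0x7FFFFFFF;

std::atomic<XID> next_xid{1};

// Allocate an xid with compare-and-swap instead of fetch_add: when the
// counter would reach a reserved or negative value it wraps back to 1,
// so overflow never forces a session expiration.
XID allocate_xid()
{
    XID current = next_xid.load();
    while (true)
    {
        // Hand out a safe value; wrap around the reserved/negative range.
        XID candidate = (current <= 0 || current >= close_xid) ? 1 : current;
        if (next_xid.compare_exchange_weak(current, candidate + 1))
            return candidate;
        // CAS failed: `current` was reloaded with the fresh value; retry.
    }
}

With an allocator like this, the fetch_add in pushRequest and both overflow throws become unnecessary: the counter wraps at most once per ~2^31 requests, which, as noted above, is far less frequent than the operation timeout.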
@alexey-milovidov Thank you for your reaction.
I have recently been wondering what happens when INSERT, CREATE TABLE, or DROP TABLE queries fail partially due to ZSESSIONEXPIRED (in general, not only from xid overflow).
At the least, I ran into a "we can no longer drop/create a table with this name" situation after a partial failure of CREATE TABLE / DROP TABLE. (This means that xid overflow, which causes ZSESSIONEXPIRED, can be harmful.)
CREATE TABLE IF NOT EXISTS foo.bar ON CLUSTER replicated_cluster_m
(
`P` Int32,
`Q` Int64,
`R` Int16
)
ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/foo/{shard}/bar', '{server}', (Q, R))
ORDER BY P
DROP TABLE IF EXISTS foo.bar ON CLUSTER replicated_cluster_m
Code: 999, e.displayText() = Coordination::Exception: xid equal to close_xid (Session expired) (version 19.5.3.1) on server server_1
Code: 253, e.displayText() = DB::Exception: Replica /clickhouse/tables/foo/1/bar/replicas/server_0 already exists. (version 19.5.3.1)
Code: 242, e.displayText() = DB::Exception: Can't drop readonly replicated table (need to drop data in ZooKeeper as well) (version 19.5.3.1)
Code: 253, e.displayText() = DB::Exception: Replica /clickhouse/tables/foo/2/bar/replicas/server_3 already exists. (version 19.5.3.1)
Code: 305, e.displayText() = DB::Exception: Table was not dropped because ZooKeeper session has expired. (version 19.5.3.1)
Code: 253, e.displayText() = DB::Exception: Replica /clickhouse/tables/foo/1/bar/replicas/server_0 already exists. (version 19.5.3.1)
What I want to ask are the following three questions:
1. During an INSERT there are several transactions with ZooKeeper (mentioned in https://clickhouse.yandex/docs/en/operations/table_engines/replication/). Can anything like the scenarios above also happen during an INSERT?
2. You said the xid cannot overflow more frequently than the operation timeout; does avoiding the overflow here mean we should restart clickhouse-server frequently (about once per week or so)? We have experienced xid overflow after about 40 days of operation.
3. Is there a way to check next_xid from outside?
p.s. Should I create a new issue about this?
@243f6a8885a308d313198a2e037 Sorry, I missed your answer.
This issue will be solved (at least for some of these cases) in https://github.com/yandex/ClickHouse/issues/6045