Is your feature request related to a problem? Please describe:
Currently connection ids are local to TiDB servers, which means that a KILL x must be directed to the correct server, and can not safely be load balanced across the cluster.
This is annoying because it creates affinity to servers in a distributed architecture. It also doesn't retrofit to MySQL so well, since the KILL x syntax can't be supported (using KILL TIDB x to prevent incorrect usage).
Describe the feature you'd like:
I found this note while reading the source code, and it makes a lot of sense to me. I decided to create this FR for discussion and tracking since I couldn't find an existing one open.
parser/ast/misc.go:
// TiDBExtension is used to indicate whether the user knows he is sending kill statement to the right tidb-server.
// When the SQL grammar is "KILL TIDB [CONNECTION | QUERY] connectionID", TiDBExtension will be set.
// It's a special grammar extension in TiDB. This extension exists because, when the connection is:
// client -> LVS proxy -> TiDB, and type Ctrl+C in client, the following action will be executed:
// new a connection; kill xxx;
// kill command may send to the wrong TiDB, because the exists of LVS proxy, and kill the wrong session.
// So, "KILL TIDB" grammar is introduced, and it REQUIRES DIRECT client -> TiDB TOPOLOGY.
// TODO: The standard KILL grammar will be supported once we have global connectionID.
Since I believe the connectionID must be 32-bits for protocol compatibility, I assume that the implementation will use the first 8-16 bits for the host, and the remaining 16-24 bits for the local connectionID?
It would be nice to externalize the host portion as a pseudo server_id, since some functions such as UUID_SHORT depend on it being small. The pseudo server_id's can be recycled as TiDB servers are deployed/disappear, with pd ensuring they are unique. This address size of the pseudo server_id would limit the max number of tidb servers deployed simultaneously.
Describe alternatives you've considered:
Using 8 bits for the host seems short, but actually using only 16 bits for the local connection Id is also dangerously low, if you want to make sure new connection Ids allocated do not chase the tail of old connectionIds. I will let someone smarter figure that out :-)
Teachability, Documentation, Adoption, Migration Strategy:
It makes sense to extend SHOW PROCESSLIST to be SHOW [GLOBAL|SESSION] PROCESSLIST. I think the default of global makes more sense
Contact the mentors: #tidb-challenge-program channel in TiDB Community Slack Workspace
I noticed that since MySQL 5.7 connection ID becomes 64 bit: https://bugs.mysql.com/bug.php?id=65715 So it might not be a hard problem now.
Tips to implement,
server_id be, and the format of the global connection ID.server_id from pd-server when tidb-server starts, and show the server_id in select * from information_schema.CLUSTER_INFO. (information_schema.CLUSTER_INFO already collects all information of tidb-servers, what we need is an extra column server_id)select * from information_schema.CLUSTER_PROCESSLISTCTRL-C or kill <connId>, whichever a tidb-server receives this command, it forwards to the right tidb-server by global connection ID./pick-up-challenge
@pingyu pick up issue success
I noticed that since MySQL 5.7 connection ID becomes 64 bit: https://bugs.mysql.com/bug.php?id=65715 So it might not be a hard problem now.
I don't believe this is fully fixed in MySQL 5.7. The server internally uses 64-bit, but the client APIs only expose the lower 32-bits. The fix in 5.7 is that it ensures the lower 32-bits never overlap, which is feasible only in single server context.
Another way to implement this would be to use a design similar to auto_increment works (tidb servers request connection IDs in batches). A batch of 2^16 ids should last for quite a while. I am not sure what is the best way to prevent connection IDs from being reused though, it needs to be possible to have long living connections so that operations like backup are possible.
@wwar It is already 64bit in 5.6.9:
Connection IDs now are permitted to be 64-bit values when the server supports them (when built with 64-bit data types), which has these effects:
Connection IDs are logged correctly to the general query log and slow query log.Note
This change involves a modification to the log tables, so after upgrading to this release, you must run mysql_upgrade and restart the server.CONNECTION_ID() returns a data type appropriate for values larger than 32 bits.
mysql_thread_id() is unchanged; the client/server protocal has only 4 bytes for the ID value. This function returns an incorrect (truncated) value for connection IDs larger than 32 bits and should be avoided.
mysql_kill() still cannot handle values larger than 32 bits, but to guard against killing the wrong thread now returns an error in these cases:
If given an ID larger than 32 bits, mysql_kill() returns a CR_INVALID_CONN_HANDLE error.
After the server's internal thread ID counter reaches a value larger than 32 bits, it returns an ER_DATA_OUT_OF_RANGE error for any mysql_kill() invocation and mysql_kill() fails.
To avoid problems with mysql_thread_id() and mysql_kill(), do not use them. To get the connection ID, execute a SELECT CONNECTION_ID() query and retrieve the result. To kill a thread, execute a KILL statement.
Seem the connection ID in the handshake protocol is still 32 bits, see https://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake
we need to take care of it, e.g. the higher 32 bits is TiDB server ID, the lower is connection ID, we can return the lower part in the handshake protocol.
Thanks to all the friends and discussions here and slack.
After gather the information by far, I write a draft design doc at #17649.
PTAL, thanks~
This pick has been automatically canceled after more than a week.
Hi @pingyu , I tracked this issue from the PD PR. IMO, it is a bad idea to use PD's allocID to generate tidb-server ID. The way allocID works is to allocate increasing IDs, and it will skip 1000 when restarting or leader switching. And what may be more needed here is an id similar to the Linux process number, just to ensure that it is not repeated for a period of time. In particular, PD's allocID does not guarantee that it will not exceed the int32 range.
May I ask the slack channel you were discussing? I might need some more context.
Hi @pingyu , I tracked this issue from the PD PR. IMO, it is a bad idea to use PD's
allocIDto generate tidb-server ID. The way allocID works is to allocate increasing IDs, and it will skip 1000 when restarting or leader switching. And what may be more needed here is an id similar to the Linux process number, just to ensure that it is not repeated for a period of time. In particular, PD's allocID does not guarantee that it will not exceed the int32 range.
May I ask the slack channel you were discussing? I might need some more context.
Sure, that's #sig-exec.
I have @ you in the thread.
64-bit connection id is done by @pingyu in https://github.com/pingcap/tidb/pull/17649
Reopen it.
It has a second step to support a 32-bit connection id and @pingyu will continue to work on it.
Most helpful comment
Seem the connection ID in the handshake protocol is still 32 bits, see https://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake
we need to take care of it, e.g. the higher 32 bits is TiDB server ID, the lower is connection ID, we can return the lower part in the handshake protocol.