Describe the bug
The remote() table function unexpectedly fails with a connection error when connecting to a server that is otherwise reachable.
How to reproduce
<!-- Users and ACL. -->
<users>
<!-- If user name was not specified, 'default' user is used. -->
<default>
<!-- Password can be specified in plaintext or as SHA256 (in hex format).
If you want to specify the password in plaintext (not recommended), place it in the 'password' element.
Example: <password>qwerty</password>.
The password can be empty.
If you want to specify SHA256, place it in the 'password_sha256_hex' element.
Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
How to generate a decent password:
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
The first line of output is the password and the second is the corresponding SHA256.
-->
<password></password>
<!-- List of networks with open access.
To open access from everywhere, specify:
<ip>::/0</ip>
To open access only from localhost, specify:
<ip>::1</ip>
<ip>127.0.0.1</ip>
Each element of the list has one of the following forms:
<ip> IP address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
<host> Hostname. Example: server01.yandex.ru.
To check access, a DNS query is performed, and all returned addresses are compared to the peer address.
<host_regexp> Regular expression for host names. Example: ^server\d\d-\d\d-\d\.yandex\.ru$
To check access, a DNS PTR query is performed for the peer address and then the regexp is applied.
Then, for the result of the PTR query, another DNS query is performed and all returned addresses are compared to the peer address.
It is strongly recommended that the regexp ends with $
All results of DNS requests are cached until the server restarts.
-->
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<!-- Settings profile for user. -->
<profile>default</profile>
<!-- Quota for user. -->
<quota>default</quota>
</default>
<!-- Example of user with readonly access. -->
<readonly>
<password></password>
<networks incl="networks" replace="replace">
<ip>::1</ip>
<ip>127.0.0.1</ip>
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly>
</users>
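The comment in the users.xml above gives a shell one-liner that produces a random password and its SHA256 hex digest for `password_sha256_hex`. Below is a minimal Python 3 equivalent, assuming only the standard library (`secrets`, `base64`, `hashlib`). The function names are hypothetical helpers for illustration and are not part of ClickHouse.

```python
import base64
import hashlib
import secrets


def generate_password(length: int = 8) -> str:
    """Random password of base64 characters, similar to
    `base64 < /dev/urandom | head -c8` (hypothetical helper)."""
    # n random bytes encode to at least n base64 characters, so truncating
    # to `length` never includes the '=' padding at the end.
    raw = secrets.token_bytes(length)
    return base64.b64encode(raw).decode("ascii")[:length]


def password_sha256_hex(password: str) -> str:
    """Hex SHA256 digest for <password_sha256_hex>, like
    `echo -n "$PASSWORD" | sha256sum` (hypothetical helper)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    pw = generate_password()
    print(pw)                       # first line: the password
    print(password_sha256_hex(pw))  # second line: its SHA256 in hex
```

`secrets` is used instead of `random` because it draws from the operating system's cryptographically secure source, which is what credentials need.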
Expected behavior
host01 :) select count(*) from remote('host02', system, "settings")
SELECT count(*)
FROM remote('host02', system, settings)
[host01] 2019.04.30 20:19:17.257224 {2bd95337-4089-43ef-82f4-3790614907cd} [ 39 ] <Debug> executeQuery: (from [::ffff:127.0.0.1]:53780) select count(*) from remote('host02', system, "settings")
Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)
[host02] 2019.04.30 20:19:17.341413 {ca1f7f65-4d3a-4b15-879c-d316b502389b} [ 39 ] <Debug> executeQuery: (from [::ffff:10.00.000.100]:46242, initial_query_id: 2bd95337-4089-43ef-82f4-3790614907cd) DESC TABLE system.settings
[host02] 2019.04.30 20:19:17.341676 {ca1f7f65-4d3a-4b15-879c-d316b502389b} [ 39 ] <Debug> executeQuery: Query pipeline:
One
[host02] 2019.04.30 20:19:17.341966 {ca1f7f65-4d3a-4b15-879c-d316b502389b} [ 39 ] <Information> executeQuery: Read 4 rows, 266.00 B in 0.000 sec., 8242 rows/sec., 535.27 KiB/sec.
[host02] 2019.04.30 20:19:17.341979 {ca1f7f65-4d3a-4b15-879c-d316b502389b} [ 39 ] <Debug> MemoryTracker: Peak memory usage (for query): 1.07 MiB.
[host02] 2019.04.30 20:19:17.426472 {631cae5e-6772-4e36-95c9-40d03d487a52} [ 25 ] <Debug> executeQuery: (from [::ffff:10.00.000.100]:46260, initial_query_id: 2bd95337-4089-43ef-82f4-3790614907cd) DESC TABLE system.settings
[host02] 2019.04.30 20:19:17.426782 {631cae5e-6772-4e36-95c9-40d03d487a52} [ 25 ] <Debug> executeQuery: Query pipeline:
One
[host02] 2019.04.30 20:19:17.427089 {631cae5e-6772-4e36-95c9-40d03d487a52} [ 25 ] <Information> executeQuery: Read 4 rows, 266.00 B in 0.001 sec., 6987 rows/sec., 453.80 KiB/sec.
[host02] 2019.04.30 20:19:17.427103 {631cae5e-6772-4e36-95c9-40d03d487a52} [ 25 ] <Debug> MemoryTracker: Peak memory usage (for query): 1.07 MiB.
[host01] 2019.04.30 20:19:17.424397 {2bd95337-4089-43ef-82f4-3790614907cd} [ 39 ] <Debug> executeQuery: Query pipeline:
Remote
┌─count()─┐
│     198 │
└─────────┘
Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)
[host02] 2019.04.30 20:19:17.468844 {c33e01a2-0188-4d47-b87e-07357a3502f6} [ 39 ] <Debug> executeQuery: (from [::ffff:10.00.000.100]:46242, initial_query_id: 2bd95337-4089-43ef-82f4-3790614907cd) SELECT count() FROM system.settings
[host02] 2019.04.30 20:19:17.469382 {c33e01a2-0188-4d47-b87e-07357a3502f6} [ 39 ] <Debug> executeQuery: Query pipeline:
Expression
Expression
Aggregating
Concat
Expression
One
Progress: 198.00 rows, 29.13 KB (946.36 rows/s., 139.23 KB/s.)
[host02] 2019.04.30 20:19:17.469713 {c33e01a2-0188-4d47-b87e-07357a3502f6} [ 39 ] <Information> executeQuery: Read 198 rows, 28.45 KiB in 0.001 sec., 235637 rows/sec., 33.06 MiB/sec.
[host02] 2019.04.30 20:19:17.469726 {c33e01a2-0188-4d47-b87e-07357a3502f6} [ 39 ] <Debug> MemoryTracker: Peak memory usage (for query): 1.09 MiB.
[host01] 2019.04.30 20:19:17.464816 {2bd95337-4089-43ef-82f4-3790614907cd} [ 39 ] <Information> executeQuery: Read 198 rows, 28.45 KiB in 0.208 sec., 954 rows/sec., 137.07 KiB/sec.
[host01] 2019.04.30 20:19:17.464839 {2bd95337-4089-43ef-82f4-3790614907cd} [ 39 ] <Debug> MemoryTracker: Peak memory usage (for query): 7.04 MiB.
1 rows in set. Elapsed: 0.210 sec.
Error message and/or stacktrace
host01 :) select count(*) from remote('host02', system, "settings");
SELECT count(*)
FROM remote('host02', system, `settings`)
[host01] 2019.04.26 18:26:43.079540 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Debug> executeQuery: (from [::ffff:127.0.0.1]:60802) select count(*) from remote('host02', system, "settings");
[host01] 2019.04.26 18:26:43.130489 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
[host01] 2019.04.26 18:26:43.180652 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Warning> ConnectionPoolWithFailover: Connection failed at try №2, reason: Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
[host01] 2019.04.26 18:26:43.230794 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Warning> ConnectionPoolWithFailover: Connection failed at try №3, reason: Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
[host01] 2019.04.26 18:26:43.230940 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Debug> MemoryTracker: Peak memory usage (total): 97.50 KiB.
[host01] 2019.04.26 18:26:43.302733 {69e973ab-e784-4fd8-a3b1-5e6655c3b0d5} [ 39 ] <Error> executeQuery: Code: 279, e.displayText() = DB::NetException: All connection tries failed. Log:
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
(from [::ffff:127.0.0.1]:60802) (in query: select count(*) from remote('host02', system, "settings");), Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x6f1eb16]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x33992d2]
2. /usr/bin/clickhouse-server(PoolWithFailoverBase<DB::IConnectionPool>::getMany(unsigned long, unsigned long, std::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> const&, std::function<unsigned long (unsigned long)> const&, bool)+0x1e5d) [0x66dc18d]
3. /usr/bin/clickhouse-server(DB::ConnectionPoolWithFailover::getManyImpl(DB::Settings const*, DB::PoolMode, std::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult (DB::IConnectionPool&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> const&)+0xcd) [0x66d3bad]
4. /usr/bin/clickhouse-server(DB::ConnectionPoolWithFailover::getManyChecked(DB::Settings const*, DB::PoolMode, DB::QualifiedTableName const&)+0x8e) [0x66d412e]
5. /usr/bin/clickhouse-server() [0x61bbad7]
6. /usr/bin/clickhouse-server(DB::RemoteBlockInputStream::sendQuery()+0x40) [0x61bdf20]
7. /usr/bin/clickhouse-server(DB::getStructureOfRemoteTable(DB::Cluster const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context const&, std::shared_ptr<DB::IAST> const&)+0x9d3) [0x6595913]
8. /usr/bin/clickhouse-server(DB::TableFunctionRemote::executeImpl(std::shared_ptr<DB::IAST> const&, DB::Context const&) const+0xa12) [0x3517a32]
9. /usr/bin/clickhouse-server(DB::ITableFunction::execute(std::shared_ptr<DB::IAST> const&, DB::Context const&) const+0x4e) [0x676ea9e]
10. /usr/bin/clickhouse-server(DB::Context::executeTableFunction(std::shared_ptr<DB::IAST> const&)+0x2be) [0x629f5be]
11. /usr/bin/clickhouse-server(DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr<DB::IAST> const&, DB::Context const&, std::shared_ptr<DB::IBlockInputStream> const&, std::shared_ptr<DB::IStorage> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::QueryProcessingStage::Enum, unsigned long, bool)+0xb7b) [0x62fb2db]
12. /usr/bin/clickhouse-server(DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr<DB::IAST> const&, DB::Context const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::QueryProcessingStage::Enum, unsigned long, bool)+0x56) [0x62fbc16]
13. /usr/bin/clickhouse-server(DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr<DB::IAST> const&, DB::Context const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::QueryProcessingStage::Enum, unsigned long, bool)+0x7e7) [0x6307e07]
14. /usr/bin/clickhouse-server(DB::InterpreterFactory::get(std::shared_ptr<DB::IAST>&, DB::Context&, DB::QueryProcessingStage::Enum)+0x3b0) [0x62e33d0]
15. /usr/bin/clickhouse-server() [0x6441c24]
16. /usr/bin/clickhouse-server(DB::executeQuery(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool)+0x81) [0x6443aa1]
17. /usr/bin/clickhouse-server(DB::TCPHandler::runImpl()+0x4a6) [0x33a92b6]
18. /usr/bin/clickhouse-server(DB::TCPHandler::run()+0x2b) [0x33aa48b]
19. /usr/bin/clickhouse-server(Poco::Net::TCPServerConnection::start()+0xf) [0x705866f]
20. /usr/bin/clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x16a) [0x7058a4a]
21. /usr/bin/clickhouse-server(Poco::PooledThread::run()+0x77) [0x7134f57]
22. /usr/bin/clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x38) [0x7130e18]
23. /usr/bin/clickhouse-server() [0xacce88f]
24. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7fdf27b7c064]
25. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fdf271a462d]
Received exception from server (version 19.3.6):
Code: 279. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::NetException. DB::NetException: All connection tries failed. Log:
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 10.10.1.111:9000 (host02:9000, 10.10.1.111)
.
0 rows in set. Elapsed: 0.226 sec.
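The host01 warnings are timestamped, so the spacing between connection attempts can be read directly from them. The query starts at 18:26:43.079540, and each `connect timed out` warning follows roughly 50 ms after the previous one. The sketch below only computes those gaps from timestamps copied out of the log (Python 3, standard library only).

A fixed ~50 ms per attempt would be consistent with a very short connect timeout on host01. ClickHouse has a `connect_timeout_with_failover_ms` setting for connections opened through its failover connection pool, and the stack trace shows `remote()` going through `ConnectionPoolWithFailover`. Whether that setting is the cause here is an assumption worth checking, not something this thread confirms.

```python
from datetime import datetime

# Timestamps copied from the failing host01 log: query start, then the
# three "Connection failed at try" warnings.
LOG_TIMESTAMPS = [
    "2019.04.26 18:26:43.079540",
    "2019.04.26 18:26:43.130489",
    "2019.04.26 18:26:43.180652",
    "2019.04.26 18:26:43.230794",
]


def gaps_ms(stamps):
    """Return the gaps between consecutive log timestamps in milliseconds."""
    times = [datetime.strptime(s, "%Y.%m.%d %H:%M:%S.%f") for s in stamps]
    return [
        round((later - earlier).total_seconds() * 1000, 3)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for attempt, gap in enumerate(gaps_ms(LOG_TIMESTAMPS), start=1):
        print(f"try {attempt}: {gap} ms")
```

For this log the gaps come out as 50.949, 50.163 and 50.142 ms.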
Additional context
The following two commands work.
host01:~$ clickhouse-client -h $host02IP -q 'select count(*) from system."settings"'
198
host01:~$ clickhouse-client -h host02 -q 'select count(*) from system."settings"'
198
There are also no logs generated on host02 in the failure case.
Results of netstat:
host01:~$ sudo netstat -nlp and netstat -nap | grep clickhouse
tcp 0 0 127.0.0.1:39432 127.0.0.1:9000 ESTABLISHED 83776/clickhouse-cl
tcp6 0 0 :::9000 :::* LISTEN 53591/clickhouse-se
tcp6 0 0 :::9009 :::* LISTEN 53591/clickhouse-se
tcp6 0 0 :::8123 :::* LISTEN 53591/clickhouse-se
tcp6 0 0 127.0.0.1:9000 127.0.0.1:39432 ESTABLISHED 53591/clickhouse-se
unix 3 [ ] STREAM CONNECTED 116688898 53591/clickhouse-se
host02:~$ sudo netstat -nlp and netstat -nap | grep clickhouse
tcp6 0 0 :::9000 :::* LISTEN 70251/clickhouse-se
tcp6 0 0 :::9009 :::* LISTEN 70251/clickhouse-se
tcp6 0 0 :::8123 :::* LISTEN 70251/clickhouse-se
unix 3 [ ] STREAM CONNECTED 133148652 70251/clickhouse-se
The host02 hostname/IP and port combination is reachable from host01.
Results of tcpdump on host02 in the error case:
host02:~$ sudo tcpdump -v -n dst port 9000
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:30:09.179092 IP (tos 0x0, ttl 49, id 61232, offset 0, flags [DF], proto TCP (6), length 60)
$HOST01IP.58670 > $HOST02IP.9000: Flags [S], cksum 0x87c9 (correct), seq 2184761190, win 29200, options [mss 1460,sackOK,TS val 1228808278 ecr 0,nop,wscale 7], length 0
20:30:09.227461 IP (tos 0x0, ttl 49, id 21551, offset 0, flags [DF], proto TCP (6), length 60)
$HOST01IP.58678 > $HOST02IP.9000: Flags [S], cksum 0x08a4 (correct), seq 1550021709, win 29200, options [mss 1460,sackOK,TS val 1228808290 ecr 0,nop,wscale 7], length 0
20:30:09.239830 IP (tos 0x0, ttl 49, id 14756, offset 0, flags [DF], proto TCP (6), length 40)
$HOST01IP.58670 > $HOST02IP.9000: Flags [R], cksum 0xcb4d (correct), seq 2184761191, win 0, length 0
20:30:09.279520 IP (tos 0x0, ttl 49, id 45150, offset 0, flags [DF], proto TCP (6), length 60)
$HOST01IP.58686 > $HOST02IP.9000: Flags [S], cksum 0x427e (correct), seq 110137393, win 29200, options [mss 1460,sackOK,TS val 1228808303 ecr 0,nop,wscale 7], length 0
20:30:09.284833 IP (tos 0x0, ttl 49, id 14758, offset 0, flags [DF], proto TCP (6), length 40)
$HOST01IP.58678 > $HOST02IP.9000: Flags [R], cksum 0x4c34 (correct), seq 1550021710, win 0, length 0
20:30:09.340346 IP (tos 0x0, ttl 49, id 14768, offset 0, flags [DF], proto TCP (6), length 40)
$HOST01IP.58686 > $HOST02IP.9000: Flags [R], cksum 0x861b (correct), seq 110137394, win 0, length 0
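In this capture, every SYN from host01 is followed by a RST from the same client port. The RST's sequence number is the SYN's plus one. The filter is `dst port 9000`, so packets sent by host02 (including any SYN-ACK) are not shown; the capture alone cannot tell whether the server answered. The sketch below computes the SYN to RST interval per client port from the packets listed above, using host02's clock.

One hedged reading is that host02 answered each SYN immediately, so the interval approximates one round trip (about 57 to 61 ms). It also assumes host01 had already given up on the attempt and reset the connection when the reply arrived. That would fit the ~50 ms spacing between retries in the host01 error log, but it is an interpretation, not an established cause.

```python
from datetime import datetime

# (timestamp, client source port, TCP flag) copied from the host02 capture.
# The filter was `dst port 9000`, so only packets *to* the server appear;
# any SYN-ACK sent back by host02 is not in this list.
CAPTURE = [
    ("20:30:09.179092", 58670, "S"),
    ("20:30:09.227461", 58678, "S"),
    ("20:30:09.239830", 58670, "R"),
    ("20:30:09.279520", 58686, "S"),
    ("20:30:09.284833", 58678, "R"),
    ("20:30:09.340346", 58686, "R"),
]


def syn_to_rst_ms(capture):
    """Map each client port to its SYN -> RST interval in milliseconds."""
    syn_at = {}
    result = {}
    for stamp, port, flag in capture:
        t = datetime.strptime(stamp, "%H:%M:%S.%f")
        if flag == "S":
            syn_at[port] = t
        elif flag == "R" and port in syn_at:
            result[port] = round((t - syn_at[port]).total_seconds() * 1000, 3)
    return result


if __name__ == "__main__":
    for port, ms in syn_to_rst_ms(CAPTURE).items():
        print(f"port {port}: RST {ms} ms after SYN")
```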
It seems clickhouse-client uses IPv6:
executeQuery: (from [::ffff:127.0.0.1]:60802) select count(*) from remote('host02,host03'
executeQuery: (from [::ffff:10.00.000.100]:46260
and clickhouse-server uses IPv4:
Timeout: connect timed out: 10.10.1.111:9000
And as far as I can see from the tcpdump output, you have a network issue with IPv4 routing or the firewall.
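A note on reading the peer addresses quoted in the comment above: `::ffff:a.b.c.d` is an IPv4-mapped IPv6 address. That is how a dual-stack socket listening on `::` reports an IPv4 client, so a peer shown as `[::ffff:127.0.0.1]` does not by itself mean the client connected over IPv6. Python's standard `ipaddress` module can confirm the mapping. The helper name is hypothetical, and the anonymized `10.00.000.100` from the logs is not a valid address, so the literal 10.10.1.111 from the error is used instead.

```python
import ipaddress


def describe_peer(addr: str) -> str:
    """Classify a peer address as plain IPv4, IPv4-mapped IPv6, or IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return f"IPv4 {ip}"
    if ip.ipv4_mapped is not None:
        # ::ffff:a.b.c.d is what a dual-stack IPv6 socket reports for an IPv4 client.
        return f"IPv4 client {ip.ipv4_mapped} (seen through an IPv6 socket)"
    return f"IPv6 {ip}"


if __name__ == "__main__":
    for peer in ["::ffff:127.0.0.1", "10.10.1.111", "::1"]:
        print(peer, "->", describe_peer(peer))
```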
Inspecting the packet capture, it looks like the destination IP from client to server is an IPv4 address.
It also looks like the client and server exchange packets while attempting to set up a TCP connection, but fail to do so.
OTOH, I have verified that I can establish a TCP connection in both directions by running a simple web server on port 9000.
Inspecting the packet capture, it looks like the destination IP from client to server is an IPv4 address.
Yes. And the problem is that your IPv4 is not working.
It also looks like the client and server exchange packets while attempting to set up a TCP connection, but fail to do so.
Yes, because your IPv4 is not working.
OTOH, I have verified that I can establish a TCP connection in both directions by running a simple web server on port 9000.
With curl? Using IPv6?
Try forcing curl to use IPv4:
curl -4 -vvv http://host2:8123/
IPv4 is working fine.
host01:~$ curl -4 -vvv host02:9001
* Rebuilt URL to: host02:9001/
* Hostname was NOT found in DNS cache
* Trying $HOST02IP...
* Connected to host02 ($HOST02IP) port 9001 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: host02:9001
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/2.7.10
< Date: Wed, 01 May 2019 03:53:24 GMT
< Content-type: text/html
< Content-Length: 9
< Last-Modified: Wed, 01 May 2019 03:52:08 GMT
<
hello world
* Closing connection 0
@notbdu what do you have in your config.xml around here:
https://github.com/yandex/ClickHouse/blob/master/dbms/programs/server/config.xml#L70-L72
You could try keeping both lines, or switching from one to the other.
Also, could you show the output of iptables -L?
I've tried all three of the following configurations.
<!-- <listen_host>::</listen_host> -->
<!-- Same for hosts with disabled ipv6: -->
<listen_host>0.0.0.0</listen_host>
and
<listen_host>::</listen_host>
<!-- Same for hosts with disabled ipv6: -->
<!-- <listen_host>0.0.0.0</listen_host> -->
and
<listen_host>::</listen_host>
<!-- Same for hosts with disabled ipv6: -->
<listen_host>0.0.0.0</listen_host>
Output of iptables -L:
host02:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
host01:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
@notbdu I've read the description again, and as far as I can see it fails only when you mention host3 (in select count(*) from remote('host02,host03', system, "settings");). Have you made this reconfiguration on all three hosts (with clickhouse-server restarts)?
Also, all the other diagnostics you provided mention only host2, not host3. Does host3 actually exist?
And by the way, what happens if you pass IPv6 addresses with ports to remote() instead of hostnames?
OTOH, I have verified that I can establish a TCP connection in both directions by running a simple web server on port 9000.
Can you do that with nc -vz 10.10.1.111 9000 from host1 (while ClickHouse is running on the other side)?
It doesn't work in the strictly host01 -> host02 case.
Hmmm, it looks like nc is able to establish a TCP connection when CH is running and listening on port 9000?
host01:~$ nc -vz 10.10.1.111 9000
nc: host02 (10.10.1.111) 9000 [9000] open
host01:~$ nc -4 -vz 10.10.1.111 9000
nc: host02 (10.10.1.111) 9000 [9000] open
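`nc -vz` waits as long as the operating system allows for the connect. It can therefore succeed even where a client with a very short connect timeout gives up.

One rough way to check that difference is to attempt the same TCP connect with an explicit address family and an explicit timeout. The sketch below uses only Python's `socket` module. `tcp_probe` is a hypothetical helper. The target 127.0.0.1:9000 and the 0.05 s and 10 s timeouts are illustrative placeholders chosen to contrast a short and a generous timeout; they are not values taken from this thread.

```python
import socket
import time


def tcp_probe(host: str, port: int, family: int, timeout: float):
    """Try one TCP connect with the given address family (AF_INET / AF_INET6).

    Returns (ok, elapsed_seconds, error_message_or_None), roughly like
    `nc -4 -vz` but with an explicit connect timeout.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, time.monotonic() - start, f"resolve failed: {exc}"
    af, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(af, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return True, time.monotonic() - start, None
        except OSError as exc:  # socket.timeout is a subclass of OSError
            return False, time.monotonic() - start, str(exc)


if __name__ == "__main__":
    # Placeholder target: replace with host02's IP and the native port.
    for timeout in (0.05, 10.0):
        ok, elapsed, err = tcp_probe("127.0.0.1", 9000, socket.AF_INET, timeout)
        print(f"timeout={timeout}s ok={ok} elapsed={elapsed * 1000:.1f} ms err={err}")
```

Run from host01 against host02, a connect that succeeds with the generous timeout but fails with the short one would point at latency rather than at IPv4 reachability. Passing `socket.AF_INET6` instead gives the IPv6 counterpart of the `-4` check above.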
What about host1 -> host3 and host3 -> host2?
Oddly, queries work between host01 <-> host03 but not between host02 <-> host01 or host02 <-> host03. I'll document some of the debugging results below.
Running strace on host02 and host03 shows similar results on both: each server correctly binds to its configured ports.
host03:~$ sudo -u clickhouse strace -eaccept,connect,bind,getpeername,getsockname,getsockopt,recv,recvfrom,recvmsg -f -v -s 1024 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml 2>&1 | grep -Evi 'Debug|Process'
connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\204n\30\0\0\0\0\0", 8}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {6}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15
Include not found: clickhouse_remote_servers
Include not found: macros
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2019.05.01 18:31:00.637751 [ 1 ] {} <Information> : Starting ClickHouse 19.5.3.8 with revision 54417
[pid 54529] bind(7, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
[pid 54529] getsockname(7, {sa_family=AF_NETLINK, pid=54529, groups=00000000}, [12]) = 0
[pid 54529] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"L\0\0\0\24\0\2\0\344\345\311\\\1\325\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377)\n\0\0)\n\0\0X\0\0\0\24\0\2\0\344\345\311\\\1\325\0\0\2\31\200\0\2\0\0\0\10\0\1\0\nK\17\24\10\0\2\0\nK\17\24\10\0\4\0\nK\17\177\t\0\3\0eth0\0\0\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\357\n\0\0\357\n\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 164
[pid 54529] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"H\0\0\0\24\0\2\0\344\345\311\\\1\325\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377)\n\0\0)\n\0\0\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\344\345\311\\\1\325\0\0\n@\200\375\2\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\2\n\367\377\376\321a\0\24\0\6\0\377\377\377\377\377\377\377\377\362\n\0\0\362\n\0\0\10\0\10\0\200\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 144
[pid 54529] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\344\345\311\\\1\325\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
[pid 54529] connect(8, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 54529] recvmsg(8, {msg_name(0)=NULL, msg_iov(2)=[{"hosts\0", 6}, {"\310O\3\0\0\0\0\0", 8}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {9}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 14
2019.05.01 18:31:00.642283 [ 1 ] {} <Information> Application: starting up
2019.05.01 18:31:00.645452 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone `Zulu'.
Include not found: networks
Include not found: networks
2019.05.01 18:31:00.649939 [ 1 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
2019.05.01 18:31:00.654571 [ 1 ] {} <Information> DatabaseOrdinary (default): Total 1 tables.
2019.05.01 18:31:00.661307 [ 1 ] {} <Information> DatabaseOrdinary (default): Starting up tables.
[pid 54529] bind(8, {sa_family=AF_INET6, sin6_port=htons(8123), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 54529] getsockname(8, {sa_family=AF_INET6, sin6_port=htons(8123), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:31:00.662668 [ 1 ] {} <Information> Application: Listening http://[::]:8123
[pid 54529] bind(9, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 54529] getsockname(9, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:31:00.662888 [ 1 ] {} <Information> Application: Listening tcp: [::]:9000
[pid 54529] bind(10, {sa_family=AF_INET6, sin6_port=htons(9009), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 54529] getsockname(10, {sa_family=AF_INET6, sin6_port=htons(9009), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:31:00.663094 [ 1 ] {} <Information> Application: Listening interserver http: [::]:9009
2019.05.01 18:31:00.663655 [ 1 ] {} <Information> Application: Available RAM: 251.79 GiB; physical cores: 16; logical cores: 32.
2019.05.01 18:31:00.663730 [ 1 ] {} <Information> Application: Ready for connections.
Include not found: clickhouse_remote_servers
Include not found: macros
Include not found: clickhouse_compression
host02:~$ sudo -u clickhouse strace -eaccept,connect,bind,getpeername,getsockname,getsockopt,recv,recvfrom,recvmsg -f -v -s 1024 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml 2>&1 | grep -Evi 'Debug|Process'
connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"passwd\0", 7}, {"\204n\30\0\0\0\0\0", 8}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {6}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15
Include not found: clickhouse_remote_servers
Include not found: macros
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2019.05.01 18:32:20.504599 [ 1 ] {} <Information> : Starting ClickHouse 19.5.3.8 with revision 54417
[pid 65854] bind(7, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
[pid 65854] getsockname(7, {sa_family=AF_NETLINK, pid=65854, groups=00000000}, [12]) = 0
[pid 65854] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"L\0\0\0\24\0\2\0004\346\311\\>\1\1\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\201\n\0\0\201\n\0\0X\0\0\0\24\0\2\0004\346\311\\>\1\1\0\2\31\200\0\2\0\0\0\10\0\1\0\n\27\212\216\10\0\2\0\n\27\212\216\10\0\4\0\n\27\212\377\t\0\3\0eth0\0\0\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\204\v\0\0\204\v\0\0P\0\0\0\24\0\2\0004\346\311\\>\1\1\0\2\20\200\0\4\0\0\0\10\0\1\0\254\21\0\1\10\0\2\0\254\21\0\1\f\0\3\0docker0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\341\330\365\2\341\330\365\2", 4096}], msg_controllen=0, msg_flags=0}, 0) = 244
[pid 65854] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"H\0\0\0\24\0\2\0004\346\311\\>\1\1\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377\202\n\0\0\202\n\0\0\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0004\346\311\\>\1\1\0\n@\200\375\2\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\232\3\233\377\376o\207n\24\0\6\0\377\377\377\377\377\377\377\377\214\v\0\0\214\v\0\0\10\0\10\0\200\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 144
[pid 65854] recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0004\346\311\\>\1\1\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
[pid 65854] connect(8, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 65854] recvmsg(8, {msg_name(0)=NULL, msg_iov(2)=[{"hosts\0", 6}, {"\310O\3\0\0\0\0\0", 8}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {9}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 14
2019.05.01 18:32:20.509109 [ 1 ] {} <Information> Application: starting up
2019.05.01 18:32:20.512701 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone `Zulu'.
Include not found: networks
Include not found: networks
2019.05.01 18:32:20.518209 [ 1 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
2019.05.01 18:32:20.518586 [ 1 ] {} <Information> DatabaseOrdinary (system): Total 3 tables.
2019.05.01 18:32:20.534224 [ 1 ] {} <Information> DatabaseOrdinary (system): Starting up tables.
2019.05.01 18:32:20.536097 [ 1 ] {} <Information> DatabaseOrdinary (default): Total 424 tables.
[pid 65894] --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
2019.05.01 18:32:23.405681 [ 35 ] {} <Information> DatabaseOrdinary (default): 60.14%
2019.05.01 18:32:24.591361 [ 1 ] {} <Information> DatabaseOrdinary (default): Starting up tables.
2019.05.01 18:32:24.602210 [ 35 ] {} <Information> DatabaseOrdinary (default): 60.14%
2019.05.01 18:32:24.602228 [ 34 ] {} <Information> DatabaseOrdinary (default): 60.14%
[pid 65854] bind(9, {sa_family=AF_INET6, sin6_port=htons(8123), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 65854] getsockname(9, {sa_family=AF_INET6, sin6_port=htons(8123), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:32:24.611125 [ 1 ] {} <Information> Application: Listening http://[::]:8123
[pid 65854] bind(10, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 65854] getsockname(10, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:32:24.611419 [ 1 ] {} <Information> Application: Listening tcp: [::]:9000
[pid 65854] bind(11, {sa_family=AF_INET6, sin6_port=htons(9009), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 65854] getsockname(11, {sa_family=AF_INET6, sin6_port=htons(9009), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:32:24.611685 [ 1 ] {} <Information> Application: Listening interserver http: [::]:9009
2019.05.01 18:32:24.612215 [ 1 ] {} <Information> Application: Available RAM: 251.50 GiB; physical cores: 16; logical cores: 32.
2019.05.01 18:32:24.612308 [ 1 ] {} <Information> Application: Ready for connections.
Include not found: clickhouse_remote_servers
Include not found: macros
Include not found: clickhouse_compression
host01 :) select count(*) from remote('host03', system, "settings")
host03 logs ->
[pid 55167] accept(9, {sa_family=AF_INET6, sin6_port=htons(47274), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 12
[pid 55167] getsockname(12, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::ffff:$HOST03IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55163] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47274), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:34:38.792216 [ 25 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: [::ffff:$HOST01IP]:47274
[pid 55163] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47274), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55163] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47274), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55163] recvfrom(12, "\0\21ClickHouse server\23\5\221\251\3\0\7default\0", 1048576, 0, NULL, NULL) = 34
[pid 55163] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47274), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55163] recvfrom(12, "\5\1\6system\10settings", 1048576, 0, NULL, NULL) = 18
[pid 55163] getsockopt(12, SOL_SOCKET, SO_SNDTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
[pid 55163] getsockopt(12, SOL_SOCKET, SO_RCVTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
One
2019.05.01 18:34:38.838204 [ 25 ] {e2acf53c-379e-4be4-8bcb-bc6ec467663e} <Information> executeQuery: Read 4 rows, 266.00 B in 0.002 sec., 2183 rows/sec., 141.82 KiB/sec.
[pid 55167] accept(9, {sa_family=AF_INET6, sin6_port=htons(47284), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 14
[pid 55167] getsockname(14, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::ffff:$HOST03IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55164] getpeername(14, {sa_family=AF_INET6, sin6_port=htons(47284), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:34:38.874808 [ 27 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: [::ffff:$HOST01IP]:47284
[pid 55164] getpeername(14, {sa_family=AF_INET6, sin6_port=htons(47284), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55164] getpeername(14, {sa_family=AF_INET6, sin6_port=htons(47284), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55164] recvfrom(14, "\0\21ClickHouse server\23\5\221\251\3\0\7default\0", 1048576, 0, NULL, NULL) = 34
[pid 55164] getpeername(14, {sa_family=AF_INET6, sin6_port=htons(47284), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 55164] recvfrom(14, "\5\1\6system\10settings", 1048576, 0, NULL, NULL) = 18
[pid 55164] getsockopt(14, SOL_SOCKET, SO_SNDTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
[pid 55164] getsockopt(14, SOL_SOCKET, SO_RCVTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
One
2019.05.01 18:34:38.909891 [ 27 ] {002f0539-e29c-4e5d-98a2-4b64e6b5937c} <Information> executeQuery: Read 4 rows, 266.00 B in 0.001 sec., 4328 rows/sec., 281.13 KiB/sec.
[pid 55163] recvfrom(12, "\5\1\6system\10settings", 1048576, 0, NULL, NULL) = 18
[pid 55163] getsockopt(12, SOL_SOCKET, SO_SNDTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
[pid 55163] getsockopt(12, SOL_SOCKET, SO_RCVTIMEO, ",\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
2019.05.01 18:34:38.952188 [ 25 ] {464b6bc9-9f28-40a8-bd14-32692ef78d5e} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
Expression
Expression
Aggregating
Concat
Expression
One
2019.05.01 18:34:38.952651 [ 26 ] {} <Trace> Aggregator: Aggregating
2019.05.01 18:34:38.952841 [ 26 ] {} <Trace> Aggregator: Aggregation method: without_key
2019.05.01 18:34:38.952969 [ 26 ] {} <Trace> Aggregator: Aggregated. 198 to 1 rows (from 0.028 MiB) in 0.000 sec. (1171646.163 rows/sec., 164.389 MiB/sec.)
2019.05.01 18:34:38.953051 [ 26 ] {} <Trace> Aggregator: Merging aggregated data
2019.05.01 18:34:38.953561 [ 25 ] {464b6bc9-9f28-40a8-bd14-32692ef78d5e} <Information> executeQuery: Read 198 rows, 28.45 KiB in 0.002 sec., 98323 rows/sec., 13.80 MiB/sec.
2019.05.01 18:34:38.953920 [ 25 ] {464b6bc9-9f28-40a8-bd14-32692ef78d5e} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
[pid 55164] recvfrom(14, "", 1048576, 0, NULL, NULL) = 0
[pid 55163] recvfrom(12, "", 1048576, 0, NULL, NULL) = 0
[pid 55140] --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
host01 :) select count(*) from remote('host02', system, "settings")
host02 logs ->
NONE
host01:~$ nc -vz $HOST02IP 9000
nc: host02 ($HOST02IP) 9000 [9000] open
host02 logs ->
[pid 66071] accept(10, {sa_family=AF_INET6, sin6_port=htons(47542), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 12
[pid 66071] getsockname(12, {sa_family=AF_INET6, sin6_port=htons(9000), inet_pton(AF_INET6, "::ffff:$HOST02IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 66067] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47542), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
2019.05.01 18:36:48.107270 [ 36 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: [::ffff:$HOST01IP]:47542
[pid 66067] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47542), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 66067] getpeername(12, {sa_family=AF_INET6, sin6_port=htons(47542), inet_pton(AF_INET6, "::ffff:$HOST01IP", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 66067] recvfrom(12, "", 1048576, 0, NULL, NULL) = 0
2019.05.01 18:36:48.107695 [ 36 ] {} <Warning> TCPHandler: Client has not sent any data.
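`nc -vz` only completes the TCP handshake and then closes the connection without sending anything. That is why the host02 trace shows accept() followed by a recvfrom() that returns 0, and ClickHouse logs "Client has not sent any data". In other words, the nc check proves the port is reachable but says nothing about how quickly the handshake completes. A minimal loopback sketch of the same socket sequence, using only the Python standard library (it models the socket behavior, not ClickHouse itself):

```python
import socket


def accept_and_read_first_bytes() -> bytes:
    """Accept one loopback connection whose client closes without sending data.

    Mirrors the host02 trace: accept() succeeds, then recvfrom() returns 0
    bytes, which in Python surfaces as recv() returning b"".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        # Like `nc -z`: complete the handshake, then close without sending anything.
        client = socket.create_connection(("127.0.0.1", port), timeout=1.0)
        client.close()

        conn, _addr = server.accept()
        with conn:
            conn.settimeout(1.0)
            # b"" means the peer performed an orderly close (FIN) before sending data.
            return conn.recv(1024)


if __name__ == "__main__":
    data = accept_and_read_first_bytes()
    print("client sent no data" if data == b"" else f"got {data!r}")
```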
Oddly, queries work between host01 <-> host03 but fail between host02 <-> host01 and host02 <-> host03. I'll document some of the debugging results I've seen below.
@notbdu, then there's likely an easy solution: wipe host02 clean and replace it with a fresh server, then let replication do the rest (or do a backup and restore if you don't have replication configured for some reason).
Ah, these are test servers, so I don't have any replication or clustering set up. I am just testing the remote function right now.
I tried again with a freshly re-imaged host and still got the same error :(
From the packet capture, this is the exchange I am seeing:
host01 -> host02 (failure):
host01 -> host02 SYN packet
host01 -> host02 SYN packet
host02 -> host01 SYN, ACK packet
host01 receives SYN, ACK packet
host01 -> host02 TCP RESET
... two more TCP RESET packets are sent
The SYN, ACK arrival and the TCP RESET are very close together: host01 sends the TCP RESET only 0.000067 seconds (67 microseconds) after it receives the SYN, ACK packet.
In the failure case, it takes host01 approximately 0.06 seconds to receive the SYN, ACK; in the success case, it arrives in 0.019 seconds. Could this be some sort of timeout issue? It looks like host01 gets impatient and starts sending TCP RESET packets before it has received and processed the SYN, ACK.
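One way to test this hypothesis without ClickHouse in the loop is to time a bare TCP connect to port 9000 (the native port seen in the traces) with a short and a long timeout. The sketch below uses only the Python standard library; the 0.05 s and 1.0 s values are illustrative, chosen to mirror the 50 ms connect_timeout_with_failover_ms default and the 1000 ms override discussed in this thread, and the target address is a placeholder. Python's create_connection is not ClickHouse's connection code, so treat a difference between the two timeouts as a hint rather than proof.

```python
from __future__ import annotations

import socket
import time


def probe_connect(host: str, port: int, timeout_s: float) -> tuple[bool, float]:
    """Attempt one TCP handshake to host:port, giving up after timeout_s seconds.

    Returns (connected, elapsed_seconds). Only the handshake is tested; no
    ClickHouse protocol data is sent, and the socket is closed immediately.
    """
    start = time.perf_counter()
    try:
        # create_connection raises an OSError subclass (e.g. TimeoutError or
        # ConnectionRefusedError) if the handshake does not complete in time.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.perf_counter() - start
    except OSError:
        return False, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder target: replace with host02's address.
    target_host, target_port = "127.0.0.1", 9000
    # 0.05 s mirrors the 50 ms default; 1.0 s mirrors the 1000 ms override.
    for timeout_s in (0.05, 1.0):
        ok, elapsed = probe_connect(target_host, target_port, timeout_s)
        print(f"timeout={timeout_s:.2f}s connected={ok} elapsed={elapsed * 1000:.1f} ms")
```

If the short timeout fails against host02 while the long one succeeds, the handshake latency alone is enough to explain the behavior.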
https://clickhouse.yandex/docs/en/operations/settings/settings/#connect-timeout-with-failover-ms
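connect_timeout_with_failover_ms: default value 50 (milliseconds). That is shorter than the roughly 0.06 s (60 ms) host01 waits for host02's SYN, ACK, so the connection attempt times out before the handshake completes.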
cat /etc/clickhouse-server/conf.d/user_substitutes.xml
<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<connect_timeout_with_failover_ms>1000</connect_timeout_with_failover_ms>
</default>
</profiles>
</yandex>
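As a rough cross-check of why this override helps, the sketch below reads connect_timeout_with_failover_ms out of the override file shown above and compares it, and the 50 ms default, with the approximate SYN, ACK times from the packet capture (0.06 s in the failure case, 0.019 s in the success case). It is a simplification that treats the SYN, ACK arrival time as the whole cost of connecting and ignores ClickHouse's retries; the helper names are hypothetical and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Override file from the thread (conf.d/user_substitutes.xml), verbatim.
CONFIG = """<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<connect_timeout_with_failover_ms>1000</connect_timeout_with_failover_ms>
</default>
</profiles>
</yandex>
"""

DEFAULT_TIMEOUT_MS = 50  # documented default of connect_timeout_with_failover_ms

# Approximate SYN, ACK arrival times from the packet capture in this thread.
OBSERVED_HANDSHAKE_MS = {"failure case": 60.0, "success case": 19.0}


def timeout_from_config(xml_text: str, profile: str = "default") -> int:
    """Return the profile's connect_timeout_with_failover_ms, or the default if unset."""
    root = ET.fromstring(xml_text)
    node = root.find(f"profiles/{profile}/connect_timeout_with_failover_ms")
    if node is None or node.text is None:
        return DEFAULT_TIMEOUT_MS
    return int(node.text.strip())


def handshake_fits(handshake_ms: float, timeout_ms: int) -> bool:
    """Simplified check: does the SYN, ACK arrive before the connect timeout?"""
    return handshake_ms < timeout_ms


if __name__ == "__main__":
    override_ms = timeout_from_config(CONFIG)
    for label, rtt_ms in OBSERVED_HANDSHAKE_MS.items():
        for name, limit_ms in (("default", DEFAULT_TIMEOUT_MS), ("override", override_ms)):
            verdict = "ok" if handshake_fits(rtt_ms, limit_ms) else "times out"
            print(f"{label}: {rtt_ms:.0f} ms vs {name} {limit_ms} ms -> {verdict}")
```

With these numbers, only the failure case against the 50 ms default times out; raising the limit to 1000 ms leaves plenty of headroom.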
That was it, thanks @den-crane and @blinkov!
I added this setting to /etc/clickhouse-server/users.xml and it is working now :)
We still need to fix this properly on our side to avoid confusion.