ProxySQL continues to attempt to contact cluster members that have been removed. Below, both .101 and .102 were removed and replaced with new hosts. I saved the change to disk and loaded it to runtime, but the logs show ProxySQL still attempting to contact the two deleted hosts.
mysql-lb2> SELECT * FROM proxysql_servers;
+--------------+------+--------+------------+
| hostname     | port | weight | comment    |
+--------------+------+--------+------------+
| 10.64.64.101 | 6032 |      0 | perconalb1 |
| 10.64.64.102 | 6032 |      0 | perconalb2 |
+--------------+------+--------+------------+
2 rows in set (0.00 sec)
mysql-lb2> INSERT INTO proxysql_servers VALUES ('10.64.64.161', 6032, 0, 'mysql-lb1');
Query OK, 1 row affected (0.00 sec)
mysql-lb2> INSERT INTO proxysql_servers VALUES ('10.64.64.162', 6032, 0, 'mysql-lb2');
Query OK, 1 row affected (0.00 sec)
mysql-lb2> DELETE FROM proxysql_servers WHERE comment LIKE 'percona%';
Query OK, 2 rows affected (0.00 sec)
mysql-lb2> SAVE PROXYSQL SERVERS TO DISK; LOAD PROXYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.06 sec)
Query OK, 0 rows affected (0.00 sec)
mysql-lb2> SELECT * FROM proxysql_servers;
+--------------+------+--------+-----------+
| hostname     | port | weight | comment   |
+--------------+------+--------+-----------+
| 10.64.64.161 | 6032 |      0 | mysql-lb1 |
| 10.64.64.162 | 6032 |      0 | mysql-lb2 |
+--------------+------+--------+-----------+
2 rows in set (0.00 sec)
2018-01-03 16:17:34 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)
I had to restart ProxySQL to get it to stop trying to reach the deleted hosts.
Matthew, by any chance, can you provide what is being asked in the issue template?
If you are submitting a bug report, please provide a clear description of your issue, the version of OS and ProxySQL, every step to reproduce the issue, and the error log. If it is a crashing bug, a core dump will be extremely useful.
Thanks
root@mysql-lb2:~# proxysql --version
ProxySQL version v1.4.3-1.1, codename Truls
root@mysql-lb2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.3 (stretch)
Release: 9.3
Codename: stretch
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0x1895C7EAE7998290 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_query_rules from peer 10.64.64.162:6032 matches with local checksum 0x1895C7EAE7998290 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_servers from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0xEA0156A75FFE4802 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_servers from peer 10.64.64.162:6032 matches with local checksum 0xEA0156A75FFE4802 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0x7C9622B415813384 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_users from peer 10.64.64.162:6032 matches with local checksum 0x7C9622B415813384 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.64.64.162:6032, version 4, epoch 1515026071, checksum 0xC7771456FC3C0700 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for proxysql_servers from peer 10.64.64.162:6032 matches with local checksum 0xC7771456FC3C0700 , we won't sync.
2018-01-03 16:34:32 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)
2018-01-03 16:34:32 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Can't connect to MySQL server on '10.64.64.102' (107)
root@mysql-lb2:~# tail /var/lib/proxysql/proxysql.log
2018-01-03 16:34:35 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:35 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:36 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)
2018-01-03 16:34:36 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Can't connect to MySQL server on '10.64.64.102' (107)
2018-01-03 16:34:38 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:38 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
Is my description not sufficient? I had two servers, A and B, and they synced fine. I removed them and added new hosts. The new hosts sync fine, but ProxySQL keeps trying to contact the old ones.
The error log should be very verbose about the Cluster module, so having the full error log, rather than 1 or 10 lines, would really help.
Unfortunately, due to app issues (duplicate key warnings), the error log has 50,000+ lines in it. I ran this:
grep -ir cluster /var/lib/proxysql/proxysql.log > cluster.log
Will that work?
cluster.log.gz
I think it would have been more efficient to exclude all the duplicate key warnings with grep -v.
For example, I wanted to find when LOAD PROXYSQL SERVERS TO RUNTIME was executed.
root@mysql-lb2:~# grep -v "Data too\|Out of\|Duplicate" /var/lib/proxysql/proxysql.log >cluster2.log
cluster2.log.gz
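Once the noise is filtered out, a quick way to see which peers the Cluster monitor is still failing to reach is to count the warnings per peer. This is a minimal sketch that assumes the warning text matches the "unable to connect to peer HOST:PORT" lines quoted in this thread. count_unreachable_peers is a hypothetical helper name, not something shipped with ProxySQL.

```shell
# count_unreachable_peers LOGFILE
# Hypothetical helper: counts "unable to connect to peer HOST:PORT" warnings
# per peer in a ProxySQL error log, most frequent first.
count_unreachable_peers() {
  # -o prints only the matched part, e.g. "unable to connect to peer 10.64.64.101:6032"
  grep -o 'unable to connect to peer [0-9.]*:[0-9]*' "$1" \
    | awk '{print $NF}' \
    | sort | uniq -c | sort -rn   # last field is HOST:PORT; count and rank
}
```

Usage: `count_unreachable_peers /var/lib/proxysql/proxysql.log`. Any peer listed there that no longer appears in proxysql_servers is one whose monitor thread is still running.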
Making some guesses, since information is missing (the error log, please):
The proxysql_servers table was loaded to runtime. There is a problem here: the new cluster nodes were added and their threads created (that is correct), and the entries for the old nodes were destroyed (that is good), but their threads didn't exit (that is bad):
2018-01-03 16:13:42 [INFO] Created new Cluster Node Entry for host 10.64.64.161:6032
2018-01-03 16:13:42 [INFO] Created new Cluster Node Entry for host 10.64.64.162:6032
2018-01-03 16:13:42 [INFO] Destroyed Cluster Node Entry for host 10.64.64.102:6032
2018-01-03 16:13:42 [INFO] Destroyed Cluster Node Entry for host 10.64.64.101:6032
2018-01-03 16:13:42 [INFO] Cluster: starting thread for peer 10.64.64.162:6032
2018-01-03 16:13:42 [INFO] Cluster: starting thread for peer 10.64.64.161:6032
Connection attempts to 10.64.64.101 and 10.64.64.102 continue until the end of the error log (16:47:51), which conflicts with https://github.com/sysown/proxysql/issues/1323#issuecomment-355165885
That's the whole error log.
.101 and .102 were removed from use; VMs destroyed.
I copied /var/lib/proxysql/* to the new hosts .161 and .162.
Started proxysql on .161 and .162.
Removed old .101/.102 from proxysql_servers.
Added .161/.162 to proxysql_servers.
Saved proxysql servers to disk and loaded them to runtime.
ProxySQL continues to attempt to contact the removed .101/.102.
Regarding the conflict with the error log: the restart mentioned in my comment above was done on .161.
.161 and .162 were both attempting to reach .101 and .102 even after they were deleted. I tested restarting proxysql on .161 to see if that fixed the issue, and it did. I have not run systemctl restart proxysql on .162 yet, so I can keep helping with this bug report.
Checking the new error log. Thanks
I think I found the issue. If a node fails before it is removed from proxysql_servers, the code checking if a node was removed is not executed.
It will be fixed in 1.4.5.
Thank you for the report
The bug has been tested on v1.4.5: it is resolved, and threads are properly closed even if a node fails before it is removed from proxysql_servers.
2018-01-10 15:31:25 [INFO] Cluster: Fetching ProxySQL Servers from peer 172.17.0.10:6032 started
2018-01-10 15:31:25 [INFO] Cluster: Fetching ProxySQL Servers from peer 172.17.0.10:6032 completed
2018-01-10 15:31:25 [INFO] Cluster: Loading to runtime ProxySQL Servers from peer 172.17.0.10:6032
2018-01-10 15:31:25 [INFO] Destroyed Cluster Node Entry for host 172.17.0.9:6032
2018-01-10 15:31:25 [INFO] Cluster: Saving to disk ProxySQL Servers from peer 172.17.0.10:6032
2018-01-10 15:31:26 [INFO] Cluster: closing thread for peer 172.17.0.9:6032
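The two log excerpts in this thread suggest a quick check for leaked peer threads. On v1.4.3 the old peers got a "Destroyed Cluster Node Entry" line while their connection warnings kept going. On v1.4.5 the destroy line is followed by "Cluster: closing thread for peer". The minimal awk sketch below assumes those exact message texts. find_leaked_peer_threads is a hypothetical helper name, and it only checks that a matching close line exists somewhere in the file, not that it comes after the destroy line.

```shell
# find_leaked_peer_threads LOGFILE
# Hypothetical helper: prints each peer that has a "Destroyed Cluster Node
# Entry" line but no "Cluster: closing thread for peer" line in the log.
find_leaked_peer_threads() {
  awk '
    /Destroyed Cluster Node Entry for host / { destroyed[$NF] = 1 }  # last field is HOST:PORT
    /Cluster: closing thread for peer /      { closed[$NF] = 1 }
    END {
      for (peer in destroyed)
        if (!(peer in closed)) print peer
    }
  ' "$1" | sort
}
```

Run against the v1.4.3 log from this report, it would be expected to list 10.64.64.101:6032 and 10.64.64.102:6032. Against the v1.4.5 excerpt above, it should print nothing.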