Proxysql: Removing server from cluster; continues to attempt connect

Created on 4 Jan 2018 · 13 Comments · Source: sysown/proxysql

ProxySQL continues to attempt to contact cluster members that have been removed. Below, both .101 and .102 were removed and replaced with new hosts, and the change was saved to disk and loaded to runtime. According to the logs, ProxySQL is still attempting to contact the 2 deleted hosts.

mysql-lb2> SELECT * FROM proxysql_servers;
+--------------+------+--------+------------+
| hostname     | port | weight | comment    |
+--------------+------+--------+------------+
| 10.64.64.101 | 6032 | 0      | perconalb1 |
| 10.64.64.102 | 6032 | 0      | perconalb2 |
+--------------+------+--------+------------+
2 rows in set (0.00 sec)

mysql-lb2> INSERT INTO proxysql_servers VALUES ('10.64.64.161', 6032, 0, 'mysql-lb1');
Query OK, 1 row affected (0.00 sec)

mysql-lb2> INSERT INTO proxysql_servers VALUES ('10.64.64.162', 6032, 0, 'mysql-lb2');
Query OK, 1 row affected (0.00 sec)

mysql-lb2> DELETE FROM proxysql_servers WHERE comment LIKE 'percona%';
Query OK, 2 rows affected (0.00 sec)

mysql-lb2> SAVE PROXYSQL SERVERS TO DISK; LOAD PROXYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected (0.00 sec)

mysql-lb2> SELECT * FROM proxysql_servers;
+--------------+------+--------+-----------+
| hostname     | port | weight | comment   |
+--------------+------+--------+-----------+
| 10.64.64.161 | 6032 | 0      | mysql-lb1 |
| 10.64.64.162 | 6032 | 0      | mysql-lb2 |
+--------------+------+--------+-----------+
2 rows in set (0.00 sec)

2018-01-03 16:17:34 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)

CLUSTER bug

utdrmac

All 13 comments

Had to restart proxysql to get it to stop trying to reach the deleted hosts.

utdrmac on 4 Jan 2018

Matthew, by any chance, can you provide what is being asked in the issue template?

If you are submitting a bug report, please provide a clear description of your issue, the version of OS and ProxySQL, every step to reproduce the issue, and the error log. If it is a crashing bug, a core dump will be extremely useful.

Thanks

renecannao on 4 Jan 2018

root@mysql-lb2:~# proxysql --version
ProxySQL version v1.4.3-1.1, codename Truls
root@mysql-lb2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.3 (stretch)
Release:    9.3
Codename:   stretch

2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0x1895C7EAE7998290 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_query_rules from peer 10.64.64.162:6032 matches with local checksum 0x1895C7EAE7998290 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_servers from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0xEA0156A75FFE4802 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_servers from peer 10.64.64.162:6032 matches with local checksum 0xEA0156A75FFE4802 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for mysql_users from peer 10.64.64.162:6032, version 2, epoch 1515026071, checksum 0x7C9622B415813384 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for mysql_users from peer 10.64.64.162:6032 matches with local checksum 0x7C9622B415813384 , we won't sync.
2018-01-03 16:34:32 [INFO] Cluster: detected a new checksum for proxysql_servers from peer 10.64.64.162:6032, version 4, epoch 1515026071, checksum 0xC7771456FC3C0700 . Not syncing yet ...
2018-01-03 16:34:32 [INFO] Cluster: checksum for proxysql_servers from peer 10.64.64.162:6032 matches with local checksum 0xC7771456FC3C0700 , we won't sync.
2018-01-03 16:34:32 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)
2018-01-03 16:34:32 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Can't connect to MySQL server on '10.64.64.102' (107)
root@mysql-lb2:~# tail /var/lib/proxysql/proxysql.log
2018-01-03 16:34:35 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:35 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:36 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Can't connect to MySQL server on '10.64.64.101' (107)
2018-01-03 16:34:36 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Can't connect to MySQL server on '10.64.64.102' (107)
2018-01-03 16:34:38 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.101:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110
2018-01-03 16:34:38 ProxySQL_Cluster.cpp:172:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer 10.64.64.102:6032 . Error: Lost connection to MySQL server at 'handshake: waiting for inital communication packet', system error: 110

utdrmac on 4 Jan 2018

Is my description not OK? I had two servers A, B. They sync'd fine. I removed them and added new hosts. New hosts sync fine, but old hosts continue to be looked for.

utdrmac on 4 Jan 2018

The error log should be very verbose for the Cluster module, so having the full error log, instead of 1 or 10 lines, would really help.

renecannao on 4 Jan 2018

Unfortunately, due to app issues (duplicate key warnings), the error log has 50,000+ lines in it. I ran this:
grep -ir cluster /var/lib/proxysql/proxysql.log >cluster.log
Will that work?
cluster.log.gz

utdrmac on 4 Jan 2018

I think it would be more efficient to grep -v all the duplicate key warnings.
For example, I wanted to find when LOAD PROXYSQL SERVERS TO RUNTIME was executed.

renecannao on 4 Jan 2018

root@mysql-lb2:~# grep -v "Data too\|Out of\|Duplicate" /var/lib/proxysql/proxysql.log >cluster2.log
cluster2.log.gz

utdrmac on 4 Jan 2018
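
The same filtering can be done in a few lines of Python, which also makes it easy to keep only the cluster lifecycle messages. This is a minimal sketch and not part of the original thread: the noise patterns come from the grep -v command above, the lifecycle patterns are the message texts quoted in this issue (the "closing thread" message appears only in the 1.4.5 output at the end), and the function name cluster_lifecycle_lines is made up for illustration.

```python
import re

# Noise patterns copied from the grep -v command in this thread.
NOISE = re.compile(r"Data too|Out of|Duplicate")

# Cluster lifecycle messages, as worded in the log excerpts quoted in this issue.
LIFECYCLE = re.compile(
    r"Cluster Node Entry for host|Cluster: (?:starting|closing) thread for peer"
)


def cluster_lifecycle_lines(lines):
    """Return log lines about cluster membership changes, skipping app noise."""
    kept = []
    for line in lines:
        if NOISE.search(line):
            continue  # application warnings, not relevant to the cluster module
        if LIFECYCLE.search(line):
            kept.append(line.rstrip("\n"))
    return kept
```

To use it, open /var/lib/proxysql/proxysql.log (or wherever the error log lives on your host) and pass the file object to cluster_lifecycle_lines; the returned lines show when node entries were created or destroyed and when peer threads were started.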

Making some guesses due to missing information (error log please):

at 16:03:12 : proxysql .101 was working fine
at 16:05:03 : proxysql .101 was stopped (error 110)
at 16:08:07 : ProxySQL was restarted
at 16:09:21 : ProxySQL was restarted
at 16:10:12 : ProxySQL was restarted
at 16:11:28 : ProxySQL was stopped
at 16:11:38 : ProxySQL was started
at 16:13:42 : the new proxysql_servers table was loaded at runtime. Some problem is here, because the new cluster nodes were added and threads created (that is correct), while for the old nodes the entries were destroyed (that is good) but the threads didn't exit (that is bad):

2018-01-03 16:13:42 [INFO] Created new Cluster Node Entry for host 10.64.64.161:6032
2018-01-03 16:13:42 [INFO] Created new Cluster Node Entry for host 10.64.64.162:6032
2018-01-03 16:13:42 [INFO] Destroyed Cluster Node Entry for host 10.64.64.102:6032
2018-01-03 16:13:42 [INFO] Destroyed Cluster Node Entry for host 10.64.64.101:6032
2018-01-03 16:13:42 [INFO] Cluster: starting thread for peer 10.64.64.162:6032
2018-01-03 16:13:42 [INFO] Cluster: starting thread for peer 10.64.64.161:6032

at 16:34:30 : proxysql was restarted
errors about 10.64.64.101 and 10.64.64.102 continue until the end of the error log (16:47:51), which conflicts with https://github.com/sysown/proxysql/issues/1323#issuecomment-355165885

renecannao on 4 Jan 2018
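
The symptom described in the timeline above (node entry destroyed, yet connection warnings keep arriving for the same peer) can be checked mechanically. The following is a rough heuristic sketch, not a ProxySQL tool: it assumes the log lines are in chronological order and uses only the message wording quoted in this issue. The function name stale_peers is hypothetical, and the sketch does not account for ProxySQL restarts in the middle of the log.

```python
import re

# Message formats as they appear in the log excerpts in this thread.
DESTROYED = re.compile(r"Destroyed Cluster Node Entry for host (\S+)")
CREATED = re.compile(r"Created new Cluster Node Entry for host (\S+)")
UNREACHABLE = re.compile(r"Cluster: unable to connect to peer (\S+)")


def stale_peers(lines):
    """Return peers that are still being polled after their entry was destroyed.

    Assumes the lines are in chronological order, as in proxysql.log.
    """
    destroyed = set()
    stale = set()
    for line in lines:
        m = DESTROYED.search(line)
        if m:
            destroyed.add(m.group(1))
            continue
        m = CREATED.search(line)
        if m:
            destroyed.discard(m.group(1))  # the peer was added back
            continue
        m = UNREACHABLE.search(line)
        if m and m.group(1) in destroyed:
            stale.add(m.group(1))  # warning logged after the entry was destroyed
    return sorted(stale)
```

A non-empty result suggests that a monitor thread outlived its node entry, which is the behavior reported in this issue.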

That's the whole error log.
.101 and .102 were removed from use; VMs destroyed.
I copied the /var/lib/proxysql/* to new hosts .161 and .162
Started proxysql on .161 and .162.
Removed old .101/.102 from proxysql_servers.
Added .161/.162 to proxysql_servers
save/load proxysql servers.
ProxySQL continues to attempt contact to removed .101/.102

Regarding the conflict in the error log: the restart mentioned in my comment above was done on .161.
.161 and .162 were both attempting to reach .101 and .102 even after I deleted them. I restarted proxysql on .161 to see if that fixed the issue. It did. I have not run systemctl restart proxysql on .162 yet, so I can keep helping with the bug report here.

utdrmac on 4 Jan 2018

Checking the new error log. Thanks

renecannao on 4 Jan 2018

I think I found the issue. If a node fails before it is removed from proxysql_servers, the code checking if a node was removed is not executed.
It will be fixed in 1.4.5.

Thank you for the report

renecannao on 4 Jan 2018

👍2
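
The failure mode described above can be illustrated with a small simulation. This is not ProxySQL's code (the real monitor thread lives in ProxySQL_Cluster.cpp and is written in C++); it is a hypothetical sketch of the general pattern, where a membership check placed after a successful connect is never reached for a dead peer. The names monitor_peer, is_member, try_connect, and check_first are invented for illustration, and the check_first=True variant is only one possible way to order the checks, not the actual 1.4.5 patch.

```python
def monitor_peer(peer, is_member, try_connect, check_first, max_rounds=5):
    """Simulate a per-peer monitor loop; return how many connects were attempted.

    check_first=False models the described bug: the membership check sits
    after a successful connect, so an unreachable peer never gets to it.
    check_first=True models one possible fix (an assumption, not the actual
    patch): membership is checked at the top of every round.
    """
    attempts = 0
    for _ in range(max_rounds):
        if check_first and not is_member(peer):
            break  # removal is noticed even though the peer is unreachable
        attempts += 1
        if not try_connect(peer):
            continue  # connect failed: the rest of this round is skipped
        if not is_member(peer):
            break  # only reachable when the connect succeeded
    return attempts


# A peer that was shut down (always unreachable) and then removed from the table.
members = set()
peer = "10.64.64.101:6032"
buggy = monitor_peer(peer, members.__contains__, lambda p: False, check_first=False)
fixed = monitor_peer(peer, members.__contains__, lambda p: False, check_first=True)
print(buggy, fixed)  # 5 0
```

With the check after the connect, the loop keeps retrying the removed peer until max_rounds is exhausted; with the check first, it stops before making any attempt.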

The fix has been tested on v1.4.5. The bug is resolved, and threads are now properly closed even if a node fails before it is removed from proxysql_servers.

2018-01-10 15:31:25 [INFO] Cluster: Fetching ProxySQL Servers from peer 172.17.0.10:6032 started
2018-01-10 15:31:25 [INFO] Cluster: Fetching ProxySQL Servers from peer 172.17.0.10:6032 completed
2018-01-10 15:31:25 [INFO] Cluster: Loading to runtime ProxySQL Servers from peer 172.17.0.10:6032
2018-01-10 15:31:25 [INFO] Destroyed Cluster Node Entry for host 172.17.0.9:6032
2018-01-10 15:31:25 [INFO] Cluster: Saving to disk ProxySQL Servers from peer 172.17.0.10:6032
2018-01-10 15:31:26 [INFO] Cluster: closing thread for peer 172.17.0.9:6032