Proxysql: Host gets completely removed from memory tables

Created on 21 Nov 2017  路  15Comments  路  Source: sysown/proxysql

We're running into an occasional issue where a host going into an OFFLINE_HARD state can be completely removed from both the runtime_mysql_servers table, and the mysql_servers table.

We can resolve the issue by loading the mysql servers from disk and then to runtime, but it leads to an issue where it requires a manual recovery.

The removed host is always the host managed by the replication_hostgroup. Let me know if the logs will be helpful, and I'll look to send some across.

Most helpful comment

Hi all,

I've seen a similar issue quite a few times in the last two days with ProxySQL 2.0.10. Node disappears from runtime_mysql_servers and doesn't appear until LOAD MYSQL SERVERS TO RUNTIME; is executed. I do not see an issue with mysql_servers, however.

I understand that this is an older issue and for previous version of ProxySQL, so I can create a new issue if that's more appropriate. Also, a minimal reproducible test case is, unfortunately, not losing a node when running, but rather starting up ProxySQL when no nodes are up. If you think this is a different issue, again, let me know. It's still worth looking into, probably.

ProxySQL is set up for PXC 3-node cluster, here's the configuration. mysql1 resolves to 192.168.70.10, and the discrepancy is just an artifact of configuration. I also made tests with writer_is_also_reader settings of 0 and 1, and the issue persists.

Thu Apr 23 14:00:40 2020 proxysql> select * from mysql_servers;
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 10           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 10           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Thu Apr 23 14:00:50 2020 proxysql> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| 10               | 12                      | 11               | 9000              | 1      | 1           | 2                     | 16                      | TEST    |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
1 row in set (0.00 sec)

And here's the expected output of runtime_mysql_servers when everything's fine.

Thu Apr 23 14:08:20 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 11           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 12           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 12           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
5 rows in set (0.01 sec)

The below is an output from my connection to admin interface just looking at how things are going (running on proxysql machine, so timestamps match). 192.168.70.20 disappeared between 08:25:29 and 08:25:37.

Thu Apr 23 08:25:04 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.20 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 9000         | 192.168.70.30 | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

Thu Apr 23 08:25:29 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE  | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1        | 3306 | 0         | ONLINE  | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.30 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Thu Apr 23 08:25:37 2020 proxysql> 

Below is an excerpt from proxysql log showing the last mention of 192.168.70.20 before its shutdown much later. Note that the ping thread was actively pinging that node, as confirmed by both proxysql's mysql_Server_ping_log and by inspecting node's general log.

2020-04-23 08:25:27 [INFO] Dumping mysql_servers: ALL
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid  | hostname      | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 10   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465824 |
| 11   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164466144 |
| 9000 | 192.168.70.20 | 3306 | 0    | 1000   | 3      | 0   | 1000      | 0       | 0   | 0       | mysql2  | 140610171765344 |
| 9000 | mysql1        | 3306 | 0    | 1000   | 3      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465984 |
| 9000 | 192.168.70.30 | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql3  | 140610164465664 |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
...
2020-04-23 08:27:02 MySQL_Monitor.cpp:1601:monitor_galera_thread(): [ERROR] Timeout on Galera health check for 192.168.70.20:3306 after 803ms. If the server is overload, increase mysql-monitor_galera_healthcheck_timeout.
2020-04-23 08:27:02 MySQL_Monitor.cpp:1866:monitor_galera_thread(): [ERROR] Got error. mmsd 0x7fe2560aa440 , MYSQL 0x7fe258082400 , FD 33 : timeout check
...
2020-04-23 08:27:17 [INFO] Dumping mysql_servers: ALL
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid  | hostname      | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 10   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465824 |
| 11   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164466144 |
| 9000 | 192.168.70.30 | 3306 | 0    | 1000   | 1      | 0   | 1000      | 0       | 0   | 0       | mysql3  | 140610164465664 |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
...
2020-04-23 11:43:25 MySQL_Monitor.cpp:836:monitor_ping_thread(): [ERROR] Error after 0ms on server 192.168.70.20:3306 : Lost connection to MySQL server during query

The particular issue above happened upon cluster startup, with ProxySQL up at the time. Around the startup time, another node went through SST with the .20 as a Donor.

significantly cropped mysqld error log of 192.168.70.20:

2020-04-23T08:26:15.804817Z 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-04-23T08:26:42.299124Z 2 [Note] WSREP: Setting wsrep_ready to true
2020-04-23T08:26:43.460477Z 0 [Note] WSREP: Member 1.0 (mysql3) requested state transfer from '*any*'. Selected 0.0 (mysql2)(SYNCED) as donor.
2020-04-23T08:26:43.460495Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1234383)
2020-04-23T08:26:44.919988Z 0 [Note] WSREP: (187f50d0, 'tcp://0.0.0.0:4567') turning message relay requesting off
        2020-04-23T08:26:53.894013Z WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.70.30 4444
2020-04-23T08:27:03.443849Z 16 [Note] Got an error reading communication packets
2020-04-23T08:27:04.998557Z 0 [Note] WSREP: 0.0 (mysql2): State transfer to 1.0 (mysql3) complete.
2020-04-23T08:27:04.998709Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 1234383)
2020-04-23T08:27:05.001530Z 0 [Note] WSREP: Member 0.0 (mysql2) synced with group.
2020-04-23T08:27:05.001547Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 1234383)
2020-04-23T08:27:05.001571Z 8 [Note] WSREP: Synchronized with group, ready for connections
2020-04-23T08:27:05.001582Z 8 [Note] WSREP: Setting wsrep_ready to true

I've got proxysql and node logs if needed.

After some more testing I've managed to make it lose 2 nodes at the same time, although I'm not sure how and for that one case I lost the logs :| Looking at MySQL logs this happened at cluster shutdown, however.

Thu Apr 23 13:39:54 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.30 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 11           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 12           | 192.168.70.30 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.01 sec)

Thu Apr 23 13:39:57 2020 proxysql> select * from runtime_mysql_servers;
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1   | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1   | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

And for now a minimal reproducible case is just starting proxysql up when no nodes are up. It'll randomly lose either 192.168.70.30 or 192.168.70.20 for me.

The only thing common all the time is that I can see node in OFFLINE_HARD before that node disappears. I don't think it's expected to see Galera nodes in OFFLINE_HARD anyway. For example, below is the state of my cluster with all nodes down.

Thu Apr 23 13:59:46 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 9000         | 192.168.70.30 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 9000         | mysql1        | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.20 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Attaching:

  • proxysql-status output when node's missing
  • proxysql log for a minimal reproduction case

1254.minimal.zip

All 15 comments

@bmildren I am interested ... the node that is removed is the writer from the X_mysql_servers? Or both W&R are removed?

The node is removed from both the writer and reader pools, querying the runtime_mysql_servers table, and the mysql_servers table there's no reference to the node at all.

To clarify what I mean with the removed host is always the host managed by the replication_hostgroup, basically it's always the writer (read_only = 0).

aha ok tks 馃憤 it was not clear to me ..

Hi @bmildren . Yes, log file will be helpful.
I assume the log file also includes proxysql version: if not, let me know which version are you using.
Thanks

We're running 1.4.3. The latest occurrence was down to a connectivity issues causing an OFFLINE_HARD state, but just fwiw I also have a vague recollection of occasionally coming across something like this in test when a node in the writer pool was put into OFFLINE_HARD due to the mysql-monitor_read_only_timeout being exceeded and proxysql assuming read only.

A follow up for who is following this thread.
The host was shunned on the writer hostgroup, Monitor assumed it was read_only and moved moved the host into the reader hostgroup, setting it offline in the writer hostgroup.
Due a network issue, the host become shunned also in the reader hostgroup, but then was removed because the host wasn't present anymore mysql_servers.
The reason is unknown, and #1255 will give more visibility.

This one is still causing us issues, in each case the behaviour seems to be a timeout on the read_only check immediately followed by the host being evicted.

2018-02-22 03:25:53 MySQL_Monitor.cpp:598:monitor_read_only_thread(): [ERROR] Timeout on read_only check for [ip_address]:3306 after 3006ms. If the server is overload, increase mysql
-monitor_read_only_timeout. Assuming read_only=1

20182018-02-22 03:25:53 MySQL_HostGroups_Manager.cpp:564:commit(): [WARNING] Removed server at address 139639228890496, hostgroup 101, address [ip_address] port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them

It does seem likely this is related to some sort of race condition caused by our setting writer_is_also_reader to false.

select variable_value from global_variables where variable_name = 'mysql-monitor_writer_is_also_reader';
+----------------+
| variable_value |
+----------------+
| false          |
+----------------+

Ben,
did you upgrade ProxySQL to have more visibility due to #1255 ?

Ah cool ok, I'd overlooked that made it into the main branch (I had it at the back of my mind it required a custom build) - we'll look to schedule in an upgrade 馃憤

Hi all,

I've seen a similar issue quite a few times in the last two days with ProxySQL 2.0.10. Node disappears from runtime_mysql_servers and doesn't appear until LOAD MYSQL SERVERS TO RUNTIME; is executed. I do not see an issue with mysql_servers, however.

I understand that this is an older issue and for previous version of ProxySQL, so I can create a new issue if that's more appropriate. Also, a minimal reproducible test case is, unfortunately, not losing a node when running, but rather starting up ProxySQL when no nodes are up. If you think this is a different issue, again, let me know. It's still worth looking into, probably.

ProxySQL is set up for PXC 3-node cluster, here's the configuration. mysql1 resolves to 192.168.70.10, and the discrepancy is just an artifact of configuration. I also made tests with writer_is_also_reader settings of 0 and 1, and the issue persists.

Thu Apr 23 14:00:40 2020 proxysql> select * from mysql_servers;
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 10           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 10           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Thu Apr 23 14:00:50 2020 proxysql> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
| 10               | 12                      | 11               | 9000              | 1      | 1           | 2                     | 16                      | TEST    |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
1 row in set (0.00 sec)

And here's the expected output of runtime_mysql_servers when everything's fine.

Thu Apr 23 14:08:20 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 11           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 12           | 192.168.70.20 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 12           | 192.168.70.30 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
5 rows in set (0.01 sec)

The below is an output from my connection to admin interface just looking at how things are going (running on proxysql machine, so timestamps match). 192.168.70.20 disappeared between 08:25:29 and 08:25:37.

Thu Apr 23 08:25:04 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.20 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
| 9000         | 192.168.70.30 | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

Thu Apr 23 08:25:29 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE  | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1        | 3306 | 0         | ONLINE  | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.30 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Thu Apr 23 08:25:37 2020 proxysql> 

Below is an excerpt from proxysql log showing the last mention of 192.168.70.20 before its shutdown much later. Note that the ping thread was actively pinging that node, as confirmed by both proxysql's mysql_Server_ping_log and by inspecting node's general log.

2020-04-23 08:25:27 [INFO] Dumping mysql_servers: ALL
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid  | hostname      | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 10   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465824 |
| 11   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164466144 |
| 9000 | 192.168.70.20 | 3306 | 0    | 1000   | 3      | 0   | 1000      | 0       | 0   | 0       | mysql2  | 140610171765344 |
| 9000 | mysql1        | 3306 | 0    | 1000   | 3      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465984 |
| 9000 | 192.168.70.30 | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql3  | 140610164465664 |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
...
2020-04-23 08:27:02 MySQL_Monitor.cpp:1601:monitor_galera_thread(): [ERROR] Timeout on Galera health check for 192.168.70.20:3306 after 803ms. If the server is overload, increase mysql-monitor_galera_healthcheck_timeout.
2020-04-23 08:27:02 MySQL_Monitor.cpp:1866:monitor_galera_thread(): [ERROR] Got error. mmsd 0x7fe2560aa440 , MYSQL 0x7fe258082400 , FD 33 : timeout check
...
2020-04-23 08:27:17 [INFO] Dumping mysql_servers: ALL
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid  | hostname      | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 10   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164465824 |
| 11   | mysql1        | 3306 | 0    | 1000   | 0      | 0   | 1000      | 0       | 0   | 0       | mysql1  | 140610164466144 |
| 9000 | 192.168.70.30 | 3306 | 0    | 1000   | 1      | 0   | 1000      | 0       | 0   | 0       | mysql3  | 140610164465664 |
+------+---------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
...
2020-04-23 11:43:25 MySQL_Monitor.cpp:836:monitor_ping_thread(): [ERROR] Error after 0ms on server 192.168.70.20:3306 : Lost connection to MySQL server during query

The particular issue above happened upon cluster startup, with ProxySQL up at the time. Around the startup time, another node went through SST with the .20 as a Donor.

significantly cropped mysqld error log of 192.168.70.20:

2020-04-23T08:26:15.804817Z 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-04-23T08:26:42.299124Z 2 [Note] WSREP: Setting wsrep_ready to true
2020-04-23T08:26:43.460477Z 0 [Note] WSREP: Member 1.0 (mysql3) requested state transfer from '*any*'. Selected 0.0 (mysql2)(SYNCED) as donor.
2020-04-23T08:26:43.460495Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1234383)
2020-04-23T08:26:44.919988Z 0 [Note] WSREP: (187f50d0, 'tcp://0.0.0.0:4567') turning message relay requesting off
        2020-04-23T08:26:53.894013Z WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.70.30 4444
2020-04-23T08:27:03.443849Z 16 [Note] Got an error reading communication packets
2020-04-23T08:27:04.998557Z 0 [Note] WSREP: 0.0 (mysql2): State transfer to 1.0 (mysql3) complete.
2020-04-23T08:27:04.998709Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 1234383)
2020-04-23T08:27:05.001530Z 0 [Note] WSREP: Member 0.0 (mysql2) synced with group.
2020-04-23T08:27:05.001547Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 1234383)
2020-04-23T08:27:05.001571Z 8 [Note] WSREP: Synchronized with group, ready for connections
2020-04-23T08:27:05.001582Z 8 [Note] WSREP: Setting wsrep_ready to true

I've got proxysql and node logs if needed.

After some more testing I've managed to make it lose 2 nodes at the same time, although I'm not sure how and for that one case I lost the logs :| Looking at MySQL logs this happened at cluster shutdown, however.

Thu Apr 23 13:39:54 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.30 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 11           | mysql1        | 3306 | 0         | ONLINE       | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 12           | 192.168.70.30 | 3306 | 0         | OFFLINE_HARD | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.01 sec)

Thu Apr 23 13:39:57 2020 proxysql> select * from runtime_mysql_servers;
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | mysql1   | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 11           | mysql1   | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
+--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

And for now a minimal reproducible case is just starting proxysql up when no nodes are up. It'll randomly lose either 192.168.70.30 or 192.168.70.20 for me.

The only thing common all the time is that I can see node in OFFLINE_HARD before that node disappears. I don't think it's expected to see Galera nodes in OFFLINE_HARD anyway. For example, below is the state of my cluster with all nodes down.

Thu Apr 23 13:59:46 2020 proxysql> select * from runtime_mysql_servers;
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 9000         | 192.168.70.30 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql3  |
| 9000         | mysql1        | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql1  |
| 9000         | 192.168.70.20 | 3306 | 0         | SHUNNED | 1000   | 0           | 1000            | 0                   | 0       | 0              | mysql2  |
+--------------+---------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.01 sec)

Attaching:

  • proxysql-status output when node's missing
  • proxysql log for a minimal reproduction case

1254.minimal.zip

Hi @arronax yeah I think this warrants a new issue. I've opened #2914 to track it.

We are also experiencing the same issue with proxysql v2.0.12.

With a 3 node galera cluster, G1 - G2 and G3. When we remove G3 from the cluster and add it again, its peer G2 starts an SST transfer. During this transfer, G2 is marked as ONLINE but in the offline_hostgroup and shortly after it completly disapear from the _runtime_mysql_server_ table.

The only way to have G2 back into the runtime_mysql_server is by executing LOAD MYSQL SERVERS TO RUNTIME.

Also note that G3 came back as ONLINE in the _runtime_mysql_server_ table as soon as the SST transfer is completed

The issue is also reproduced in v2.0.13 with percona xtradb cluster

I recently ran into the same issue in v2.0.13 as well. Any plans to fix this @renecannao ?

Hi,

this issue is right now very old and the two most recent versions of ProxySQL, being v2.1.0 and v2.0.16 both pack fixes to address situations as described in the recent comments in this issue. If you experience something similar to the described in this issue with the most recent versions of ProxySQL, it's very likely to be a new different issue. If that is the case, please open a new issue with the detailed information about your case.

Thank you!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

vy-nguyentan picture vy-nguyentan  路  4Comments

ShahriyarR picture ShahriyarR  路  3Comments

ottenhoff picture ottenhoff  路  4Comments

beebol picture beebol  路  3Comments

eightnoteight picture eightnoteight  路  4Comments