Proxysql: [Feature] Implement a mechanism to automatically unshun nodes even if there is no traffic hitting that node

Created on 28 Mar 2017 · 9 Comments · Source: sysown/proxysql

[28.03.2017, 11:00:54] René Cannaò: I get your point, but the current implementation shows how the node is seen by HGM
[28.03.2017, 11:01:44] René Cannaò: a possible implementation that may make sense and doesn't create overload is to apply the same "resume from shunned" algorithm when querying the runtime_mysql_servers table

All 9 comments

Based on my testing, I would expect the status to change after X successful ping attempts. When X pings fail, the status changes to SHUNNED so why not revert the status after X successful pings?
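
The symmetric behaviour this comment asks for can be sketched as a small counter-based state machine. This is only an illustration of the proposal, not ProxySQL's implementation: the `PingTracker` class, its `threshold` parameter (the "X" above), and the status strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PingTracker:
    """Toy model: flip the status after `threshold` consecutive ping results."""
    threshold: int = 3       # hypothetical "X" from the comment above
    status: str = "ONLINE"
    failures: int = 0        # consecutive failed pings
    successes: int = 0       # consecutive successful pings

    def record_ping(self, ok: bool) -> str:
        if ok:
            self.successes += 1
            self.failures = 0
            # Proposed behaviour: X good pings revert SHUNNED to ONLINE.
            if self.status == "SHUNNED" and self.successes >= self.threshold:
                self.status = "ONLINE"
        else:
            self.failures += 1
            self.successes = 0
            # X bad pings shun the server.
            if self.status == "ONLINE" and self.failures >= self.threshold:
                self.status = "SHUNNED"
        return self.status


tracker = PingTracker(threshold=3)
for ok in [False, False, False, True, True, True]:
    print(ok, tracker.record_ping(ok))
```

With a threshold of 3, the third failed ping moves the server to SHUNNED and the third successful ping moves it back to ONLINE.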

@utdrmac: a server not responding to ping is a server that has some failure, while a server responding to ping isn't necessarily able to process traffic. Just to mention some random examples: a server that constantly generates "table is full", "server is in read only", or "Unknown command" (Galera) is online as far as the Monitor module is concerned, while for the HostGroups Manager it is a faulty node.
I think it is important to highlight that Monitor and HostGroups Manager are two distinct modules.
HostGroups Manager relies on Monitor only in circumstances where it cannot tell whether a backend isn't replying because it is still processing requests or because there is a network issue.

That also means that the "healing" algorithm used by HostGroups Manager should not depend on the Monitor module.
This is especially true for all the users who have the Monitor module disabled.

That makes sense. It's just "odd" to have a node come back online and the status not reflect that it is online. With most monitoring tools, when a node comes back online, the status changes.

As an admin, when the status doesn't change, it makes me falsely believe that there is either A) a problem with the node itself, or B) a problem with ProxySQL not recognizing that the node is back.

> Most monitoring tools, when a node comes back online, the status changes.

I agree, in fact the server is online in monitor tables, right?

I think it is important to understand which status "doesn't change".
As pointed out in #984, the fact that the node is currently reported as SHUNNED just means that it was SHUNNED the last time it was used.
That means that HostGroups Manager didn't change the status of the node because no traffic was sent to that node, so it is correct: the last known status for the HostGroups Manager hasn't changed.

To give an example that excludes the proxy: an application connects directly to the DB. If the application gets an error after the DB goes down and doesn't retry, the last known status is not online (even if the backend is online).

So if the status doesn't change, it also means that no traffic is passing through the proxy; otherwise the status would change again.
The patch suggested in this issue is meant to keep admins from getting confused in low-traffic environments (for example, while building a PoC), because on busy systems this confusion shouldn't arise.
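
A minimal sketch of the lazy behaviour described here, and of the suggested change, may make the distinction clearer. It is a toy model built on assumptions, not HostGroups Manager code: `HostgroupModel`, `recovery_secs`, `route_query`, `read_status`, and the `resume_on_read` flag are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    status: str = "ONLINE"
    shunned_at: float = 0.0


class HostgroupModel:
    """Toy model of a backend status that is only re-evaluated lazily."""

    def __init__(self, recovery_secs: float = 10.0):
        self.recovery_secs = recovery_secs  # hypothetical recovery interval
        self.backend = Backend()

    def shun(self, now: float) -> None:
        self.backend.status = "SHUNNED"
        self.backend.shunned_at = now

    def _resume_if_due(self, now: float) -> None:
        # Stand-in for the "resume from shunned" check.
        b = self.backend
        if b.status == "SHUNNED" and now - b.shunned_at >= self.recovery_secs:
            b.status = "ONLINE"

    def route_query(self, now: float) -> str:
        # Traffic path: the status is re-evaluated because a backend must be picked.
        self._resume_if_due(now)
        return self.backend.status

    def read_status(self, now: float, resume_on_read: bool = False) -> str:
        # Admin path: by default this only reports the last known status.
        # With resume_on_read=True it applies the same check as the traffic path,
        # which is the idea suggested in this issue.
        if resume_on_read:
            self._resume_if_due(now)
        return self.backend.status


hg = HostgroupModel(recovery_secs=10.0)
hg.shun(now=0.0)
print(hg.read_status(now=60.0))                       # SHUNNED
print(hg.read_status(now=60.0, resume_on_read=True))  # ONLINE
```

Without traffic, the first read still shows SHUNNED a minute after the failure; applying the check at read time changes what the admin sees without generating any extra load on the backend.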

I agree with what @utdrmac suggests. We have the same problem in production.
At first I thought I had a problem with the database environment, but then I checked my database configuration and my ProxySQL configuration.
We tried to use mysql_replication_hostgroups to bring the master database from SHUNNED back to ONLINE status.
So a SHUNNED server needs traffic to get back to ONLINE status, but the SHUNNED state can't be changed through an SQL statement.
Actually, this design is very misleading and hard to understand.

I'm using the https://github.com/MaxFedotov/proxysql-zabbix/ template with the backend status check.
If a backend has no activity, its status changes to SHUNNED and Zabbix sends an alert on every check:
"Proxysql backend MySQL server 10.0.0.1:3306 is not ONLINE"

It's OK for a ProxySQL node to have no activity (as a backup node, for example).

Would love for it to get UNSHUNNED when the db is healthy again

I have seen the same today. A host that is in three hostgroups got "ONLINE" again in two hostgroups, but not the third.
Maybe some logic like (pseudocode):

If(host.statechange (SHUNNED => ONLINE)){
   SELECT host, hostgroup WHERE host=host AND status="SHUNNED"
   checkIfBackOnline(host,hostgroup);
}
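
The pseudocode above could be made concrete along the following lines. This is a hedged sketch rather than ProxySQL code: the `servers` dictionary, the `on_state_change` function, and the `check_if_back_online` callback are hypothetical stand-ins for whatever state and health check the real implementation would use.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, int]  # (host, hostgroup_id)


def on_state_change(
    servers: Dict[Key, str],
    host: str,
    old: str,
    new: str,
    check_if_back_online: Callable[[str, int], bool],
) -> List[int]:
    """When `host` goes SHUNNED -> ONLINE in one hostgroup, recheck the others.

    Returns the hostgroup ids in which the host was brought back ONLINE.
    """
    recovered: List[int] = []
    if (old, new) != ("SHUNNED", "ONLINE"):
        return recovered
    for (h, hg), status in sorted(servers.items()):
        # Only entries for the same host that are still SHUNNED are rechecked.
        if h == host and status == "SHUNNED" and check_if_back_online(h, hg):
            servers[(h, hg)] = "ONLINE"
            recovered.append(hg)
    return recovered


# Hypothetical state: db1 is ONLINE again in hostgroups 10 and 20, not in 30.
servers = {("db1", 10): "ONLINE", ("db1", 20): "ONLINE", ("db1", 30): "SHUNNED"}
print(on_state_change(servers, "db1", "SHUNNED", "ONLINE", lambda h, hg: True))
```

Here the callback always reports the host as healthy, so the remaining SHUNNED entry in hostgroup 30 is flipped to ONLINE; a real check would have to decide that per hostgroup.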