Hi,
with version 2.0.7, native support for AWS Aurora has been added. However documentation for it and its table (mysql_aws_aurora_hostgroups) is missing.
Hope this can be added soon.
Thanks!
I think all the magic comes from mysql_replication_hostgroup: https://github.com/sysown/proxysql/wiki/Main-(runtime)#mysql_replication_hostgroups
For AWS Aurora
check_type = 'innodb_read_only'should be used.
@markuman The two are separate but related features:
mysql_replication_hostgroups is part of an older feature (v1.1.1, #395) that monitors the read_only status of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitor innodb_read_only to support Aurora clusters for automatic failover detection.mysql_aws_aurora_hostgroups is part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.After testing out this new feature for a bit, here's my summary of how it works:
Table structure:
CREATE TABLE mysql_aws_aurora_hostgroups (
writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY ,
reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) ,
active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 ,
aurora_port INT NOT NUlL DEFAULT 3306 ,
domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.') ,
max_lag_ms INT NOT NULL CHECK (max_lag_ms>= 10 AND max_lag_ms <= 600000) DEFAULT 600000 ,
check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 ,
check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 ,
writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 ,
new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <=10000000) NOT NULL DEFAULT 1 ,
comment VARCHAR ,
UNIQUE (reader_hostgroup))
Most of these have reasonable defaults, only a few are required:
writer_hostgroup: hostgroup ID of the writer instancereader_hostgroup: hostgroup ID of reader instancesdomain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like .abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like [instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).To turn on Aurora auto-discovery, INSERT your configuration into mysql_aws_aurora_hostgroups (and mysql_replication_hostgroups, for failover detection) and run LOAD MYSQL SERVERS TO RUNTIME like usual, e.g.:
INSERT INTO mysql_aws_aurora_hostgroups (writer_hostgroup, reader_hostgroup, domain_name)
VALUES (0, 1, '.abcdefghijklm.us-east-1.rds.amazonaws.com');
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type)
VALUES (0, 1, 'innodb_read_only');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
When the feature is active, a monitor-thread polls the writer-hostgroup using the following query: https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/MySQL_Monitor.cpp#L3987
SELECT SERVER_ID,
SESSION_ID,
LAST_UPDATE_TIMESTAMP,
REPLICA_LAG_IN_MILLISECONDS,
CPU
FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS
WHERE
REPLICA_LAG_IN_MILLISECONDS > 0 OR
SESSION_ID = 'MASTER_SESSION_ID'
ORDER BY SERVER_ID
The SERVER_ID returned are the IDs of the instances, which are appended to the configured domain_name to add servers into the mysql_servers table.
Current limitations/issues related to this feature I experienced during testing:
mysql_aws_aurora_hostgroups from configuration file, there is a bug (typo) where the new_reader_weight defaults to 0 instead of 1. Workaround is to set this value to 1 in config.max_connections less than 1000 on auto-discovered servers (this minimum seems to be hard-coded)REPLICA_HOST_STATUS table but with a very high REPLICA_LAG_IN_MILLISECONDS (900000), causing it to stay in the mysql_servers list (with SHUNNED status, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.@wjordan Your helpful information is a great start on documentation. However, it doesn't appear to auto-detect any aurora hosts when I followed your instructions.
Using:
ProxySQL version 2.0.8-67-g877cab1e, codename Truls
I have a mysql_user setup and I can connect to the Aurora cluster with it from where proxysql is running. I see Loading AWS Aurora info for (0,1,on,3306,".abcdefghijkl.us-east-1.rds.amazonaws.com",600000,1000,800,"(null)") in the proxysql logs and nothing else. Nothing loaded into the (runtime_)mysql_servers table or otherwise that I've seen. I am new to using proxysql, so I'm most likely missing a step or other crucial piece of information.
A reply, some official documentation, or a blog post on this feature would be helpful.
To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in mysql_servers and have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion. Such as auto-scale readers. If that's incorrect, so be it, but I'm failing to understand how mysql_aws_aurora_hostgroups helps when auto-scale Aurora instances are commonplace if you need to add each node or ID to the mysql_servers table.
All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to mysql_servers was enough to have auto-detection begin working.
This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.
To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in
mysql_serversand have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion.
I have noticed this too, both with 2.0.10 and 2.0.11. Auto-discovery does not appear to discover anything until I have at least one server defined in mysql_servers. In case it makes any difference, we are running in Kubernetes and all configuration is loaded via the config file at startup and not through the admin interface.
I also found that with ProxySQL 2.0.12 we can configure mysql_servers with instance endpoints or cluster/endpoin, which can be used to find the Aurora cluster topology. But when a failover happens, it will not change the host group of the records in mysql_servers. Also, new added replica will not be appended in mysql_servers.
All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to
mysql_serverswas enough to have auto-detection begin working.This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.
Most helpful comment
@markuman The two are separate but related features:
mysql_replication_hostgroupsis part of an older feature (v1.1.1, #395) that monitors theread_onlystatus of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitorinnodb_read_onlyto support Aurora clusters for automatic failover detection.mysql_aws_aurora_hostgroupsis part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.After testing out this new feature for a bit, here's my summary of how it works:
Table structure:
https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/ProxySQL_Admin.cpp#L362
Most of these have reasonable defaults, only a few are required:
writer_hostgroup: hostgroup ID of the writer instancereader_hostgroup: hostgroup ID of reader instancesdomain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like.abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like[instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).To turn on Aurora auto-discovery,
INSERTyour configuration intomysql_aws_aurora_hostgroups(andmysql_replication_hostgroups, for failover detection) and runLOAD MYSQL SERVERS TO RUNTIMElike usual, e.g.:When the feature is active, a monitor-thread polls the writer-hostgroup using the following query: https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/MySQL_Monitor.cpp#L3987
The
SERVER_IDreturned are the IDs of the instances, which are appended to the configureddomain_nameto add servers into themysql_serverstable.Current limitations/issues related to this feature I experienced during testing:
mysql_aws_aurora_hostgroupsfrom configuration file, there is a bug (typo) where thenew_reader_weightdefaults to0instead of1. Workaround is to set this value to1in config.max_connectionsless than1000on auto-discovered servers (this minimum seems to be hard-coded)REPLICA_HOST_STATUStable but with a very highREPLICA_LAG_IN_MILLISECONDS(900000), causing it to stay in themysql_serverslist (withSHUNNEDstatus, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.