Proxysql: Documentation missing for AWS Aurora discovery in 2.0.7

Created on 16 Oct 2019 · 7Comments · Source: sysown/proxysql

Hi,
with version 2.0.7, native support for AWS Aurora has been added. However documentation for it and its table (mysql_aws_aurora_hostgroups) is missing.
Hope this can be added soon.
Thanks!

Source

Lightning303

👍1

Most helpful comment

@markuman The two are separate but related features:

mysql_replication_hostgroups is part of an older feature (v1.1.1, #395) that monitors the read_only status of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitor innodb_read_only to support Aurora clusters for automatic failover detection.
mysql_aws_aurora_hostgroups is part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.

After testing out this new feature for a bit, here's my summary of how it works:

Table structure:

https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/ProxySQL_Admin.cpp#L362

CREATE TABLE mysql_aws_aurora_hostgroups (
  writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , 
  reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , 
  active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , 
  aurora_port INT NOT NUlL DEFAULT 3306 , 
  domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.') , 
  max_lag_ms INT NOT NULL CHECK (max_lag_ms>= 10 AND max_lag_ms <= 600000) DEFAULT 600000 , 
  check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , 
  check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , 
  writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , 
  new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <=10000000) NOT NULL DEFAULT 1 , 
  comment VARCHAR , 
UNIQUE (reader_hostgroup))

Most of these have reasonable defaults, only a few are required:

writer_hostgroup: hostgroup ID of the writer instance
reader_hostgroup: hostgroup ID of reader instances
domain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like .abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like [instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).

To turn on Aurora auto-discovery, INSERT your configuration into mysql_aws_aurora_hostgroups (and mysql_replication_hostgroups, for failover detection) and run LOAD MYSQL SERVERS TO RUNTIME like usual, e.g.:

    INSERT INTO mysql_aws_aurora_hostgroups (writer_hostgroup, reader_hostgroup, domain_name)
      VALUES (0, 1, '.abcdefghijklm.us-east-1.rds.amazonaws.com');
    INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type)
      VALUES (0, 1, 'innodb_read_only');
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;

When the feature is active, a monitor-thread polls the writer-hostgroup using the following query: https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/MySQL_Monitor.cpp#L3987

SELECT SERVER_ID,
  SESSION_ID,
  LAST_UPDATE_TIMESTAMP,
  REPLICA_LAG_IN_MILLISECONDS,
  CPU 
FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS
WHERE 
  REPLICA_LAG_IN_MILLISECONDS > 0 OR 
  SESSION_ID = 'MASTER_SESSION_ID' 
ORDER BY SERVER_ID

The SERVER_ID returned are the IDs of the instances, which are appended to the configured domain_name to add servers into the mysql_servers table.

Current limitations/issues related to this feature I experienced during testing:

When loading mysql_aws_aurora_hostgroups from configuration file, there is a bug (typo) where the new_reader_weight defaults to 0 instead of 1. Workaround is to set this value to 1 in config.
No way to configure max_connections less than 1000 on auto-discovered servers (this minimum seems to be hard-coded)
Auto-discovery didn't seem to properly filter out deleted instances in the cluster. In my test, a deleted read-replica instance continued to show up in the REPLICA_HOST_STATUS table but with a very high REPLICA_LAG_IN_MILLISECONDS (900000), causing it to stay in the mysql_servers list (with SHUNNED status, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.

wjordan on 18 Oct 2019

👍6

All 7 comments

I think all the magic comes from mysql_replication_hostgroup: https://github.com/sysown/proxysql/wiki/Main-(runtime)#mysql_replication_hostgroups

For AWS Aurora check_type = 'innodb_read_only' should be used.

markuman on 17 Oct 2019

@markuman The two are separate but related features:

mysql_replication_hostgroups is part of an older feature (v1.1.1, #395) that monitors the read_only status of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitor innodb_read_only to support Aurora clusters for automatic failover detection.
mysql_aws_aurora_hostgroups is part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.

After testing out this new feature for a bit, here's my summary of how it works:

Table structure:

https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/ProxySQL_Admin.cpp#L362

CREATE TABLE mysql_aws_aurora_hostgroups (
  writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , 
  reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , 
  active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , 
  aurora_port INT NOT NUlL DEFAULT 3306 , 
  domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.') , 
  max_lag_ms INT NOT NULL CHECK (max_lag_ms>= 10 AND max_lag_ms <= 600000) DEFAULT 600000 , 
  check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , 
  check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , 
  writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , 
  new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <=10000000) NOT NULL DEFAULT 1 , 
  comment VARCHAR , 
UNIQUE (reader_hostgroup))

Most of these have reasonable defaults, only a few are required:

writer_hostgroup: hostgroup ID of the writer instance
reader_hostgroup: hostgroup ID of reader instances
domain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like .abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like [instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).

    INSERT INTO mysql_aws_aurora_hostgroups (writer_hostgroup, reader_hostgroup, domain_name)
      VALUES (0, 1, '.abcdefghijklm.us-east-1.rds.amazonaws.com');
    INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type)
      VALUES (0, 1, 'innodb_read_only');
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;

SELECT SERVER_ID,
  SESSION_ID,
  LAST_UPDATE_TIMESTAMP,
  REPLICA_LAG_IN_MILLISECONDS,
  CPU 
FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS
WHERE 
  REPLICA_LAG_IN_MILLISECONDS > 0 OR 
  SESSION_ID = 'MASTER_SESSION_ID' 
ORDER BY SERVER_ID

The SERVER_ID returned are the IDs of the instances, which are appended to the configured domain_name to add servers into the mysql_servers table.

Current limitations/issues related to this feature I experienced during testing:

When loading mysql_aws_aurora_hostgroups from configuration file, there is a bug (typo) where the new_reader_weight defaults to 0 instead of 1. Workaround is to set this value to 1 in config.
No way to configure max_connections less than 1000 on auto-discovered servers (this minimum seems to be hard-coded)
Auto-discovery didn't seem to properly filter out deleted instances in the cluster. In my test, a deleted read-replica instance continued to show up in the REPLICA_HOST_STATUS table but with a very high REPLICA_LAG_IN_MILLISECONDS (900000), causing it to stay in the mysql_servers list (with SHUNNED status, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.

wjordan on 18 Oct 2019

👍6

@wjordan Your helpful information is a great start on documentation. However, it doesn't appear to auto-detect any aurora hosts when I followed your instructions.

Using:
ProxySQL version 2.0.8-67-g877cab1e, codename Truls

I have a mysql_user setup and I can connect to the Aurora cluster with it from where proxysql is running. I see Loading AWS Aurora info for (0,1,on,3306,".abcdefghijkl.us-east-1.rds.amazonaws.com",600000,1000,800,"(null)") in the proxysql logs and nothing else. Nothing loaded into the (runtime_)mysql_servers table or otherwise that I've seen. I am new to using proxysql, so I'm most likely missing a step or other crucial piece of information.

A reply, some official documentation, or a blog post on this feature would be helpful.

bryanspears on 13 Dec 2019

To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in mysql_servers and have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion. Such as auto-scale readers. If that's incorrect, so be it, but I'm failing to understand how mysql_aws_aurora_hostgroups helps when auto-scale Aurora instances are commonplace if you need to add each node or ID to the mysql_servers table.

bryanspears on 13 Dec 2019

All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to mysql_servers was enough to have auto-detection begin working.

This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.

bryanspears on 13 Dec 2019

To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in mysql_servers and have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion.

I have noticed this too, both with 2.0.10 and 2.0.11. Auto-discovery does not appear to discover anything until I have at least one server defined in mysql_servers. In case it makes any difference, we are running in Kubernetes and all configuration is loaded via the config file at startup and not through the admin interface.

bryonwinger on 14 May 2020

I also found that with ProxySQL 2.0.12 we can configure mysql_servers with instance endpoints or cluster/endpoin, which can be used to find the Aurora cluster topology. But when a failover happens, it will not change the host group of the records in mysql_servers. Also, new added replica will not be appended in mysql_servers.

All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to mysql_servers was enough to have auto-detection begin working.

This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.