Proxysql: Documentation missing for AWS Aurora discovery in 2.0.7

Created on 16 Oct 2019  路  7Comments  路  Source: sysown/proxysql

Hi,
with version 2.0.7, native support for AWS Aurora has been added. However documentation for it and its table (mysql_aws_aurora_hostgroups) is missing.
Hope this can be added soon.
Thanks!

Most helpful comment

@markuman The two are separate but related features:

  • mysql_replication_hostgroups is part of an older feature (v1.1.1, #395) that monitors the read_only status of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitor innodb_read_only to support Aurora clusters for automatic failover detection.
  • mysql_aws_aurora_hostgroups is part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.

After testing out this new feature for a bit, here's my summary of how it works:

Table structure:

https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/ProxySQL_Admin.cpp#L362

CREATE TABLE mysql_aws_aurora_hostgroups (
  writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , 
  reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , 
  active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , 
  aurora_port INT NOT NUlL DEFAULT 3306 , 
  domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.') , 
  max_lag_ms INT NOT NULL CHECK (max_lag_ms>= 10 AND max_lag_ms <= 600000) DEFAULT 600000 , 
  check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , 
  check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , 
  writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , 
  new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <=10000000) NOT NULL DEFAULT 1 , 
  comment VARCHAR , 
UNIQUE (reader_hostgroup))

Most of these have reasonable defaults, only a few are required:

  • writer_hostgroup: hostgroup ID of the writer instance
  • reader_hostgroup: hostgroup ID of reader instances
  • domain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like .abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like [instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).

To turn on Aurora auto-discovery, INSERT your configuration into mysql_aws_aurora_hostgroups (and mysql_replication_hostgroups, for failover detection) and run LOAD MYSQL SERVERS TO RUNTIME like usual, e.g.:

    INSERT INTO mysql_aws_aurora_hostgroups (writer_hostgroup, reader_hostgroup, domain_name)
      VALUES (0, 1, '.abcdefghijklm.us-east-1.rds.amazonaws.com');
    INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type)
      VALUES (0, 1, 'innodb_read_only');
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;

When the feature is active, a monitor-thread polls the writer-hostgroup using the following query: https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/MySQL_Monitor.cpp#L3987

SELECT SERVER_ID,
  SESSION_ID,
  LAST_UPDATE_TIMESTAMP,
  REPLICA_LAG_IN_MILLISECONDS,
  CPU 
FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS
WHERE 
  REPLICA_LAG_IN_MILLISECONDS > 0 OR 
  SESSION_ID = 'MASTER_SESSION_ID' 
ORDER BY SERVER_ID

The SERVER_ID returned are the IDs of the instances, which are appended to the configured domain_name to add servers into the mysql_servers table.

Current limitations/issues related to this feature I experienced during testing:

  • When loading mysql_aws_aurora_hostgroups from configuration file, there is a bug (typo) where the new_reader_weight defaults to 0 instead of 1. Workaround is to set this value to 1 in config.
  • No way to configure max_connections less than 1000 on auto-discovered servers (this minimum seems to be hard-coded)
  • Auto-discovery didn't seem to properly filter out deleted instances in the cluster. In my test, a deleted read-replica instance continued to show up in the REPLICA_HOST_STATUS table but with a very high REPLICA_LAG_IN_MILLISECONDS (900000), causing it to stay in the mysql_servers list (with SHUNNED status, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.

All 7 comments

I think all the magic comes from mysql_replication_hostgroup: https://github.com/sysown/proxysql/wiki/Main-(runtime)#mysql_replication_hostgroups

For AWS Aurora check_type = 'innodb_read_only' should be used.

@markuman The two are separate but related features:

  • mysql_replication_hostgroups is part of an older feature (v1.1.1, #395) that monitors the read_only status of listed servers and migrates them to reader/writer hostgroups based on the current status. More recently (v2.0.1, 1b7b6f08fde96ebd1631f3cfd4ce11ce3a07cfc9) this feature was updated to optionally monitor innodb_read_only to support Aurora clusters for automatic failover detection.
  • mysql_aws_aurora_hostgroups is part of a newer feature 'auto discovery for AWS Aurora', just released in v2.0.7. Though mentioned as a 'new feature' in this release, it does not have any public documentation at the moment.

After testing out this new feature for a bit, here's my summary of how it works:

Table structure:

https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/ProxySQL_Admin.cpp#L362

CREATE TABLE mysql_aws_aurora_hostgroups (
  writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , 
  reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , 
  active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , 
  aurora_port INT NOT NUlL DEFAULT 3306 , 
  domain_name VARCHAR NOT NULL CHECK (SUBSTR(domain_name,1,1) = '.') , 
  max_lag_ms INT NOT NULL CHECK (max_lag_ms>= 10 AND max_lag_ms <= 600000) DEFAULT 600000 , 
  check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , 
  check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , 
  writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , 
  new_reader_weight INT CHECK (new_reader_weight >= 0 AND new_reader_weight <=10000000) NOT NULL DEFAULT 1 , 
  comment VARCHAR , 
UNIQUE (reader_hostgroup))

Most of these have reasonable defaults, only a few are required:

  • writer_hostgroup: hostgroup ID of the writer instance
  • reader_hostgroup: hostgroup ID of reader instances
  • domain_name: Domain-name suffix of the Aurora instances in your cluster. This depends on your AWS account and will look something like .abcdefghijklm.us-east-1.rds.amazonaws.com, where your instance endpoints are something like [instance-id].abcdefghijklm.us-east-1.rds.amazonaws.com. This field is used to build new server addresses from the instance ID (which is all the info Aurora gives in its internal replication tables).

To turn on Aurora auto-discovery, INSERT your configuration into mysql_aws_aurora_hostgroups (and mysql_replication_hostgroups, for failover detection) and run LOAD MYSQL SERVERS TO RUNTIME like usual, e.g.:

    INSERT INTO mysql_aws_aurora_hostgroups (writer_hostgroup, reader_hostgroup, domain_name)
      VALUES (0, 1, '.abcdefghijklm.us-east-1.rds.amazonaws.com');
    INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type)
      VALUES (0, 1, 'innodb_read_only');
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;

When the feature is active, a monitor-thread polls the writer-hostgroup using the following query: https://github.com/sysown/proxysql/blob/757684c6d9a3a3574359725c0663f7d3cba0a041/lib/MySQL_Monitor.cpp#L3987

SELECT SERVER_ID,
  SESSION_ID,
  LAST_UPDATE_TIMESTAMP,
  REPLICA_LAG_IN_MILLISECONDS,
  CPU 
FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS
WHERE 
  REPLICA_LAG_IN_MILLISECONDS > 0 OR 
  SESSION_ID = 'MASTER_SESSION_ID' 
ORDER BY SERVER_ID

The SERVER_ID returned are the IDs of the instances, which are appended to the configured domain_name to add servers into the mysql_servers table.

Current limitations/issues related to this feature I experienced during testing:

  • When loading mysql_aws_aurora_hostgroups from configuration file, there is a bug (typo) where the new_reader_weight defaults to 0 instead of 1. Workaround is to set this value to 1 in config.
  • No way to configure max_connections less than 1000 on auto-discovered servers (this minimum seems to be hard-coded)
  • Auto-discovery didn't seem to properly filter out deleted instances in the cluster. In my test, a deleted read-replica instance continued to show up in the REPLICA_HOST_STATUS table but with a very high REPLICA_LAG_IN_MILLISECONDS (900000), causing it to stay in the mysql_servers list (with SHUNNED status, and constant monitor connection-failure errors in the logs). These instances should be filtered out by the monitor query.

@wjordan Your helpful information is a great start on documentation. However, it doesn't appear to auto-detect any aurora hosts when I followed your instructions.

Using:
ProxySQL version 2.0.8-67-g877cab1e, codename Truls

I have a mysql_user setup and I can connect to the Aurora cluster with it from where proxysql is running. I see Loading AWS Aurora info for (0,1,on,3306,".abcdefghijkl.us-east-1.rds.amazonaws.com",600000,1000,800,"(null)") in the proxysql logs and nothing else. Nothing loaded into the (runtime_)mysql_servers table or otherwise that I've seen. I am new to using proxysql, so I'm most likely missing a step or other crucial piece of information.

A reply, some official documentation, or a blog post on this feature would be helpful.

To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in mysql_servers and have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion. Such as auto-scale readers. If that's incorrect, so be it, but I'm failing to understand how mysql_aws_aurora_hostgroups helps when auto-scale Aurora instances are commonplace if you need to add each node or ID to the mysql_servers table.

All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to mysql_servers was enough to have auto-detection begin working.

This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.

To be more clear. I understand I can add the Aurora nodes or the RW/RO endpoints in mysql_servers and have things work for the most part, but this feature sounded like it would auto-detect nodes in some fashion.

I have noticed this too, both with 2.0.10 and 2.0.11. Auto-discovery does not appear to discover anything until I have at least one server defined in mysql_servers. In case it makes any difference, we are running in Kubernetes and all configuration is loaded via the config file at startup and not through the admin interface.

I also found that with ProxySQL 2.0.12 we can configure mysql_servers with instance endpoints or cluster/endpoin, which can be used to find the Aurora cluster topology. But when a failover happens, it will not change the host group of the records in mysql_servers. Also, new added replica will not be appended in mysql_servers.

All of that and it suddenly began to work. The only thing I noticed was a crash + restart way back in my logs and it's entirely possible I didn't save some part of the config changes to disk and in the course of re-doing things a few times over the starts aligned and adding one node to mysql_servers was enough to have auto-detection begin working.

This does bring up the question of how to recover from crashes if Aurora instances are added/removed. Would you use the scheduler to save servers to disk regularly? I can't see a good way to prevent a disaster if proxysql crashes with old nodes stored on disk due to a previous failover that was not saved to disk.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

renecannao picture renecannao  路  20Comments

lengerad picture lengerad  路  23Comments

jkklee picture jkklee  路  29Comments

andreygolev picture andreygolev  路  20Comments

maximumG picture maximumG  路  21Comments