ProxySQL: Aurora primary SHUNNED for nonexistent lag

Created on 15 Oct 2020 · 5 Comments · Source: sysown/proxysql

Hey team! I ran into an issue with a simple ProxySQL configuration in front of an Aurora cluster.
The scenario:

1 ProxySQL
1 AWS Aurora MySQL Cluster
  1 primary
  1 reader
  binary log enabled

When I connect everything together, I keep seeing this in the ProxySQL error log:

2020-10-15 16:12:37 [INFO] MySQL_HostGroups_Manager::commit() locked for 3ms
2020-10-15 16:12:37 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 100 , address: gcp-clickfunnels-staging.c4gxn5pmgmjd.us-east-1.rds.amazonaws.com , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 50 , use_ssl: 0 , max_latency_ms: 15000000 , comment: 
HID: 200 , address: gcp-mothership-stg-db-01.c4gxn5pmgmjd.us-east-1.rds.amazonaws.com , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 50 , use_ssl: 0 , max_latency_ms: 15000000 , comment: 
2020-10-15 16:12:37 [INFO] Dumping mysql_servers: ALL
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid | hostname                                                          | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 100 | xxx.us-east-1.rds.amazonaws.com | 3306 | 0    | 1      | 0      | 0   | 1000      | 50      | 0   | 15      |         | 139861648882272 |
| 200 | xxx.us-east-1.rds.amazonaws.com | 3306 | 0    | 1      | 0      | 0   | 1000      | 50      | 0   | 15      |         | 139861577245152 |
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
2020-10-15 16:12:37 [INFO] Received SAVE MYSQL SERVERS TO DISK command

```
2020-10-15 17:02:34 MySQL_HostGroups_Manager.cpp:5503:aws_aurora_replication_lag_action(): [WARNING] Shunning server xxx.us-east-1.rds.amazonaws.com:3306 from HG 100 with replication lag of 2147483648.000000 microseconds
2020-10-15 17:02:43 MySQL_HostGroups_Manager.cpp:2891:replication_lag_action(): [WARNING] Re-enabling server xxx.us-east-1.rds.amazonaws.com:3306 from HG 100 with replication lag of -2 second

prod ProxySQL> select * from mysql_aws_aurora_hostgroups\G
******** 1. row ********
writer_hostgroup: 100
reader_hostgroup: 200
active: 1
aurora_port: 3306
domain_name: .xxx.us-east-1.rds.amazonaws.com
max_lag_ms: 20
check_interval_ms: 100
check_timeout_ms: 80
writer_is_also_reader: 0
new_reader_weight: 1
add_lag_ms: 25
min_lag_ms: 5
lag_num_checks: 1
comment: Aurora Cluster
```
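To put the reported number in perspective, here is a minimal sketch that parses the shunning warning and compares it with the configured `max_lag_ms`. It takes the unit in the log message ("microseconds") at face value, which is an assumption; the helper name `reported_lag_us` is hypothetical and not part of ProxySQL.

```python
import re

# Warning line copied from the error log above.
LOG_LINE = (
    "2020-10-15 17:02:34 MySQL_HostGroups_Manager.cpp:5503:"
    "aws_aurora_replication_lag_action(): [WARNING] Shunning server "
    "xxx.us-east-1.rds.amazonaws.com:3306 from HG 100 with replication "
    "lag of 2147483648.000000 microseconds"
)

MAX_LAG_MS = 20  # max_lag_ms from mysql_aws_aurora_hostgroups above


def reported_lag_us(line: str) -> float:
    """Extract the lag value from a shunning warning (hypothetical helper).

    Assumes the value is in microseconds, as the log message states.
    """
    match = re.search(r"replication lag of ([0-9.]+) microseconds", line)
    if match is None:
        raise ValueError("no replication lag found in line")
    return float(match.group(1))


lag_us = reported_lag_us(LOG_LINE)
lag_ms = lag_us / 1000.0

print(f"reported lag: {lag_ms:.3f} ms (about {lag_us / 1e6 / 60:.1f} minutes)")
print(f"exceeds max_lag_ms={MAX_LAG_MS}: {lag_ms > MAX_LAG_MS}")
print(f"equals 2**31: {lag_us == 2**31}")
```

The value is exactly 2^31, which looks more like a sentinel or an overflowed integer than a real measurement. That is only a guess from the number itself, not something confirmed by the ProxySQL source.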

No errors in the mysql_server_aws_aurora_log table.
I tried with and without add_lag_ms, but the issue persists.

ProxySQL ver 2.0.14
Ubuntu package

bug

jtomaszon

All 5 comments

As a workaround, deleting the mysql_aws_aurora_hostgroups entry and just using normal mysql_replication_hostgroups keeps things running.

jtomaszon on 15 Oct 2020
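The workaround in the comment above could look roughly like the following on the ProxySQL admin interface. This is a sketch under assumptions: the comment does not say which `check_type` was used, so `innodb_read_only` is a guess, and `workaround_statements` is a hypothetical helper that only builds the statements as strings so they can be reviewed before being run by hand. The hostgroup IDs 100 and 200 come from the configuration shown in the issue.

```python
from typing import List


def workaround_statements(writer_hg: int, reader_hg: int) -> List[str]:
    """Build admin statements that replace the Aurora hostgroup entry
    with a plain replication hostgroup (hypothetical helper)."""
    writer_hg, reader_hg = int(writer_hg), int(reader_hg)
    return [
        # Drop the Aurora-specific monitoring for this cluster.
        "DELETE FROM mysql_aws_aurora_hostgroups "
        f"WHERE writer_hostgroup = {writer_hg};",
        # Assumption: check_type is not stated in the thread.
        "INSERT INTO mysql_replication_hostgroups "
        "(writer_hostgroup, reader_hostgroup, check_type, comment) "
        f"VALUES ({writer_hg}, {reader_hg}, 'innodb_read_only', "
        "'Aurora Cluster');",
        # Apply the change and persist it.
        "LOAD MYSQL SERVERS TO RUNTIME;",
        "SAVE MYSQL SERVERS TO DISK;",
    ]


statements = workaround_statements(100, 200)
for stmt in statements:
    print(stmt)
```

With this setup ProxySQL no longer discovers Aurora instances on its own, so both endpoints need to stay listed in mysql_servers, as they already are in the dump from the issue.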

We are seeing this on new clusters launched with the 5.6.mysql_aurora.1.23.0 engine version.

bangpound on 23 Oct 2020

👍2

> As a workaround, deleting the mysql_aws_aurora_hostgroups entry and just using normal mysql_replication_hostgroups keeps things running.

But how can I control the lag of the Aurora cluster replicas?

whera on 9 Dec 2020

> As a workaround, deleting the mysql_aws_aurora_hostgroups entry and just using normal mysql_replication_hostgroups keeps things running.

> But how can I control the lag of the Aurora cluster replicas?

Sadly, you won't be able to. The key point is that Aurora replicas won't lag by more than a few tens of ms. In my experience (an application with heavy write batches), we never see that behavior; max lag is around 30-50 ms. If your application can tolerate that kind of lag, you should be fine to keep using the old replication hostgroups.

Just a disclaimer: this is me speaking from my own experience, not a recommendation from AWS or even the ProxySQL team.

jtomaszon on 10 Dec 2020

Thanks!! @jtomaszon

whera on 11 Dec 2020
