We have some automation around DMS pipelines for a nightly dump of our databases. A small percentage of DMS tasks fail to start because the initial endpoint connection test fails. I'm attempting to handle this by re-testing the connection before starting the task.
DatabaseMigrationService.Waiter.TestConnectionSucceeds is supposed to wait for an in-progress connection test to succeed and then return. Instead, it errors out immediately with:
Waiter TestConnectionSucceeds failed: Connection is already being tested: WaiterError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 64, in lambda_handler
waiter.wait(ReplicationInstanceArn=replication_instance_arn, EndpointArn=source_endpoint_arn)
File "/var/runtime/botocore/waiter.py", line 53, in wait
Waiter.wait(self, **kwargs)
File "/var/runtime/botocore/waiter.py", line 313, in wait
last_response=response
botocore.exceptions.WaiterError: Waiter TestConnectionSucceeds failed: Connection is already being tested
I'm not too familiar with how boto waiters work, but it seems like this waiter might just be calling the DMS TestConnection API, which returns an error when called a second time (while the connection is still being tested). I've reproduced a similar error by using the CLI directly:
$ aws dms test-connection --replication-instance-arn arn:aws:dms:us-east-1:*****:rep:65LSNAJCV7QHFPNWAZUHZ5DNHQ --endpoint-arn arn:aws:dms:us-east-1:*****:endpoint:TNS6FYCD4JYFMNUYLI2OCQJMPI
{
"Connection": {
"ReplicationInstanceArn": "arn:aws:dms:us-east-1:*****:rep:65LSNAJCV7QHFPNWAZUHZ5DNHQ",
"EndpointArn": "arn:aws:dms:us-east-1:*****:endpoint:TNS6FYCD4JYFMNUYLI2OCQJMPI",
"Status": "testing",
"EndpointIdentifier": "datatruck-scylla-nextaccounting-shards-read-replica-02-0116",
"ReplicationInstanceIdentifier": "datatruck-scylla-next-accounting-shard-0116"
}
}
$ aws dms test-connection --replication-instance-arn arn:aws:dms:us-east-1:*****:rep:65LSNAJCV7QHFPNWAZUHZ5DNHQ --endpoint-arn arn:aws:dms:us-east-1:*****:endpoint:TNS6FYCD4JYFMNUYLI2OCQJMPI
An error occurred (InvalidResourceStateFault) when calling the TestConnection operation: Connection is already being tested
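Based on that output, it looks like the current status can be read back without re-triggering the test by using DescribeConnections. A minimal sketch (get_connection_status is a hypothetical helper of my own, not part of the SDK):

import boto3

client = boto3.client('dms')

# Minimal sketch: read the current connection status via DescribeConnections
# instead of calling TestConnection again while a test is already in flight.
# get_connection_status is a hypothetical helper, not part of the SDK.
def get_connection_status(client, replication_instance_arn, endpoint_arn):
    response = client.describe_connections(
        Filters=[
            {'Name': 'endpoint-arn', 'Values': [endpoint_arn]},
            {'Name': 'replication-instance-arn', 'Values': [replication_instance_arn]},
        ]
    )
    connections = response.get('Connections', [])
    # Status is one of: 'testing', 'successful', 'failed', 'deleting'
    return connections[0]['Status'] if connections else None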
Here's the relevant part of my python code:
# response comes from a prior describe_replication_tasks call
replication_task = response['ReplicationTasks'][0]
replication_task_arn = replication_task['ReplicationTaskArn']
source_endpoint_arn = replication_task['SourceEndpointArn']
target_endpoint_arn = replication_task['TargetEndpointArn']
replication_instance_arn = replication_task['ReplicationInstanceArn']

logger.info(f"Testing connection between replication instance {replication_instance_arn} and endpoint {source_endpoint_arn}")
logger.info("Waiting for successful connection...")
waiter = client.get_waiter('test_connection_succeeds')
waiter.wait(ReplicationInstanceArn=replication_instance_arn, EndpointArn=source_endpoint_arn)
logger.info(f"Starting replication task '{replication_task_arn}' from source {source_endpoint_arn} to target {target_endpoint_arn} on {replication_instance_arn}")
The CLI appears to be broken in the same way:
$ aws dms wait test-connection-succeeds --replication-instance-arn arn:aws:dms:us-east-1:***:rep:65LSNAJCV7QHFPNWAZUHZ5DNHQ --endpoint-arn arn:aws:dms:us-east-1:***:endpoint:TNS6FYCD4JYFMNUYLI2OCQJMPI
Waiter TestConnectionSucceeds failed: Connection is already being tested
@mwarkentin Thanks for the report. Definitions of waiters are shared between the Python SDK and the AWS CLI (and all of our SDKs, for that matter). I can confirm that this waiter is broken for the reason you describe. We're working on getting this fixed. Labeling as a bug for now.
A fix for this was pushed in yesterday's (11/7/2018) release.
As of botocore v1.12.40, boto3 v1.9.40, and aws-cli v1.16.40 this waiter should function correctly.
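For reference, on a fixed version the waiter can be invoked as before; if the defaults don't fit, the poll interval and timeout can be tuned with WaiterConfig (values below are just an example):

waiter = client.get_waiter('test_connection_succeeds')
waiter.wait(
    ReplicationInstanceArn=replication_instance_arn,
    EndpointArn=source_endpoint_arn,
    WaiterConfig={'Delay': 5, 'MaxAttempts': 60},  # example values
)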
Thanks, I'll test it out soon!
I'm running into a similar issue with the DMS ReplicationTaskStopped waiter for both the AWS CLI and boto3.
I am running the following versions:
boto3 (1.9.127)
botocore (1.12.127)
aws-cli/1.16.135
I always get the error "Waiter ReplicationTaskStopped failed: Waiter encountered a terminal failure state" unless the task is already in the "stopped" state. It returns this error even if the task is in the "starting" or "running" state, which doesn't seem correct.
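In the meantime I'm considering polling DescribeReplicationTasks directly instead of using the waiter. A rough sketch (wait_for_task_stopped and its default timings are my own, not from the SDK):

import time
import boto3

client = boto3.client('dms')

# Rough workaround sketch: poll DescribeReplicationTasks until the task
# reaches 'stopped', instead of using the ReplicationTaskStopped waiter.
def wait_for_task_stopped(client, task_arn, delay=15, max_attempts=60):
    for _ in range(max_attempts):
        response = client.describe_replication_tasks(
            Filters=[{'Name': 'replication-task-arn', 'Values': [task_arn]}]
        )
        status = response['ReplicationTasks'][0]['Status']
        if status == 'stopped':
            return
        time.sleep(delay)
    raise TimeoutError(f'Task {task_arn} did not stop within the timeout')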