kops keeps retrying cluster update when AWS hits the launch configurations limit

Created on 4 Dec 2016 · 14 comments · Source: kubernetes/kops

How to reproduce:

Create launch configurations until you reach the limit for that AWS account, then try to upgrade a Kubernetes cluster with kops. It will keep looping like this:

...

I1204 16:03:16.701326   45183 aws_cloud.go:570] Resolved image "ami-4bb3e05c"
I1204 16:03:16.702746   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:16.941540   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.092591   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.755397   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.863162   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.537845   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.690481   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.313920   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.432232   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 705ms
I1204 16:03:20.034683   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 604ms
I1204 16:03:20.139128   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.640814   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.972542   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.427645   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.758880   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.212730   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.480970   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 655ms
I1204 16:03:23.046056   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.139135   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.839591   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.920233   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.627519   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.695294   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.395123   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.401989   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 732ms
I1204 16:03:26.115919   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 889ms
I1204 16:03:26.138282   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:26.922574   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.010358   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.721469   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.802537   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.516712   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.612808   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.271350   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.441639   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.075490   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.212078   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 709ms

...

I don't think this is a retryable error, though.

Regards

lifecycle/rotten

Most helpful comment

@amadev I'm running v1.17.0 and I still see a lot of "Got RequestLimitExceeded error on AWS request" messages during my updates, for 3 minutes or more. For me it isn't fixed.

All 14 comments

I am not sure that we actually have the notion of a non-recoverable error (or that, when AWS says something is permanent, we trust that it genuinely is). But we can have a look, or at least provide a hint if we encounter the error :-)

It would be very handy if we could use IAM profiling and a dry run to verify whether AWS will even allow us to create a cluster.

We could check things like limits on resources, permissions, etc
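One way to approach the distinction the comments above are asking for is a client-side classifier that separates transient throttling from hard quota errors. This is only a sketch of the idea, not kops's actual code; the function name and the code strings handled are illustrative (though "RequestLimitExceeded" and "LimitExceeded" are real AWS error codes):

```go
package main

import (
	"fmt"
	"strings"
)

// isRetryableCode is a hypothetical classifier: throttling errors are worth
// retrying with backoff, but a hard quota error ("LimitExceeded") will not
// clear on its own, so retrying it only burns API calls.
func isRetryableCode(code string) bool {
	switch {
	case code == "Throttling", code == "RequestLimitExceeded":
		return true // transient rate limiting: back off and retry
	case strings.Contains(code, "LimitExceeded"):
		return false // account quota reached: fail fast and tell the user
	default:
		return true // keep the SDK's default behaviour for unknown codes
	}
}

func main() {
	for _, code := range []string{"RequestLimitExceeded", "LimitExceeded", "Throttling"} {
		fmt.Printf("%s retryable=%v\n", code, isRetryableCode(code))
	}
}
```

With something like this wired into the retryer, hitting the launch configuration quota would surface an immediate, actionable error instead of the retry loop shown in the original report.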

I think root cause here is #329

@justinsb I would sorta agree. We are not cleaning up, so yes, that is a problem. But the call to create a new launch config, when your account cannot create another, loops. I have seen kops hang with quota issues. We can run into this if you have a bunch of clusters and hit your quota for launch configs.

Wondering how much of this is related to https://github.com/kubernetes/kops/issues/1051

Can we test and close if so?

Sort of. We will still loop forever when we hit certain limits. We are not timing out properly somewhere.

In kops 1.5.0 we have much clearer logging for errors during retries: #1658

I think we should consider the idea of retryable errors, but it isn't clear when errors are retryable. If another cluster is being deleted, resources may become available.

@justinsb since we put #1658 in place I have seen that we seem to be hammering the API pretty hard on deletes. Not sure if this is expected?

For example, deleting a private-topology cluster with HA masters. I am on master.

I0216 12:00:38.022397   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 97663fe8-8cff-421f-b009-5cda8131b7e2) from ec2/DeleteSubnet - will retry after delay of 5.072s
I0216 12:00:38.258773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 7531b2af-9fce-48a0-9fd4-6d008074029f) from ec2/DeleteSubnet - will retry after delay of 6.432s
I0216 12:00:38.509773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 0ded66e4-96da-4e4d-bf7d-53f4b2cadb6f) from ec2/DeleteVolume - will retry after delay of 4.664s
I0216 12:00:38.582814   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 489434b5-5ae8-4ac3-8a84-c2158abb3ca3) from ec2/DeleteVolume - will retry after delay of 4.848s
I0216 12:00:38.595522   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: e6a789a9-4ee8-4d63-936c-6bb01ce7d133) from ec2/DetachInternetGateway - will retry after delay of 6.96s
I0216 12:00:38.663642   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: d98d4f82-04e6-4c74-891e-f83e7165f0de) from ec2/DeleteSubnet - will retry after delay of 6.52s
I0216 12:00:38.744053   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 3cbcb968-456c-4e3a-ba3c-c9c3d3b084ff) from ec2/DeleteVolume - will retry after delay of 7.392s
I0216 12:00:38.758525   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 2487a0a7-2e31-4841-ae70-78db8c43b21b) from ec2/DeleteSecurityGroup - will retry after delay of 6.112s
I0216 12:00:38.777200   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: ae7ebf42-fbfe-41ec-b7b5-d00e02c01b81) from ec2/DeleteSecurityGroup - will retry after delay of 4.064s
I0216 12:00:39.027556   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: c440fc41-dff3-4af4-9258-8d95e248a20a) from ec2/ReleaseAddress - will retry after delay of 7.944s
I0216 12:00:39.061817   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 04d49bb8-f691-4c00-a8c9-11464f5355b8) from ec2/ReleaseAddress - will retry after delay of 5.744s
I0216 12:00:39.183454   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 3915504c-0b89-44ef-aef8-f2dd5268dbf0) from ec2/DeleteVolume - will retry after delay of 6.216s
I0216 12:00:39.208595   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: b329f48c-2140-4d2b-bbb8-50ac5431b09d) from ec2/DeleteSecurityGroup - will retry after delay of 7.264s
I0216 12:00:39.224426   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 16324629-c166-475b-8de7-d1ca5583e24d) from ec2/DeleteVolume - will retry after delay of 5.36s
I0216 12:00:39.245337   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: eabec3cf-32bb-43af-9264-d8a25e019939) from ec2/DeleteVolume - will retry after delay of 6.904s
I0216 12:00:39.431604   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 462353db-052f-4a90-8566-c241c4d1b903) from ec2/DeleteVolume - will retry after delay of 5.368s
I0216 12:00:39.523822   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 3dc5130e-4661-4769-bda8-0e9ccfd2491e) from ec2/DeleteSubnet - will retry after delay of 4.04s
I0216 12:00:39.683042   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 1d11d8f6-3313-4740-8764-0ecc4cfe9057) from ec2/DeleteVolume - will retry after delay of 7.128s
I0216 12:00:39.823968   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: dd1196ad-f9c0-4b67-b4f5-1043591dc5ea) from ec2/DeleteSubnet - will retry after delay of 5.04s
I0216 12:00:39.836455   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 2e7e868c-f3ca-4115-b51a-33174061c85d) from ec2/ReleaseAddress - will retry after delay of 4.232s
I0216 12:00:40.073099   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 277036c4-391b-4264-95be-c9a3890f4592) from ec2/DeleteSecurityGroup - will retry after delay of 6.088s
I0216 12:00:40.386382   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 299e20d4-9b71-484c-819c-f249ea2e514f) from ec2/DeleteVolume - will retry after delay of 6.032s
I0216 12:00:40.388647   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 2e3679ce-eefa-4faa-ac79-66bb344184ae) from ec2/DeleteSecurityGroup - will retry after delay of 7.288s
I0216 12:00:40.568829   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 0e17a1ed-0e35-4a45-b517-9f4ff888b5e1) from ec2/DeleteSecurityGroup - will retry after delay of 6.4s
security-group:sg-4b611533  ok
volume:vol-051f2f335d9841db9    still has dependencies, will retry
subnet:subnet-ab1e28f3  ok
volume:vol-069114779a254da90    ok
subnet:subnet-f878b2b1  still has dependencies, will retry
I0216 12:00:44.145238   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 9c3660dd-2353-4f03-bc61-d59360d9e92f) from ec2/ReleaseAddress - will retry after delay of 14.592s
I0216 12:00:44.664229   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: f72a2627-649b-4dfb-abb7-4af98a7fb8b3) from ec2/DeleteVolume - will retry after delay of 10.336s
I0216 12:00:44.965389   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: b311dd30-54b2-437f-8b26-105a703edab8) from ec2/DeleteSubnet - will retry after delay of 12.864s
I0216 12:00:45.080082   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 14fb2023-b1a6-4732-bef5-0bfa2a702b1f) from ec2/DeleteVolume - will retry after delay of 9.808s
I0216 12:00:45.097391   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 53d241af-a4aa-42b1-840a-064beee8126d) from ec2/ReleaseAddress - will retry after delay of 14.864s
I0216 12:00:45.155219   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 7fc4d788-ccb3-488e-a86d-89b6b9a75837) from ec2/DeleteSecurityGroup - will retry after delay of 9.648s
I0216 12:00:45.256597   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 0251cff4-1367-42c7-9f3d-d59681d0f5d9) from ec2/DeleteSubnet - will retry after delay of 11.792s
I0216 12:00:45.472431   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 42eba90a-8767-4983-ad5e-8a72b9fc1244) from ec2/DeleteSubnet - will retry after delay of 15.504s
I0216 12:00:45.696224   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 1f852c3c-b632-4e40-8797-9d78e0f0f7bb) from ec2/DeleteVolume - will retry after delay of 15.888s
I0216 12:00:45.838773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 34808969-1e52-4ea1-ad6f-b105e0896b30) from ec2/DetachInternetGateway - will retry after delay of 14.224s
I0216 12:00:46.418787   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: f4c0409c-96c5-4881-9949-01f1a4b448cc) from ec2/DeleteVolume - will retry after delay of 12.896s
I0216 12:00:46.426612   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 8f8aa57b-1a44-4459-aad8-c4650f84544d) from ec2/DeleteVolume - will retry after delay of 11.12s
I0216 12:00:46.432869   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: f4b002a2-c6fe-4fdc-b3fb-c2c663eaebed) from ec2/DeleteSecurityGroup - will retry after delay of 8.992s
I0216 12:00:46.697706   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: af0b59b6-fdf9-41d3-ac24-0219b1925a83) from ec2/DeleteVolume - will retry after delay of 11.936s
I0216 12:00:46.749192   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: c9a79407-c117-44ea-9644-ed17524fb7c0) from ec2/DeleteSecurityGroup - will retry after delay of 9.92s
I0216 12:00:47.093747   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: 7dc61333-e1ef-4489-b355-2c0794372a1c) from ec2/DeleteVolume - will retry after delay of 8.608s
I0216 12:00:47.253045   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: ef7a4f80-f594-4e3b-874b-1c75826be7c3) from ec2/DeleteSecurityGroup - will retry after delay of 11.12s
I0216 12:00:47.254923   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
    status code: 503, request id: d346b40c-6eb5-425b-944a-eca4d5989c5a) from ec2/ReleaseAddress - will retry after delay of 11.904s

The cluster delete succeeded, but to a user this may seem odd.

The interesting thing is that I am getting the errors only on the first delete. If I delete another cluster right after, I do not get the errors. Maybe an oddity in the API.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Was this issue fixed?

@amadev I'm running v1.17.0 and I still see a lot of "Got RequestLimitExceeded error on AWS request" messages during my updates, for 3 minutes or more. For me it isn't fixed.
