How to reproduce:
Create launch configurations until you reach the limit for that AWS account, then try to upgrade a Kubernetes cluster with kops; it will keep spinning in this loop:
...
I1204 16:03:16.701326 45183 aws_cloud.go:570] Resolved image "ami-4bb3e05c"
I1204 16:03:16.702746 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:16.941540 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.092591 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.755397 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.863162 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.537845 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.690481 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.313920 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.432232 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 705ms
I1204 16:03:20.034683 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 604ms
I1204 16:03:20.139128 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.640814 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.972542 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.427645 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.758880 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.212730 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.480970 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 655ms
I1204 16:03:23.046056 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.139135 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.839591 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.920233 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.627519 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.695294 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.395123 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.401989 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 732ms
I1204 16:03:26.115919 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 889ms
I1204 16:03:26.138282 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:26.922574 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.010358 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.721469 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.802537 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.516712 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.612808 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.271350 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.441639 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.075490 45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.212078 45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 709ms
...
I don't think this is a retryable error, though.
Regards
I am not sure we actually have the notion of a non-recoverable error (or whether we trust AWS that, when it says something is permanent, it genuinely is). But we can have a look, or at least provide a hint when we encounter the error :-)
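For what it's worth, here is a minimal sketch (not kops's actual code) of how a non-retryable classification could look with aws-sdk-go v1's retryer interface; the `quotaAwareRetryer` wrapper is hypothetical:

```go
package retryer

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/client"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// quotaAwareRetryer wraps the SDK's default retryer and refuses to
// retry errors that indicate a hard account quota.
type quotaAwareRetryer struct {
	client.DefaultRetryer
}

func (r quotaAwareRetryer) ShouldRetry(req *request.Request) bool {
	if aerr, ok := req.Error.(awserr.Error); ok {
		// Auto Scaling reports a full launch configuration quota as
		// "LimitExceeded"; retrying cannot succeed until something is deleted.
		if aerr.Code() == autoscaling.ErrCodeLimitExceededFault {
			return false
		}
	}
	return r.DefaultRetryer.ShouldRetry(req)
}
```

A client built with `request.WithRetryer(aws.NewConfig(), quotaAwareRetryer{...})` would then fail fast on the quota error instead of looping.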
It would be very handy if we could use IAM profiling and a dry run to verify whether AWS will even allow us to create a cluster.
We could check things like resource limits, permissions, etc.
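As a rough illustration of such a pre-flight check, Auto Scaling exposes `DescribeAccountLimits`, so the limits half could look something like this sketch (aws-sdk-go v1; the `checkLaunchConfigQuota` helper is hypothetical):

```go
package preflight

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// checkLaunchConfigQuota fails early if the account has no headroom
// for another launch configuration, instead of discovering this
// mid-upgrade via retried CreateLaunchConfiguration calls.
func checkLaunchConfigQuota(sess *session.Session) error {
	svc := autoscaling.New(sess)
	out, err := svc.DescribeAccountLimits(&autoscaling.DescribeAccountLimitsInput{})
	if err != nil {
		return err
	}
	if *out.NumberOfLaunchConfigurations >= *out.MaxNumberOfLaunchConfigurations {
		return fmt.Errorf("launch configuration quota reached (%d/%d); delete unused launch configurations or request a limit increase",
			*out.NumberOfLaunchConfigurations, *out.MaxNumberOfLaunchConfigurations)
	}
	return nil
}
```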
I think the root cause here is #329.
@justinsb I would sorta agree. We are not cleaning up, so yes, that is a problem. But the call to create a new launch config, when your account cannot create another, loops forever. I have seen kops hang with quota issues. We can run into this if you have a bunch of clusters and hit your quota for launch configs.
Wondering how much of this is related to https://github.com/kubernetes/kops/issues/1051
Can we test and close if so?
Sort of. We will still loop for eternity when we hit certain limits. We are not timing out properly somewhere.
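A minimal sketch of the kind of overall deadline that would fix that, assuming a context-based retry loop (the 10-minute budget and the `op` callback are placeholders, not kops's actual retry code):

```go
package retryloop

import (
	"context"
	"fmt"
	"time"
)

// retryWithDeadline retries op with exponential backoff, but gives up
// once the overall deadline expires rather than looping forever.
func retryWithDeadline(ctx context.Context, op func() error) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()

	delay := time.Second
	for {
		err := op()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// Surface the last AWS error instead of retrying forever.
			return fmt.Errorf("giving up after deadline: %w", err)
		case <-time.After(delay):
		}
		if delay < 30*time.Second {
			delay *= 2 // simple exponential backoff, capped at 30s
		}
	}
}
```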
In kops 1.5.0 we have much clearer logging for errors during retries: #1658
I think we should consider the idea of distinguishing retryable from non-retryable errors, but it isn't always clear when an error is truly non-retryable: if another cluster is being deleted, resources may become available and a retry may succeed.
@justinsb since we put #1658 in place, I have seen that we seem to be hammering the API pretty hard on deletes. Not sure if this is expected?
For example, deleting a private-topology cluster with HA masters. I am on master.
I0216 12:00:38.022397 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 97663fe8-8cff-421f-b009-5cda8131b7e2) from ec2/DeleteSubnet - will retry after delay of 5.072s
I0216 12:00:38.258773 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 7531b2af-9fce-48a0-9fd4-6d008074029f) from ec2/DeleteSubnet - will retry after delay of 6.432s
I0216 12:00:38.509773 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 0ded66e4-96da-4e4d-bf7d-53f4b2cadb6f) from ec2/DeleteVolume - will retry after delay of 4.664s
I0216 12:00:38.582814 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 489434b5-5ae8-4ac3-8a84-c2158abb3ca3) from ec2/DeleteVolume - will retry after delay of 4.848s
I0216 12:00:38.595522 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: e6a789a9-4ee8-4d63-936c-6bb01ce7d133) from ec2/DetachInternetGateway - will retry after delay of 6.96s
I0216 12:00:38.663642 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: d98d4f82-04e6-4c74-891e-f83e7165f0de) from ec2/DeleteSubnet - will retry after delay of 6.52s
I0216 12:00:38.744053 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 3cbcb968-456c-4e3a-ba3c-c9c3d3b084ff) from ec2/DeleteVolume - will retry after delay of 7.392s
I0216 12:00:38.758525 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 2487a0a7-2e31-4841-ae70-78db8c43b21b) from ec2/DeleteSecurityGroup - will retry after delay of 6.112s
I0216 12:00:38.777200 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: ae7ebf42-fbfe-41ec-b7b5-d00e02c01b81) from ec2/DeleteSecurityGroup - will retry after delay of 4.064s
I0216 12:00:39.027556 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: c440fc41-dff3-4af4-9258-8d95e248a20a) from ec2/ReleaseAddress - will retry after delay of 7.944s
I0216 12:00:39.061817 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 04d49bb8-f691-4c00-a8c9-11464f5355b8) from ec2/ReleaseAddress - will retry after delay of 5.744s
I0216 12:00:39.183454 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 3915504c-0b89-44ef-aef8-f2dd5268dbf0) from ec2/DeleteVolume - will retry after delay of 6.216s
I0216 12:00:39.208595 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: b329f48c-2140-4d2b-bbb8-50ac5431b09d) from ec2/DeleteSecurityGroup - will retry after delay of 7.264s
I0216 12:00:39.224426 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 16324629-c166-475b-8de7-d1ca5583e24d) from ec2/DeleteVolume - will retry after delay of 5.36s
I0216 12:00:39.245337 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: eabec3cf-32bb-43af-9264-d8a25e019939) from ec2/DeleteVolume - will retry after delay of 6.904s
I0216 12:00:39.431604 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 462353db-052f-4a90-8566-c241c4d1b903) from ec2/DeleteVolume - will retry after delay of 5.368s
I0216 12:00:39.523822 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 3dc5130e-4661-4769-bda8-0e9ccfd2491e) from ec2/DeleteSubnet - will retry after delay of 4.04s
I0216 12:00:39.683042 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 1d11d8f6-3313-4740-8764-0ecc4cfe9057) from ec2/DeleteVolume - will retry after delay of 7.128s
I0216 12:00:39.823968 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: dd1196ad-f9c0-4b67-b4f5-1043591dc5ea) from ec2/DeleteSubnet - will retry after delay of 5.04s
I0216 12:00:39.836455 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 2e7e868c-f3ca-4115-b51a-33174061c85d) from ec2/ReleaseAddress - will retry after delay of 4.232s
I0216 12:00:40.073099 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 277036c4-391b-4264-95be-c9a3890f4592) from ec2/DeleteSecurityGroup - will retry after delay of 6.088s
I0216 12:00:40.386382 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 299e20d4-9b71-484c-819c-f249ea2e514f) from ec2/DeleteVolume - will retry after delay of 6.032s
I0216 12:00:40.388647 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 2e3679ce-eefa-4faa-ac79-66bb344184ae) from ec2/DeleteSecurityGroup - will retry after delay of 7.288s
I0216 12:00:40.568829 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 0e17a1ed-0e35-4a45-b517-9f4ff888b5e1) from ec2/DeleteSecurityGroup - will retry after delay of 6.4s
security-group:sg-4b611533 ok
volume:vol-051f2f335d9841db9 still has dependencies, will retry
subnet:subnet-ab1e28f3 ok
volume:vol-069114779a254da90 ok
subnet:subnet-f878b2b1 still has dependencies, will retry
I0216 12:00:44.145238 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 9c3660dd-2353-4f03-bc61-d59360d9e92f) from ec2/ReleaseAddress - will retry after delay of 14.592s
I0216 12:00:44.664229 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: f72a2627-649b-4dfb-abb7-4af98a7fb8b3) from ec2/DeleteVolume - will retry after delay of 10.336s
I0216 12:00:44.965389 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: b311dd30-54b2-437f-8b26-105a703edab8) from ec2/DeleteSubnet - will retry after delay of 12.864s
I0216 12:00:45.080082 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 14fb2023-b1a6-4732-bef5-0bfa2a702b1f) from ec2/DeleteVolume - will retry after delay of 9.808s
I0216 12:00:45.097391 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 53d241af-a4aa-42b1-840a-064beee8126d) from ec2/ReleaseAddress - will retry after delay of 14.864s
I0216 12:00:45.155219 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 7fc4d788-ccb3-488e-a86d-89b6b9a75837) from ec2/DeleteSecurityGroup - will retry after delay of 9.648s
I0216 12:00:45.256597 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 0251cff4-1367-42c7-9f3d-d59681d0f5d9) from ec2/DeleteSubnet - will retry after delay of 11.792s
I0216 12:00:45.472431 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 42eba90a-8767-4983-ad5e-8a72b9fc1244) from ec2/DeleteSubnet - will retry after delay of 15.504s
I0216 12:00:45.696224 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 1f852c3c-b632-4e40-8797-9d78e0f0f7bb) from ec2/DeleteVolume - will retry after delay of 15.888s
I0216 12:00:45.838773 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 34808969-1e52-4ea1-ad6f-b105e0896b30) from ec2/DetachInternetGateway - will retry after delay of 14.224s
I0216 12:00:46.418787 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: f4c0409c-96c5-4881-9949-01f1a4b448cc) from ec2/DeleteVolume - will retry after delay of 12.896s
I0216 12:00:46.426612 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 8f8aa57b-1a44-4459-aad8-c4650f84544d) from ec2/DeleteVolume - will retry after delay of 11.12s
I0216 12:00:46.432869 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: f4b002a2-c6fe-4fdc-b3fb-c2c663eaebed) from ec2/DeleteSecurityGroup - will retry after delay of 8.992s
I0216 12:00:46.697706 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: af0b59b6-fdf9-41d3-ac24-0219b1925a83) from ec2/DeleteVolume - will retry after delay of 11.936s
I0216 12:00:46.749192 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: c9a79407-c117-44ea-9644-ed17524fb7c0) from ec2/DeleteSecurityGroup - will retry after delay of 9.92s
I0216 12:00:47.093747 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 7dc61333-e1ef-4489-b355-2c0794372a1c) from ec2/DeleteVolume - will retry after delay of 8.608s
I0216 12:00:47.253045 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: ef7a4f80-f594-4e3b-874b-1c75826be7c3) from ec2/DeleteSecurityGroup - will retry after delay of 11.12s
I0216 12:00:47.254923 36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: d346b40c-6eb5-425b-944a-eca4d5989c5a) from ec2/ReleaseAddress - will retry after delay of 11.904s
The cluster delete succeeded, but to a user this may seem odd.
The interesting thing is that I am getting the errors only on the first delete. If I delete another cluster right after, I do not get the errors. Maybe an oddness with the API.
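If the burst of parallel delete calls is what triggers the throttling above, one mitigation sketch is a shared client-side rate limiter using golang.org/x/time/rate (the 5 requests/second budget here is an assumption, not EC2's documented limit):

```go
package deleter

import (
	"context"

	"golang.org/x/time/rate"
)

// limiter allows short bursts but keeps the steady-state request rate
// low enough to stay under the API throttle.
var limiter = rate.NewLimiter(rate.Limit(5), 10)

// throttled wraps any AWS call so concurrent deletes share one budget
// instead of all firing at once and being throttled server-side.
func throttled(ctx context.Context, call func() error) error {
	if err := limiter.Wait(ctx); err != nil {
		return err
	}
	return call()
}
```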
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with a /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/close
Was this issue fixed?
@amadev I'm running under v1.17.0 and I still see A LOT of "Got RequestLimitExceeded error on AWS request" messages during my updates. It takes 3 minutes or more; for me this isn't fixed.