Aws-sdk-java: Recover from "IllegalStateException: Connection pool shut down"

Created on 25 May 2020 · 4 Comments · Source: aws/aws-sdk-java

I think I'm currently running into the same issue as #1282 (or at least a related one).

We use the same AmazonS3 instance for a long time (essentially the full runtime of the service, which is measured in days to weeks) by providing it as a Bean in our Spring Boot application. Since the client is documented as thread safe and no documentation states otherwise, we assumed that this is safe.

The application processes images and PDFs and can sometimes run into an OOM situation. This is generally handled, and the application does recover from it. However, it seems that this triggers a non-recoverable condition in the S3 client. #1282 pointed me to apache/httpcomponents-client@ca98ad6, and I think this is the root cause: our application itself may recover, but the connection pool has been shut down. Now, I can't really argue with the behaviour of httpclient, since their reasoning for doing what they do is solid, but I still need a way out of this.

What would be the "proper" way to handle this from the standpoint of aws-sdk-java? I have some ideas but none seem really good:

1. Don't reuse the AmazonS3 instance, but re-create it for each request. I have seen code like this floating around, but I'm not sure about the cost of re-creating a client for each request. I'm not even sure this would fix it, since it looks like the underlying connection pool is still shared.
2. Call AmazonS3.shutdown when I run into the IllegalStateException. After looking at the code, this seems to propagate down to the connection pool and close it. But then what? Will a new pool be created automatically? And I would need to do this at every call site, which would make the code rather ugly.
3. Limit the lifetime of the AmazonS3 instance in my application and periodically create a new one, manually calling shutdown on the old one. But that would then also affect ongoing connections. And it wouldn't really solve the problem for the period between the pool shutting down and a new client being created, but at least the application would recover eventually.
4. Just give up: don't handle the OOM condition and let Kubernetes restart my pod.
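A minimal sketch of option 2, assuming the goal is to keep the shutdown-and-recreate logic in one place rather than at every call site. The class name `RecreatingClientHolder` is hypothetical and this is not an officially endorsed recovery path. With SDK v1, the factory could be `AmazonS3ClientBuilder::defaultClient` and the closer `AmazonS3::shutdown`. The check walks the cause chain for httpclient's "Connection pool shut down" message because it is an assumption here whether the SDK rethrows that `IllegalStateException` as-is or wraps it. The sketch retries the failed call once, which assumes the operation is safe to repeat.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one long-lived client and replaces it when a call
 * fails because the client's connection pool has been shut down.
 */
final class RecreatingClientHolder<C> {
    private final Supplier<C> factory;  // e.g. AmazonS3ClientBuilder::defaultClient (assumption)
    private final Consumer<C> closer;   // e.g. AmazonS3::shutdown (assumption)
    private C client;

    RecreatingClientHolder(Supplier<C> factory, Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
        this.client = factory.get();
    }

    /** Runs the operation; on a pool-shutdown failure, rebuilds the client and retries once. */
    <R> R call(Function<C, R> operation) {
        C current = current();
        try {
            return operation.apply(current);
        } catch (RuntimeException e) {
            if (!isPoolShutDown(e)) {
                throw e; // unrelated failures are not handled here
            }
            return operation.apply(replace(current));
        }
    }

    synchronized C current() {
        return client;
    }

    /** Swaps in a fresh client unless another thread has already replaced 'broken'. */
    private synchronized C replace(C broken) {
        if (client == broken) {
            closer.accept(broken); // release whatever the dead client still holds
            client = factory.get();
        }
        return client;
    }

    /** Looks for httpclient's "Connection pool shut down" anywhere in the cause chain. */
    static boolean isPoolShutDown(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof IllegalStateException
                    && "Connection pool shut down".equals(cur.getMessage())) {
                return true;
            }
        }
        return false;
    }
}
```

Call sites would then go through the holder, e.g. `holder.call(s3 -> s3.getObject(bucket, key))`, instead of holding the AmazonS3 reference directly.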

Labels: closed-for-staleness, guidance, response-requested

martinth

Most helpful comment

Hello, I'm having the same issue, and frankly, having to restart the pod just for this reason seems like a very bad solution. Limiting the instance lifespan and calling shutdown is another bad solution, as it will stop ongoing connections.

Is this really something that cannot be avoided? In my case I'm not seeing any OOM errors in my logs.

Can we please reopen this issue?

demetrio812 on 21 Sep 2020


All 4 comments

Hi @martinth, thank you for bringing this to our attention.

We strongly recommend against creating one client per request. The best option would be to let Kubernetes restart the whole application, as there is no really safe way to recover from the connection pool shutting down. But we agree this is not a great user experience.

Option 3 is a good one too, and you can add auto-scaling features and application-level QoS filters to avoid ever encountering OOM conditions.
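A minimal sketch of option 3, assuming a single shared client that is swapped for a new one once it exceeds a maximum age. The class name `RotatingClientHolder` and the `maxAge` policy are hypothetical, and the factory/closer would again be something like `AmazonS3ClientBuilder::defaultClient` and `AmazonS3::shutdown` (assumption). As pointed out in the thread, shutting down the old client can break requests still running on it, and a client whose pool dies stays broken until its next rotation.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: hands out a shared client but replaces it once it is older
 * than maxAge, so a client with a dead connection pool is eventually discarded.
 */
final class RotatingClientHolder<C> {
    private final Supplier<C> factory;     // e.g. AmazonS3ClientBuilder::defaultClient (assumption)
    private final Consumer<C> closer;      // e.g. AmazonS3::shutdown (assumption)
    private final long maxAgeNanos;
    private final LongSupplier nanoClock;  // System::nanoTime in production; injectable for tests
    private C client;
    private long createdAtNanos;

    RotatingClientHolder(Supplier<C> factory, Consumer<C> closer,
                         Duration maxAge, LongSupplier nanoClock) {
        this.factory = factory;
        this.closer = closer;
        this.maxAgeNanos = maxAge.toNanos();
        this.nanoClock = nanoClock;
        this.client = factory.get();
        this.createdAtNanos = nanoClock.getAsLong();
    }

    /** Returns the current client, first rotating it if it has reached maxAge. */
    synchronized C get() {
        long now = nanoClock.getAsLong();
        if (now - createdAtNanos >= maxAgeNanos) {
            C old = client;
            client = factory.get();
            createdAtNanos = now;
            // Requests still in flight on 'old' may fail here: the trade-off raised in the issue.
            closer.accept(old);
        }
        return client;
    }
}
```

Production code would construct it with `System::nanoTime`; the clock is a parameter only so the rotation can be exercised without waiting.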

Let us know if this helps.

debora-ito on 30 May 2020

@martinth Suppressing any Java Error is not a recommended practice, as it makes your application nondeterministic. All errors are non-recoverable problems.
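A minimal sketch of the distinction drawn above, assuming a batch worker like the image/PDF processing described in the issue. The class and method names are hypothetical. Per-document failures that surface as an `Exception` are recorded and the loop moves on, while an `Error` such as `OutOfMemoryError` is deliberately not caught. It propagates out of the worker, and together with a JVM setting such as HotSpot's `-XX:+ExitOnOutOfMemoryError` (JDK 8u92 and later) the process can exit and let Kubernetes restart the pod, matching the maintainers' recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical batch worker: recoverable per-document failures are collected,
 * but Errors (e.g. OutOfMemoryError) are intentionally left to propagate.
 */
final class DocumentWorker {
    /** Processes every document and returns the ones that failed with an Exception. */
    static List<String> processAll(List<String> documents, Consumer<String> processor) {
        List<String> failed = new ArrayList<>();
        for (String doc : documents) {
            try {
                processor.accept(doc);
            } catch (Exception e) {
                // Recoverable: bad input, a failed S3 call, etc. Record it and continue.
                failed.add(doc);
            }
            // No catch (Error) or catch (Throwable): those escape and end the batch.
        }
        return failed;
    }
}
```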

meshuga on 30 May 2020

It looks like this issue hasn’t been active in longer than a week. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.