StackExchange.Redis.RedisConnectionException: No connection is available to service this operation (v 2.1.58)

Created on 16 Jul 2020 · 27 comments · Source: StackExchange/StackExchange.Redis

Over the last few weeks we have been seeing the RedisConnectionException (complete stack trace below) on a couple of production servers. When the error happens, the only remedy that works is an IISReset; after the IISReset, things go back to normal. I wanted to check whether this issue has been fixed in the latest version, 2.1.58.
Specifically, has the reconnect issue been fixed, so that an IIS reset is no longer required?

Your insights are appreciated.

PS: Issue #1120 discusses the RedisConnectionException for v2.0.601.

StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: SETEX DSAProductDC:g4_cfodcdaissecret_5.59.448.0; SocketClosed (ReadEndOfStream, last-recv: 0) on redis-10009.rdsud-np.us.dell.com:10009/Subscription, Idle/MarkProcessed, last: PING, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 1s ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402; IOCP: (Busy=0,Free=600,Min=480,Max=600), WORKER: (Busy=21,Free=579,Min=480,Max=600), Local-CPU: n/a
 ---> StackExchange.Redis.RedisConnectionException: SocketClosed (ReadEndOfStream, last-recv: 0) on redis-10009.rdsud-np.us.dell.com:10009/Subscription, Idle/MarkProcessed, last: PING, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 1s ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImplT at

kulku on 16 Jul 2020


Most helpful comment

Greetings,
We're seeing this exact same error, including Redis timeouts of more than 100 seconds on an Azure Redis C1 instance with barely 50-60 connections on average.

The ConnectionMultiplexer seems to just go away, even with abortConnect = false and allowAdmin = true.

Memory usage also keeps climbing above 1-2 GB on an App Service that normally averages around 250-300 MB.

There's a really major issue with the connectivity part here.

It looks like it happens when there is a connection drop or blip in Azure or in the connectivity to Redis.

For now, we have tried implementing the recommended force-reconnect approach.

Our async and sync timeouts are set to 30 seconds just in case, and we're not storing any objects larger than 1 KB in Redis.

If anyone has any idea what's going on here, please share; this is causing major issues in our production environment at the moment.

Gevil on 13 Oct 2020
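The force-reconnect approach mentioned in the comment above is usually described as: when connection errors keep occurring for some period, dispose of the multiplexer and create a new one, but throttle how often that can happen so a single blip does not trigger a reconnect storm. Below is a minimal, language-neutral sketch of that idea in Python. It is not the StackExchange.Redis API and not necessarily the exact pattern the commenter used; `connect` and `close` are hypothetical stand-ins (in a .NET app they would wrap creating and disposing a `ConnectionMultiplexer`), and the 60-second and 30-second defaults are illustrative assumptions.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ForceReconnect(Generic[T]):
    """Throttled force-reconnect wrapper (hypothetical sketch).

    Recreates the connection only when errors have persisted for at least
    `error_threshold` seconds with no quiet gap longer than that, and never
    more often than once per `min_interval` seconds.
    """

    def __init__(
        self,
        connect: Callable[[], T],       # hypothetical factory for a new connection
        close: Callable[[T], None],     # hypothetical disposer for an old connection
        min_interval: float = 60.0,     # illustrative default, not a library value
        error_threshold: float = 30.0,  # illustrative default, not a library value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._close = close
        self._min_interval = min_interval
        self._error_threshold = error_threshold
        self._clock = clock
        self._lock = threading.Lock()
        self._conn = connect()
        self._last_reconnect = float("-inf")  # allow the first reconnect at any time
        self._first_error: Optional[float] = None
        self._previous_error: Optional[float] = None

    @property
    def connection(self) -> T:
        # Callers should read this fresh each time; it changes after a reconnect.
        return self._conn

    def report_connection_error(self) -> bool:
        """Call from the handler that catches a connection exception.

        Returns True if a new connection was created.
        """
        with self._lock:
            now = self._clock()
            if now - self._last_reconnect < self._min_interval:
                return False  # reconnected recently: give the new connection time
            if self._first_error is None or now - self._previous_error > self._error_threshold:
                # No error streak yet, or the last error is too old: start a new streak.
                self._first_error = now
                self._previous_error = now
                return False
            self._previous_error = now
            if now - self._first_error < self._error_threshold:
                return False  # errors have not persisted long enough yet
            old = self._conn
            self._conn = self._connect()
            self._last_reconnect = now
            self._first_error = None
            self._previous_error = None
        try:
            self._close(old)  # done outside the lock, best effort
        except Exception:
            pass  # the old connection may already be broken
        return True
```

Requiring a sustained streak of errors means an isolated failure never causes a reconnect by itself, while the minimum interval keeps a flapping network from recreating the connection over and over.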


All 27 comments

+100

Does anyone know a workaround?

hVostt on 22 Jul 2020


@kulku I am curious if you are running into this issue while accessing a clustered cache?

deepakverma on 24 Jul 2020

Quoting @deepakverma: "@kulku I am curious if you are running into this issue while accessing a clustered cache?"

Yes, that's correct, @deepakverma

kulku on 24 Jul 2020

@kulku Then you might be running into this issue: https://github.com/StackExchange/StackExchange.Redis/issues/1501

deepakverma on 24 Jul 2020

Thank you, @deepakverma. Looks like it.
Hoping for a decision from @mgravell to take this forward.

kulku on 24 Jul 2020

@mgravell, I hope you got a chance to review the suggestions on this thread. Have you made a decision yet?
We are seeing this issue in production again, this time on multiple boxes (typically, it's just a single box).

kulku on 27 Jul 2020

I did a bit more investigation. My observation is that the "StackExchange.Redis.RedisConnectionException: No connection is available to service this operation" exception did not happen with NuGet versions 1.2.3 and 1.1.603.
We performed a major upgrade to NuGet version 2.0.601 in June, and that's when we started seeing this behavior in production.
I thought I would post it here in case it helps move the resolution forward.
I am sure NuGet 2.x brought in a bunch of much sought-after enhancements.
I hope this helps a bit.

kulku on 31 Jul 2020

We are having the same issue on our production cluster running on Kubernetes and .NET Core 3.1 with Azure Redis Service. It can run fine for weeks, and then this issue suddenly happens. The only way to solve it is to kill the pod. Our quick workaround for now is to catch the exception and force the app to restart so that it "heals" itself. @mgravell, do you have any advice on how this can be fixed in a better way?

andtii on 14 Sep 2020
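The catch-and-restart workaround described above can also be delegated to Kubernetes instead of having the app exit on its own: count consecutive connection failures and have a liveness probe report the container as unhealthy once a threshold is reached, at which point the kubelet restarts it. This is a swapped-in technique, not what the commenter said they did. A minimal Python sketch follows; the class name, the default threshold of 5, and the idea of returning `liveness_status()` from a `/healthz`-style HTTP endpoint are all hypothetical.

```python
import threading


class ConsecutiveFailureHealth:
    """Hypothetical health tracker: unhealthy after N consecutive connection
    failures, so a liveness probe (or another supervisor) can restart the
    process. Any successful operation resets the count."""

    def __init__(self, max_consecutive_failures: int = 5) -> None:
        if max_consecutive_failures < 1:
            raise ValueError("max_consecutive_failures must be >= 1")
        self._max = max_consecutive_failures
        self._failures = 0
        self._lock = threading.Lock()

    def record_success(self) -> None:
        # Call after any command that completed normally.
        with self._lock:
            self._failures = 0

    def record_failure(self) -> None:
        # Call wherever the connection exception is caught.
        with self._lock:
            self._failures += 1

    def is_healthy(self) -> bool:
        with self._lock:
            return self._failures < self._max

    def liveness_status(self) -> int:
        """HTTP status a hypothetical /healthz endpoint could return."""
        return 200 if self.is_healthy() else 503
```

Requiring several consecutive failures, rather than restarting on the first exception, avoids recycling the pod over a single transient error.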


We are also having the same issue with the same setup as andtii (.NET Core 3.1 with Azure Redis Service running in Kubernetes).

ericsuhong on 21 Sep 2020

We were seeing the same issue using Azure Redis Service, but only from specific machines, and consistently so. Other machines would work correctly. We couldn't identify any differences between those machines with respect to configuration or setup.

danielbecroft on 22 Sep 2020

On Azure App Service, we sometimes hit the SNAT port limits, not with Redis but with, e.g., outgoing HTTP or SQL connections. I think (I didn't have a chance to fully investigate) that the load balancer in front of the App Service refuses communication when these SNAT limits are reached.

tweek82 on 30 Sep 2020

Quoting @andtii: "We are having the same issue on our production cluster running on kubernetes and .net core 3.1 with Azure Redis Service and it can run for weeks but suddenly this issue happens. The only way to solve it is to kill the pod. A quick workaround for now is that we catch the exception and forcing the app to restart to "heal" itself. Please @mgravell do you have some advice how this can be fixed in a better way."

Did you find a solution to this issue?
We have exactly the same problem on k8s.

mjal146 on 5 Oct 2020

@mgravell, our team has been facing this problem for over 4 months. Although a fix was provided in July, we are still waiting for the new version to be released, and for other third parties like Hangfire to update their dependency, so that Redis Cluster runs correctly in production. It's a very long wait for us while we try to survive production on-call hell...

Could you please consider publishing the new version and giving us the chance to make our own decision?

blackie1019 on 7 Oct 2020