Stackexchange.redis: RedisTimeoutException exception message question

Created on 12 Oct 2018 · 41Comments · Source: StackExchange/StackExchange.Redis

Hi. Using version 2.0. 495. I've seen this page and have found some great information. I have a question about some of the information in the timeout exception message. As an example:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536, serverEndpoint: *, mgr: 10 of 10 available, clientName: *, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=68,Free=32699,Min=2,Max=32767)

I understand that Busy > Min = waiting 500ms for a new thread to get added to the pool. What I am wondering is why mgr: 10 of 10 available - shouldn't the mgr count read 0 of 10 available since we are in the global thread pool? In pretty much all of our timeout errors there are at least 9 of 10 available (and we have A LOT of timeouts). I've seen this with both worker and IOCP threads being > than the min.

Source

CDargis

👍7

Most helpful comment

We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most).

I've noticed the errors come in three flavors. Here are some representative samples:

RedisTimeoutException:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (5484ms elapsed, 
timeout is 5000ms), inst: 0, qs: 138, in: 0, serverEndpoint: 10.72.144.135:6379, mgr: 10 of 10
available, clientName: B4D107E329BC, PerfCounterHelperkeyHashSlot: 3602, IOCP:
(Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=13,Free=32754,Min=150,Max=32767),
v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can
cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

RedisConnectionException: No connection is available to service this operation:

StackExchange.Redis.RedisConnectionException: No connection is available to service this
operation: HGETALL SR:reporter:500:d747d528-2b6b-465f-aa94-b9af4c1b705e; IOCP: 
(Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=11,Free=32756,Min=200,
Max=32767), Local-CPU: n/a

RedisConnectionException: SocketFailure:

StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/TimedOut, 
last-recv: 5) on 10.72.144.135:6379/Interactive, Idle, last: INCR, origin: ReadFromPipe, 
outstanding: 323, last-read: 134942s ago, last-write: 0s ago, unanswered-write: 134942s 
ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 
0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.513.63329 --->
System.Net.Sockets.SocketException: A connection attempt failed because the connected 
party did not properly respond after a period of time, or established connection failed 
because connected host has failed to respond

We have hundreds of containers in production all talking to the same ElastiCache cluster, but these connection errors only occur on one host at a time. I.e., it's not a general Redis outage, it's something localized to a specific container.

hpk on 1 Nov 2018

👍4

All 41 comments

Without more context it is hard to know what is happening. On the surface,
the custom pool seems to think it is idle. Whether that is right or not is
complex - I should ensure that is visible in the log. The high "in" is
concerning, though. I hope we haven't found a new way of stealing reader
threads!

On Fri, 12 Oct 2018, 22:53 Chris Dargis, notifications@github.com wrote:

Hi. Using running version 2.0. 495. I've seen this
https://github.com/StackExchange/StackExchange.Redis/blob/master/docs/Timeouts.md
page and have found some great information. I have a question about some of
the information in the timeout exception message. As an example:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response
(15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536,
serverEndpoint: *, mgr: 10 of 10 available, clientName: *, IOCP:
(Busy=1,Free=999,Min=2,Max=1000), WORKER:
(Busy=68,Free=32699,Min=2,Max=32767)

I understand that Busy > Min = waiting 500ms for a new thread to get added
to the pool. What I am wondering is why mgr: 10 of 10 available -
shouldn't the mgr count read 0 of 10 available since we are in the global
thread pool? In pretty much all of our timeout errors there are at least 9
of 10 available. I've seen this with both worker and IOCP threads being >
than the min.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/StackExchange/StackExchange.Redis/issues/973, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AABDsMgq5ldZfvP8ntaCn5sAmtcg7g__ks5ukQ9YgaJpZM4XaPuR
.

mgravell on 13 Oct 2018

Thanks for the prompt response.

For context - we recently upgraded from 1.2 and have seen 10x more timeouts. We use redis to cache our NoSQL objects (which we can pull from Dynamo under the same load with no issues). The objects cached are around 200-300k. Some can be a little bigger. We now realize the recommendation is to have objects at 100k and we'll be making those changes. We use hashed objects for most of the objects that are larger and shoot for bucket sizes of 1000. I can say we experience these timeouts when bursts of parallel async work comes in (Task.WhenAll()). Previously we did not need to set min thread numbers but now we're setting them as high as 1000 to resolve these timeouts. With the min thread count this high we can get through the timeouts but everything is much more performant going straight to Dynamo ( we don't need to touch the min thread count for Dynamo)

EDIT: It is possible there are other contributing factors to our recent surge in timeouts such as change and size of our data.

Is there any more context I can give you?

Some follow-up questions:
1) does the above give you any more insight?
2) would you mind pointing me at the relevant code for why the custom pool would report idle?
3) the in number seems like a default buffer size which tells me 0 bytes read maybe? What are reader threads?

CDargis on 13 Oct 2018

Another error that may help provide insight here:

StackExchange.Redis.RedisConnectionException: SocketFailure on master.pp-app-cache.6koqgr.use1.cache.amazonaws.com:6379/Interactive, Idle, last: HGET, origin: ReadFromPipe, outstanding: 144, last-read: 930s ago, last-write: 29s ago, unanswered-write: 930s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 0 of 10 available, in: 0, last-heartbeat: 20s ago, last-mbeat: 4s ago, global: 0s ago, v: 2.0.513.63329 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.BeginReceive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state)
at System.Net.Sockets.NetworkStream.BeginRead(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)

This is the first I've seen the 0 of 10 available in our errors, but I think the most interesting here is:

last-read: 930s ago, last-write: 29s ago,

930 seconds is 15 minutes. No idea why the host would be shutting down the connection.

CDargis on 19 Oct 2018

I'm having similar timeout issues with the > 2.0 libraries compared to 1.2 version. The issues seems to come from what looks like higher Server load/server side retries (it starts to fall apart for me around 2k connections, and 20% Server load from an Azure Redis C6 instance). The general errors I see are similar to the below example
Timeout awaiting response (15750ms elapsed, timeout is 15000ms), inst: 0, qs: 40162, in: 0, serverEndpoint: Unspecified/...:6380, mgr: 7 of 10 available, clientName: ..., IOCP: (Busy=1,Free=999,Min=512,Max=1000), WORKER: (Busy=52,Free=32715,Min=1024,Max=32767),

Based on this: qs numbers seem really high-- my understanding based on the documentation, is that these are outstanding requests that have not been processed. I'm trying to figure out how these numbers are so high, as based on my code, I don't think it allows creating more than 5k requests. Is there a way to clear out outstanding requests, or flush a connection under timeouts? I've looked through the source code, but it's not clear to me how the library clears out requests that have been "abandoned" by the client.

@mgravell : Is there a recommended setting for high loads for the 2.0 libraries? I'm experimenting with some different settings now, although I think the general consensus I'm coming to is that the retry/settings that worked for 1.2, don't seem to behave well for us with 2.0 libraries.

danmml on 24 Oct 2018

I'm having the same error in a website only when a have peak access (more requests than my average).
Both IIS and Redis Server are large enough and the resources (network traffic, CPU and memory) are just fine.

I already increased the thread quantity in machine.config and even implemented a "connection pool" as described in https://stackexchange.github.io/StackExchange.Redis/Timeouts) but no luck with that.

Here its a sample of the errors:

Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0,    mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 65536,mgr: 0 of 10 available,  IOCP: (Busy=1,Free=2399,Min=1200,Max=2400), WORKER: (Busy=19,Free=2381,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0,    mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 8080, mgr: 9 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=21,Free=2379,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0,    mgr: 6 of 10 available,  IOCP: (Busy=4,Free=2396,Min=1200,Max=2400), WORKER: (Busy=34,Free=2366,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing HMSET (5000ms), inst: 0, qs: 1, in: 0,    mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=7,Free=2393,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0,    mgr: 9 of 10 available,  IOCP: (Busy=0,Free=3200,Min=1600,Max=3200), WORKER: (Busy=11,Free=3189,Min=1600,Max=3200), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0,    mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=11,Free=2389,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing HMSET (5000ms), inst: 0, qs: 2, in: 0,    mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=12,Free=2388,Min=1200,Max=2400), v: 2.0.513.63329

I noticed there the "in" amount in one of the log messages (in: 65536) its exactly the same as @CDargis (!?). Thats mean something? Any ideias @mgravell ?

vanderleipires on 29 Oct 2018

@vanderleipires

I've also implemented connection pooling that doesn't seem to help much. We have a typical cache-aside pattern with redis/dynamo.

What I noticed is that we experience the timeouts when large amounts of async work comes in (await Task.WhenAll()) and we were passing lots of tasks here sometimes. And these tasks spawn lots of other tasks that hit the cache and DB. Instead of passing all of the tasks at once I implemented parallel async extension to take in account the degree of parallelism. So instead of launching all of the tasks all at once with Task.WhenAll() we set the degree of parallelism to Environment.ProcessorCount. This definitely goes slower but it alleviates a lot of the timeouts. We've also stopped caching in a lot of spots. For the time being this is the solution we've gone with.

CDargis on 30 Oct 2018

👍1

We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most).

I've noticed the errors come in three flavors. Here are some representative samples:

RedisTimeoutException:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (5484ms elapsed, 
timeout is 5000ms), inst: 0, qs: 138, in: 0, serverEndpoint: 10.72.144.135:6379, mgr: 10 of 10
available, clientName: B4D107E329BC, PerfCounterHelperkeyHashSlot: 3602, IOCP:
(Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=13,Free=32754,Min=150,Max=32767),
v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can
cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

RedisConnectionException: No connection is available to service this operation:

StackExchange.Redis.RedisConnectionException: No connection is available to service this
operation: HGETALL SR:reporter:500:d747d528-2b6b-465f-aa94-b9af4c1b705e; IOCP: 
(Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=11,Free=32756,Min=200,
Max=32767), Local-CPU: n/a

RedisConnectionException: SocketFailure:

StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/TimedOut, 
last-recv: 5) on 10.72.144.135:6379/Interactive, Idle, last: INCR, origin: ReadFromPipe, 
outstanding: 323, last-read: 134942s ago, last-write: 0s ago, unanswered-write: 134942s 
ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 
0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.513.63329 --->
System.Net.Sockets.SocketException: A connection attempt failed because the connected 
party did not properly respond after a period of time, or established connection failed 
because connected host has failed to respond

hpk on 1 Nov 2018

👍4

Any update on this? There seems to be a pretty noticeable increase in connection issues reported for v2.x.

To expand on my previous comment, the primary issue isn't that we see a handful of connection errors/timeouts here and there. It's that we'll often see a single container suddenly just start spewing hundreds of these connection errors. SE.Redis fails to recover and the only resolution is to kill the container.

hpk on 28 Nov 2018

We have the same scenario as @hpk. Is there any way to fix this ?

Bogatinovski on 7 Jan 2019

got the same issue here - cache was working fine with 1.2.6, continually falling over with 2+

ispysoftware on 1 Feb 2019

Hello, same kind of issue in 2.0, but also I have this issue only since I'm using elasticache with redis 4.0.
Timeouts are happening if I delete a large number of keys, and I receive a lot a key space notifications.
But in debug on my computer I have no issue using this same database, only in prod (container in ECS).

Timeout performing SREM (30000ms), next: EXISTS cache:thing:stuff:1437bc51-401d-4f35-b0a7-107bd50fdb60:en-US, inst: 1, qu: 0, qs: 91, aw: False, rs: ReadAsync, ws: Idle, in: 848, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/**************:6379, mgr: 10 of 10 available, clientName: 3e5d13637a65, IOCP: (Busy=0,Free=1000,Min=36,Max=1000), WORKER: (Busy=90,Free=32677,Min=36,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I tried to add way more vCPU, but the issue is the same.

bastiflew on 30 Apr 2019

Hi, I'm experiencing the same issue in a .net core 2.1 project. Any idea on this issue? I'm not see a lot, average around 5 to 10 per day.

Remove cache failed Timeout performing UNLINK organisation-******** (5000ms), inst: 1, qs: 8, in: 352, serverEndpoint: Unspecified/******:6379, mgr: 10 of 10 available, clientName: 580a039d22ca, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=7,Free=32760,Min=2,Max=32767), v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) : at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor1 processor, ServerEndPoint server) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\ConnectionMultiplexer.cs:line 2197 at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor1 processor, ServerEndPoint server) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\RedisBase.cs:line 54 at StackExchange.Redis.RedisDatabase.KeyDelete(RedisKey key, CommandFlags flags) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\RedisDatabase.cs:line 590

satuday on 29 Aug 2019

We are getting
Connection id "0HLPHO068DD3J", Request id "0HLPHO068DD3J:00000001": An unhandled exception was thrown by the application.
StackExchange.Redis.RedisTimeoutException: Timeout performing EVAL, inst: 7, queue: 9, qu: 0, qs: 9, qc: 0, wr: 0, wq: 0, in: 58988, ar: 0, clientName: , serverEndpoint: Unspecified/*:6379, keyHashSlot: 14678 .

AWS Redis 5
Latest Package for StachExchange from redis.

Alik2015 on 5 Sep 2019

Hello All,
StackExchange.Redis.StrongName version : 1.2.1
Microsoft.Web.RedisSessionStateProvider version : 2.2.6.344

All of a sudden our production server (Azure) throwing exception messages:

Exception Category 1 "Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 17, qu: 0, qs: 17, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName......"

image

Exception Category 2 "No connection is available to service this operation: EVAL; A connection attempt failed because the connected party did not properly respond after a period of time..."

Is this something related to Threadpool Throttling ??

RajaInzamam on 21 Nov 2019

👍1

Same problem here:

Exception occurred: Timeout performing HGET (5000ms), next: HGET ....com, inst: 3, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/....redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD..., IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=8183,Min=1,Max=8191), v: 2.0.601.3402

bartcoppens on 20 Dec 2019

Getting random errors -
{"code":500,"message":"Timeout awaiting response (6625ms elapsed, timeout is 5000ms), inst: 0, qs: 5, in: 0, serverEndpoint: Unspecified/ .. :6380, mgr: 8 of 10 available, clientName: RD0003FF00D3EC, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=4,Free=678,Min=2,Max=682), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)","status":500}

hiral-cloud-evangelist on 28 Jan 2020

Hi all; I'm sorry for delays getting back onto here - lots of plates spinning, and I simply haven't been available. I'm ramping up some investigations into timeouts, so: if this is still impacting you, I'm genuinely here again!

Some things that I'm keenly interested in for all of these things, because they radically change the moving pieces:

what framework exactly? .NET Framework 4.7.2? .NET Core App 3.0.1? etc
is there TLS involved? (this usually means "cloud provider", but it could also mean "custom ssl tunner" or "redis 6 preview")
what is the main "rs" value in the error log?
was the server actually jammed at the time?

(For example on the "rs" one; yesterday I was talking to someone who was seeing rs:CompletePendingMessage, which sounds like there may be a TPL issue re thread-stealing; I'm investigating that possibility currently)

mgravell on 28 Jan 2020

hi @mgravell , nice to see that you're back.

Exception thrown: 'StackExchange.Redis.RedisTimeoutException' in System.Private.CoreLib.dll: 'Timeout awaiting response (5615ms elapsed, timeout is 5000ms), inst: 0, qs: 90901, in: 0, serverEndpoint: Unspecified/localhost:6379, mgr: 9 of 10 available, clientName: Georgis-MBP, IOCP: (Busy=0,Free=1000,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)'

I'm on MacOS, .Net Core 3.1.101
This error happens on my local machine.

blinker001 on 28 Jan 2020

Timeout performing SET (3000ms), next: PTTL+GET us:staging:campaignInfoCache, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 8250, serverEndpoint: Unspecified/ihubapp.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: us-ihubapp-codeblack-west, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=2,Free=32765,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

For me the rs value seems to always be ReadAsync.
On windows server, .net 4.7.2 on Azure Redis, using TLS
Redis load is very low

dplarina on 29 Jan 2020

OS: Windows 2012 R2
Framework: .NET 4.7.2
Endpoint: AWS ElastiCache Redis without TLS

rs seems to be ReadAsync most of the time
Several servers are hanging during this period, and then they all free up.

Here are a few of the different logged errors:

RedisTimeoutException: Timeout performing SETEX (1000ms), inst: 7, qu: 0, qs: 78, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 299, IOCP: (Busy=1,Free=999,Min=16,Max=1000), WORKER: (Busy=100,Free=32667,Min=100,Max=32767), v: 2.0.600.65315

RedisTimeoutException: Timeout performing GET (1000ms), inst: 2, qu: 0, qs: 32, aw: False, rs: ReadAsync, ws: Idle, in: 2328, in-pipe: 100, out-pipe: 10781, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16330, IOCP: (Busy=18,Free=982,Min=16,Max=1000), WORKER: (Busy=183,Free=32584,Min=100,Max=32767), v: 2.0.600.65315

RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1031ms elapsed, timeout is 1000ms), command=SET, inst: 0, qu: 0, qs: 44, aw: True, rs: ReadAsync, ws: Writing, in: 0, in-pipe: 0, out-pipe: 31046, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16138, IOCP: (Busy=20,Free=980,Min=16,Max=1000), WORKER: (Busy=209,Free=32558,Min=100,Max=32767), v: 2.0.600.65315

jasoncono on 29 Jan 2020

@jasoncono the first one looks like a genuine server (or network) stall (nothing inbound or outbound); the other two... could be just about anything, annoyingly; yet more data points, at least...

mgravell on 29 Jan 2020

👍1

To follow up on my earlier comment, below are some more error examples. We're connecting to a Redis cluster that never sees higher than 15% CPU usage. No TLS is involved, and there's nothing in the Redis slowlog to indicate that the server is getting overwhelmed.

We have a couple environments, and see these errors in both:

OS: Windows Server 2019 in a Docker image on AWS ECS
Framework: .NET Framework 4.7.2

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (765ms elapsed, timeout is 500ms), inst: 0, qs: 64, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 6 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 10610, IOCP: (Busy=1,Free=999,Min=150,Max=1000), WORKER: (Busy=5,Free=32762,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (969ms elapsed, timeout is 500ms), inst: 0, qs: 66, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 7195, IOCP: (Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=9,Free=32758,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

OS: Windows & Linux, running in Docker images on AWS EKS
Framework: .NET Core 2.2

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (952ms elapsed, timeout is 50ms), inst: 0, qs: 1, in: 6, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: <--redacted-->, PerfCounterHelperkeyHashSlot: 943, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=26,Free=32741,Min=2,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

The nodes indicated by serverEndpoint vary, so it's not just a single cluster node.

Happy to provide any more information you need; just let me know!

hpk on 29 Jan 2020

Endpoint: AWS ElastiCache Redis

what framework _exactly_? .NET Framework 4.7.2? .NET Core App 3.0.1? etc

.NET framework 4.7.2

is there TLS involved? (this usually means "cloud provider", but it could also mean "custom ssl tunner" or "redis 6 preview")

Yes. AWS configuration of Redis says "Encryption in-transit: Yes"

what is the main "rs" value in the error log?

I do not see this in the logs

was the server actually jammed at the time?

The metrics on the server seemed normal.

CDargis on 31 Jan 2020

We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?

.NET Framework Web Projects

In web.config, either explicitly setting

<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />

or, something less than 4.5 in <httpRuntime> (also web.config):

<httpRuntime targetFramework="4.x" />

.NET Framework Non-web Projects

In app.config, something less than 4.5 in <supportedRuntime>:

<configuration>
    <startup> 
        <supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>
    </startup>
</configuration>

To clarify: it doesn't matter what your .csproj (or other) target framework is for this case. These .config flags define runtime quirk behavior and alter how threads are handled - resulting in some of our should-be-dedicated bits being stolen and our queue indefinitely hung in a bad state.

If this matches anyone here, it'd be _hugely_ helpful to know. @mgravell is working on a workaround for this scenario.

NickCraver on 1 Feb 2020

Hey @NickCraver, @mgravell

We have

aspnet:UseTaskFriendlySynchronizationContext set to true

I can't see that we have httpruntime set.

We were having issues with version 1.2.6 on production. Occasionally the queue on one of our servers would stop processing and the queue number kept incremeanting. We put a fudge in, so that if we get x amount of these consistently we recreate the multiplexer. Looking around your project we saw you had rewritten this, so did an upgrade to the latest version. That upgrade is currently failing in QA as we get similar issues under load.

"StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), inst: 0, qu: 1187, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/removed, mgr: 10 of 10 available, clientName: Removed IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=5,Free=32762,Min=600,Max=32767), v: 2.0.601.3402"

We too are using AWS Elasticache. I am not sure how much time you have dedicated to this project, but I could probably recreate this with you in our staging environment, if that would help.

afinzel on 4 Mar 2020

I have similar issue it work fine for 4 months and suddenly i get this error

Timeout performing EVAL (5000ms), next: EVAL, inst: 129, qu: 0, qs: 125, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/----.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0003FF551B49, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=210,Free=32557,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass7_0.b__0() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68 at Microsoft.Web.Redis.StackExchangeClientConnection.OperationExecutor(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 95 at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 122 at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68 at Microsoft.Web.Redis.RedisConnectionWrapper.TryUpdateAndReleaseLock(Object lockId, ISessionStateItemCollection data, Int32 sessionTimeout) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisConnectionWrapper.cs:line 363 at Microsoft.Web.Redis.RedisSessionStateProvider.d__25.MoveNext() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 433 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNet.SessionState.SessionStateModuleAsync.d__80.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNet.SessionState.TaskAsyncHelper.EndTask(IAsyncResult ar) at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() at System.Web.HttpApplication.<>c__DisplayClass285_0.b__0() at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step) at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously

hshazli on 12 Mar 2020

@hshazli your workers just hit 210 and your min was 200. Do you think increasing your min workers Would help? I’m not an expert in this but you could try it.

afinzel on 12 Mar 2020

@afinzel but this one error other message in other error message i get the worker hit 300 or above what is the min value i should use and how to select the proper one,
also what about this line serverEndpoint: Unspecified/----.redis.cache.windows.net:6380 is that normal?

hshazli on 12 Mar 2020

@hshazli Yep, that's normal. In your case, it seems to be a _very_ overloaded thread pool and it wasn't able to pull bytes off the queue fast enough during a spike. It's possible network blips resulting in spikes of incoming traffic contributed to that.

@afinzel I believe your issue is different (#1120), but there's a fix in queue in #1374 there. Does yours happen after a network blip of some sort (even AWS failing over)? That's the situation there, the connection's hung in a bad state afterwards.

NickCraver on 16 Mar 2020

@NickCraver, I didn't think so but I see that changed is merged, so I will get the latest version and retest. Thanks.

edit: It looks like it isn't on Nuget yet.

afinzel on 25 Mar 2020

hi, i using redis over a month efficiently but i just got this error yesterday.
there is not seemable crash or failover, but its dark to getting meaningful.

logged:

Timeout performing ZSCAN (5000ms),
next: ZSCAN xxx, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 18, in-pipe: 0, out-pipe: 0,
serverEndpoint: xx.euc1.cache.amazonaws.com:6379,
mgr: 10 of 10 available,
IOCP: (Busy=0,Free=1000,Min=2,Max=1000),
WORKER: (Busy=37,Free=32730,Min=2,Max=32767), v: 2.1.0.1

infos:

Endpoint: AWS ElastiCache Redis

.NET Core App 2.2.107

AWS configuration of Redis says "Encryption in-transit: No"

The metrics on the server seemed normal.

i hope that would helpful, thanks

anilguzel on 27 Apr 2020

We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?

.NET Framework Web Projects

In web.config, either explicitly setting
<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />
or, something less than 4.5 in <httpRuntime> (also web.config):
<httpRuntime targetFramework="4.x" />
.NET Framework Non-web Projects

In app.config, something less than 4.5 in <supportedRuntime>:
<configuration>
    <startup> 
        <supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>
    </startup>
</configuration>
To clarify: it doesn't matter what your .csproj (or other) target framework is for this case. These .config flags define runtime quirk behavior and alter how threads are handled - resulting in some of our should-be-dedicated bits being stolen and our queue indefinitely hung in a bad state.

If this matches anyone here, it'd be _hugely_ helpful to know. @mgravell is working on a workaround for this scenario.

@NickCraver, Just wanted to clarify here:

Is the following fine in my App.config?

<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2" />

According to this Microsoft documentation, any .Net Framework version from 4.0-4.8 should use "v4.0"

https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/startup/supportedruntime-element

Your comment, which had
<supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>

seemed to imply that we could have a value greater than v4.0 for "version"

michaelrod64 on 29 Apr 2020

Hi there,

Also getting similar error. This is running on a azure function. Built using .net core 2.2 using version 2.1.39

System.Exception: Error fetching customer tags for devices 'xxxx'. ---> StackExchange.Redis.RedisTimeoutException: Timeout performing EXISTS (5000ms), next: EXISTS xxxxxxx, inst: 1, qu: 0, qs: 16, aw: False, rs: ReadAsync, ws: Idle, in: 850, serverEndpoint: tttttt:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxx, IOCP: (Busy=16,Free=984,Min=6,Max=1000), WORKER: (Busy=0,Free=32767,Min=6,Max=32767), v: 2.1.39.39788 (Please take a look at this article for some common client-side issues that can cause timeouts:

https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImplT in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.KeyExists(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 668

Any help appreciated.

Cheers,
Vini

vhmoura on 1 Jul 2020

I get below errors where try to to get or set a big data (about 70 MegaByte). any idea? can i increase redis timeout?

I use redis on windows.
StackExchange.Redis version: 2.1.58

what framework exactly? .NETFramework 4.6.1
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.1" />

is there TLS involved? no
what is the main "rs" value in the error log? TryParseResult
was the server actually jammed at the time? no

Timeout performing GET (5000ms), next: GET 8a5d01ea-6017-4110-a1f7-42f63e309ad3, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 81920, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 0 of10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Timeout performing GET (5000ms), next: GET 9b181d9d-98dc-488a-998c-aded191c2e5d, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 61027, in-pipe: 83549, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 3 of10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Timeout performing GET (5000ms), next: GET 2db678e5-4a12-44be-98fd-173cd64fa0cd, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 147456, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 2 of 10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=2,Free=998,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

SKamalou on 11 Jul 2020

This issue lives on - 2 years later. @CDargis was definitely onto something with: "We only experience this when running many task / with (Task.WhenAll())" . In my case we see this when doing 1,000s of GETs while using Parallel.ForEach. Setting MaxDegreeOfParallelism = Environment.ProcessorCount was a big help - but does not fix everything. Has anyone else found a better solution?

Active issue for 2 years now.

mgot90 on 14 Oct 2020

Found a fix here: Change GetDatabase to return a IDatabaseAsync instead of a IDatabase and problem disappears.

mgot90 on 14 Oct 2020

Found a fix here: Change GetDatabase to return a IDatabaseAsync instead of a IDatabase and problem disappears.

@mgot90
We are currently struggling with this. We have a process using the cache and redis transactions. When scaling the process to handle many concurrent requests we hit the timeouts almost immediately. Can you provide more info on the fix that you discovered? The only fix that we have found is to limit concurrent requests to redis, this defeats the purpose of the multiplexing and we take a huge hit with performance.

jeffras on 3 Nov 2020

Having the same exception without any obvious reason.

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5641ms elapsed, timeout is 5000ms), command=GET, next: GET <MyKey>, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/<myredis>.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0009DFC8C24D, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=3,Free=8188,Min=50,Max=8191), Local-CPU: unavailable, v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I set both IncludePerformanceCountersInExceptions and IncludeDetailInExceptions to true, but still no CPU information.

arefinsami on 16 Dec 2020

@jeffras I'm just another dev trying to fix the same problem. Here's what we did:

Applied StackExcanges best practices: https://gist.github.com/JonCole/925630df72be1351b21440625ff2671f#file-redis-bestpractices-stackexchange-redis-md
Going IDatabaseAsync seemed to allow a lot more transactions to go through the pipeline
We made our connection a singleton

In the end, we still have this happening every now and then. We're using Azure for our Redis provider. I haven't ruled out that our pricing tier is also related: https://azure.microsoft.com/en-us/pricing/details/cache/

Not sure if it's useful to you, but here is the Azure best practices as well: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices

Did you guys have luck or try anything different?

mgot90 on 16 Dec 2020

@mgot90 Thanks for the info, did you modify the stack exchange code for IDatabaseAsync? I did see it in the source but didn't want to maintain a branch to accomplish this. We still have the problem. We ended up having to wrap our code in a lock and it went away, but our throughput clearly suffers.

jeffras on 17 Dec 2020

We've been dealing with these kind of errors since months (if not even years). Three months ago we've decided to reimplement our caching layer with following design assumptions:

we do not store in redis anything that is larger than 100KB
objects larger than 100KB can either be stored in BlobStorage or logic falls back to primary source. (It is configurable and depends on usecase)

I'm aware that this approach might not be ideal for everyone and has some performance footprint (as you need to check the size) but at least may shed some more light.

Our setup:

Azure Cache for Redis (we've tried multiple pricing tiers but without significant effect)
Azure App Services (net46, netcore11->netcore31) migrated later to AKS (netcore31)
But neither combination worked until we've introduced the changes mentioned at the beginning which I think is kind of recommendation you can infer from docs and also a suggestion we've got from Azure Support at the end of this long journey.

PrzemekGawron on 17 Dec 2020

👀2

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Question: Is it recommended to use the Async methods in the StackExchange.Redis API

salzersoftware · 3Comments

How to retrieve multiple values in list if we pass multiple Keys in a list ?

Vijay27anand · 3Comments

Exception of type 'System.OutOfMemoryException' was thrown. at StackExchange.Redis.ConnectionMultiplexer.add_ConnectionRestored

hgmauri · 3Comments

Dependancy conflicts in Microsoft packages between redis and redis.strongname

Herostwist · 4Comments

Server.Keys() call hangs

deepaknc · 5Comments