Hi. Using version 2.0. 495. I've seen this page and have found some great information. I have a question about some of the information in the timeout exception message. As an example:
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536, serverEndpoint: *, mgr: 10 of 10 available, clientName: *, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=68,Free=32699,Min=2,Max=32767)
I understand that Busy > Min = waiting 500ms for a new thread to get added to the pool. What I am wondering is why mgr: 10 of 10 available - shouldn't the mgr count read 0 of 10 available since we are in the global thread pool? In pretty much all of our timeout errors there are at least 9 of 10 available (and we have A LOT of timeouts). I've seen this with both worker and IOCP threads being > than the min.
Without more context it is hard to know what is happening. On the surface,
the custom pool seems to think it is idle. Whether that is right or not is
complex - I should ensure that is visible in the log. The high "in" is
concerning, though. I hope we haven't found a new way of stealing reader
threads!
On Fri, 12 Oct 2018, 22:53 Chris Dargis, notifications@github.com wrote:
Hi. Using running version 2.0. 495. I've seen this
https://github.com/StackExchange/StackExchange.Redis/blob/master/docs/Timeouts.md
page and have found some great information. I have a question about some of
the information in the timeout exception message. As an example:StackExchange.Redis.RedisTimeoutException: Timeout awaiting response
(15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536,
serverEndpoint: *, mgr: 10 of 10 available, clientName: *, IOCP:
(Busy=1,Free=999,Min=2,Max=1000), WORKER:
(Busy=68,Free=32699,Min=2,Max=32767)I understand that Busy > Min = waiting 500ms for a new thread to get added
to the pool. What I am wondering is why mgr: 10 of 10 available -
shouldn't the mgr count read 0 of 10 available since we are in the global
thread pool? In pretty much all of our timeout errors there are at least 9
of 10 available. I've seen this with both worker and IOCP threads being >
than the min.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/StackExchange/StackExchange.Redis/issues/973, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AABDsMgq5ldZfvP8ntaCn5sAmtcg7g__ks5ukQ9YgaJpZM4XaPuR
.
Thanks for the prompt response.
For context - we recently upgraded from 1.2 and have seen 10x more timeouts. We use redis to cache our NoSQL objects (which we can pull from Dynamo under the same load with no issues). The objects cached are around 200-300k. Some can be a little bigger. We now realize the recommendation is to have objects at 100k and we'll be making those changes. We use hashed objects for most of the objects that are larger and shoot for bucket sizes of 1000. I can say we experience these timeouts when bursts of parallel async work comes in (Task.WhenAll()). Previously we did not need to set min thread numbers but now we're setting them as high as 1000 to resolve these timeouts. With the min thread count this high we can get through the timeouts but everything is much more performant going straight to Dynamo ( we don't need to touch the min thread count for Dynamo)
EDIT: It is possible there are other contributing factors to our recent surge in timeouts such as change and size of our data.
Is there any more context I can give you?
Some follow-up questions:
1) does the above give you any more insight?
2) would you mind pointing me at the relevant code for why the custom pool would report idle?
3) the in number seems like a default buffer size which tells me 0 bytes read maybe? What are reader threads?
Another error that may help provide insight here:
StackExchange.Redis.RedisConnectionException: SocketFailure on master.pp-app-cache.6koqgr.use1.cache.amazonaws.com:6379/Interactive, Idle, last: HGET, origin: ReadFromPipe, outstanding: 144, last-read: 930s ago, last-write: 29s ago, unanswered-write: 930s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 0 of 10 available, in: 0, last-heartbeat: 20s ago, last-mbeat: 4s ago, global: 0s ago, v: 2.0.513.63329 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.BeginReceive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state)
at System.Net.Sockets.NetworkStream.BeginRead(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
This is the first I've seen the 0 of 10 available in our errors, but I think the most interesting here is:
last-read: 930s ago, last-write: 29s ago,
930 seconds is 15 minutes. No idea why the host would be shutting down the connection.
I'm having similar timeout issues with the > 2.0 libraries compared to 1.2 version. The issues seems to come from what looks like higher Server load/server side retries (it starts to fall apart for me around 2k connections, and 20% Server load from an Azure Redis C6 instance). The general errors I see are similar to the below example
Timeout awaiting response (15750ms elapsed, timeout is 15000ms), inst: 0, qs: 40162, in: 0, serverEndpoint: Unspecified/...:6380, mgr: 7 of 10 available, clientName: ..., IOCP: (Busy=1,Free=999,Min=512,Max=1000), WORKER: (Busy=52,Free=32715,Min=1024,Max=32767),
Based on this: qs numbers seem really high-- my understanding based on the documentation, is that these are outstanding requests that have not been processed. I'm trying to figure out how these numbers are so high, as based on my code, I don't think it allows creating more than 5k requests. Is there a way to clear out outstanding requests, or flush a connection under timeouts? I've looked through the source code, but it's not clear to me how the library clears out requests that have been "abandoned" by the client.
@mgravell : Is there a recommended setting for high loads for the 2.0 libraries? I'm experimenting with some different settings now, although I think the general consensus I'm coming to is that the retry/settings that worked for 1.2, don't seem to behave well for us with 2.0 libraries.
I'm having the same error in a website only when a have peak access (more requests than my average).
Both IIS and Redis Server are large enough and the resources (network traffic, CPU and memory) are just fine.
I already increased the thread quantity in machine.config and even implemented a "connection pool" as described in https://stackexchange.github.io/StackExchange.Redis/Timeouts) but no luck with that.
Here its a sample of the errors:
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 0, mgr: 10 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 65536,mgr: 0 of 10 available, IOCP: (Busy=1,Free=2399,Min=1200,Max=2400), WORKER: (Busy=19,Free=2381,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 0, mgr: 10 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 8080, mgr: 9 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=21,Free=2379,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 0, mgr: 6 of 10 available, IOCP: (Busy=4,Free=2396,Min=1200,Max=2400), WORKER: (Busy=34,Free=2366,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing HMSET (5000ms), inst: 0, qs: 1, in: 0, mgr: 10 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=7,Free=2393,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 0, mgr: 9 of 10 available, IOCP: (Busy=0,Free=3200,Min=1600,Max=3200), WORKER: (Busy=11,Free=3189,Min=1600,Max=3200), v: 2.0.513.63329
Timeout performing GET (5000ms), inst: 0, qs: 1, in: 0, mgr: 10 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=11,Free=2389,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing HMSET (5000ms), inst: 0, qs: 2, in: 0, mgr: 10 of 10 available, IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=12,Free=2388,Min=1200,Max=2400), v: 2.0.513.63329
I noticed there the "in" amount in one of the log messages (in: 65536) its exactly the same as @CDargis (!?). Thats mean something? Any ideias @mgravell ?
@vanderleipires
I've also implemented connection pooling that doesn't seem to help much. We have a typical cache-aside pattern with redis/dynamo.
What I noticed is that we experience the timeouts when large amounts of async work comes in (await Task.WhenAll()) and we were passing lots of tasks here sometimes. And these tasks spawn lots of other tasks that hit the cache and DB. Instead of passing all of the tasks at once I implemented parallel async extension to take in account the degree of parallelism. So instead of launching all of the tasks all at once with Task.WhenAll() we set the degree of parallelism to Environment.ProcessorCount. This definitely goes slower but it alleviates a lot of the timeouts. We've also stopped caching in a lot of spots. For the time being this is the solution we've gone with.
We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most).
I've noticed the errors come in three flavors. Here are some representative samples:
RedisTimeoutException:StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (5484ms elapsed, timeout is 5000ms), inst: 0, qs: 138, in: 0, serverEndpoint: 10.72.144.135:6379, mgr: 10 of 10 available, clientName: B4D107E329BC, PerfCounterHelperkeyHashSlot: 3602, IOCP: (Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=13,Free=32754,Min=150,Max=32767), v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
RedisConnectionException: No connection is available to service this operation:StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: HGETALL SR:reporter:500:d747d528-2b6b-465f-aa94-b9af4c1b705e; IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=11,Free=32756,Min=200, Max=32767), Local-CPU: n/a
RedisConnectionException: SocketFailure:StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/TimedOut, last-recv: 5) on 10.72.144.135:6379/Interactive, Idle, last: INCR, origin: ReadFromPipe, outstanding: 323, last-read: 134942s ago, last-write: 0s ago, unanswered-write: 134942s ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.513.63329 ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
We have hundreds of containers in production all talking to the same ElastiCache cluster, but these connection errors only occur on one host at a time. I.e., it's not a general Redis outage, it's something localized to a specific container.
Any update on this? There seems to be a pretty noticeable increase in connection issues reported for v2.x.
To expand on my previous comment, the primary issue isn't that we see a handful of connection errors/timeouts here and there. It's that we'll often see a single container suddenly just start spewing hundreds of these connection errors. SE.Redis fails to recover and the only resolution is to kill the container.
We have the same scenario as @hpk. Is there any way to fix this ?
got the same issue here - cache was working fine with 1.2.6, continually falling over with 2+
Hello, same kind of issue in 2.0, but also I have this issue only since I'm using elasticache with redis 4.0.
Timeouts are happening if I delete a large number of keys, and I receive a lot a key space notifications.
But in debug on my computer I have no issue using this same database, only in prod (container in ECS).
Timeout performing SREM (30000ms), next: EXISTS cache:thing:stuff:1437bc51-401d-4f35-b0a7-107bd50fdb60:en-US, inst: 1, qu: 0, qs: 91, aw: False, rs: ReadAsync, ws: Idle, in: 848, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/**************:6379, mgr: 10 of 10 available, clientName: 3e5d13637a65, IOCP: (Busy=0,Free=1000,Min=36,Max=1000), WORKER: (Busy=90,Free=32677,Min=36,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
I tried to add way more vCPU, but the issue is the same.
Hi, I'm experiencing the same issue in a .net core 2.1 project. Any idea on this issue? I'm not see a lot, average around 5 to 10 per day.
Remove cache failed Timeout performing UNLINK organisation-******** (5000ms), inst: 1, qs: 8, in: 352, serverEndpoint: Unspecified/******:6379, mgr: 10 of 10 available, clientName: 580a039d22ca, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=7,Free=32760,Min=2,Max=32767), v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) : at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor1 processor, ServerEndPoint server) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\ConnectionMultiplexer.cs:line 2197
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor1 processor, ServerEndPoint server) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.KeyDelete(RedisKey key, CommandFlags flags) in C:\\projects\\stackexchange-redis\\src\\StackExchange.Redis\\RedisDatabase.cs:line 590
We are getting
Connection id "0HLPHO068DD3J", Request id "0HLPHO068DD3J:00000001": An unhandled exception was thrown by the application.
StackExchange.Redis.RedisTimeoutException: Timeout performing EVAL, inst: 7, queue: 9, qu: 0, qs: 9, qc: 0, wr: 0, wq: 0, in: 58988, ar: 0, clientName: , serverEndpoint: Unspecified/*:6379, keyHashSlot: 14678 .
AWS Redis 5
Latest Package for StachExchange from redis.
Hello All,
StackExchange.Redis.StrongName version : 1.2.1
Microsoft.Web.RedisSessionStateProvider version : 2.2.6.344
All of a sudden our production server (Azure) throwing exception messages:
Exception Category 1 "Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 17, qu: 0, qs: 17, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName......"
image

Exception Category 2 "No connection is available to service this operation: EVAL; A connection attempt failed because the connected party did not properly respond after a period of time..."

Is this something related to Threadpool Throttling ??
Same problem here:
Exception occurred: Timeout performing HGET (5000ms), next: HGET ....com, inst: 3, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/....redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD..., IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=8183,Min=1,Max=8191), v: 2.0.601.3402
Getting random errors -
{"code":500,"message":"Timeout awaiting response (6625ms elapsed, timeout is 5000ms), inst: 0, qs: 5, in: 0, serverEndpoint: Unspecified/ .. :6380, mgr: 8 of 10 available, clientName: RD0003FF00D3EC, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=4,Free=678,Min=2,Max=682), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)","status":500}
Hi all; I'm sorry for delays getting back onto here - lots of plates spinning, and I simply haven't been available. I'm ramping up some investigations into timeouts, so: if this is still impacting you, I'm genuinely here again!
Some things that I'm keenly interested in for all of these things, because they radically change the moving pieces:
(For example on the "rs" one; yesterday I was talking to someone who was seeing rs:CompletePendingMessage, which sounds like there may be a TPL issue re thread-stealing; I'm investigating that possibility currently)
hi @mgravell , nice to see that you're back.
Exception thrown: 'StackExchange.Redis.RedisTimeoutException' in System.Private.CoreLib.dll: 'Timeout awaiting response (5615ms elapsed, timeout is 5000ms), inst: 0, qs: 90901, in: 0, serverEndpoint: Unspecified/localhost:6379, mgr: 9 of 10 available, clientName: Georgis-MBP, IOCP: (Busy=0,Free=1000,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)'
I'm on MacOS, .Net Core 3.1.101
This error happens on my local machine.
Timeout performing SET (3000ms), next: PTTL+GET us:staging:campaignInfoCache, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 8250, serverEndpoint: Unspecified/ihubapp.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: us-ihubapp-codeblack-west, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=2,Free=32765,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
For me the rs value seems to always be ReadAsync.
On windows server, .net 4.7.2 on Azure Redis, using TLS
Redis load is very low
OS: Windows 2012 R2
Framework: .NET 4.7.2
Endpoint: AWS ElastiCache Redis without TLS
rs seems to be ReadAsync most of the time
Several servers are hanging during this period, and then they all free up.
Here are a few of the different logged errors:
RedisTimeoutException: Timeout performing SETEX (1000ms), inst: 7, qu: 0, qs: 78, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 299, IOCP: (Busy=1,Free=999,Min=16,Max=1000), WORKER: (Busy=100,Free=32667,Min=100,Max=32767), v: 2.0.600.65315
RedisTimeoutException: Timeout performing GET (1000ms), inst: 2, qu: 0, qs: 32, aw: False, rs: ReadAsync, ws: Idle, in: 2328, in-pipe: 100, out-pipe: 10781, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16330, IOCP: (Busy=18,Free=982,Min=16,Max=1000), WORKER: (Busy=183,Free=32584,Min=100,Max=32767), v: 2.0.600.65315
RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1031ms elapsed, timeout is 1000ms), command=SET, inst: 0, qu: 0, qs: 44, aw: True, rs: ReadAsync, ws: Writing, in: 0, in-pipe: 0, out-pipe: 31046, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16138, IOCP: (Busy=20,Free=980,Min=16,Max=1000), WORKER: (Busy=209,Free=32558,Min=100,Max=32767), v: 2.0.600.65315
@jasoncono the first one looks like a genuine server (or network) stall (nothing inbound or outbound); the other two... could be just about anything, annoyingly; yet more data points, at least...
To follow up on my earlier comment, below are some more error examples. We're connecting to a Redis cluster that never sees higher than 15% CPU usage. No TLS is involved, and there's nothing in the Redis slowlog to indicate that the server is getting overwhelmed.
We have a couple environments, and see these errors in both:
OS: Windows Server 2019 in a Docker image on AWS ECS
Framework: .NET Framework 4.7.2
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (765ms elapsed, timeout is 500ms), inst: 0, qs: 64, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 6 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 10610, IOCP: (Busy=1,Free=999,Min=150,Max=1000), WORKER: (Busy=5,Free=32762,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (969ms elapsed, timeout is 500ms), inst: 0, qs: 66, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 7195, IOCP: (Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=9,Free=32758,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
OS: Windows & Linux, running in Docker images on AWS EKS
Framework: .NET Core 2.2
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (952ms elapsed, timeout is 50ms), inst: 0, qs: 1, in: 6, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: <--redacted-->, PerfCounterHelperkeyHashSlot: 943, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=26,Free=32741,Min=2,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
The nodes indicated by serverEndpoint vary, so it's not just a single cluster node.
Happy to provide any more information you need; just let me know!
Endpoint: AWS ElastiCache Redis
- what framework _exactly_? .NET Framework 4.7.2? .NET Core App 3.0.1? etc
.NET framework 4.7.2
- is there TLS involved? (this usually means "cloud provider", but it could also mean "custom ssl tunner" or "redis 6 preview")
Yes. AWS configuration of Redis says "Encryption in-transit: Yes"
- what is the main "rs" value in the error log?
I do not see this in the logs
- was the server actually jammed at the time?
The metrics on the server seemed normal.
We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?
In web.config, either explicitly setting
<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />
or, something less than 4.5 in <httpRuntime> (also web.config):
<httpRuntime targetFramework="4.x" />
In app.config, something less than 4.5 in <supportedRuntime>:
<configuration>
<startup>
<supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>
</startup>
</configuration>
To clarify: it doesn't matter what your .csproj (or other) target framework is for this case. These .config flags define runtime quirk behavior and alter how threads are handled - resulting in some of our should-be-dedicated bits being stolen and our queue indefinitely hung in a bad state.
If this matches anyone here, it'd be _hugely_ helpful to know. @mgravell is working on a workaround for this scenario.
Hey @NickCraver, @mgravell
We have
aspnet:UseTaskFriendlySynchronizationContext set to true
I can't see that we have httpruntime set.
We were having issues with version 1.2.6 on production. Occasionally the queue on one of our servers would stop processing and the queue number kept incremeanting. We put a fudge in, so that if we get x amount of these consistently we recreate the multiplexer. Looking around your project we saw you had rewritten this, so did an upgrade to the latest version. That upgrade is currently failing in QA as we get similar issues under load.
"StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), inst: 0, qu: 1187, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/removed, mgr: 10 of 10 available, clientName: Removed IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=5,Free=32762,Min=600,Max=32767), v: 2.0.601.3402"
We too are using AWS Elasticache. I am not sure how much time you have dedicated to this project, but I could probably recreate this with you in our staging environment, if that would help.
I have similar issue it work fine for 4 months and suddenly i get this error
Timeout performing EVAL (5000ms), next: EVAL, inst: 129, qu: 0, qs: 125, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/----.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0003FF551B49, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=210,Free=32557,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass7_0.b__0() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68 at Microsoft.Web.Redis.StackExchangeClientConnection.OperationExecutor(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 95 at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 122 at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68 at Microsoft.Web.Redis.RedisConnectionWrapper.TryUpdateAndReleaseLock(Object lockId, ISessionStateItemCollection data, Int32 sessionTimeout) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisConnectionWrapper.cs:line 363 at Microsoft.Web.Redis.RedisSessionStateProvider.d__25.MoveNext() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 433 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNet.SessionState.SessionStateModuleAsync.d__80.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNet.SessionState.TaskAsyncHelper.EndTask(IAsyncResult ar) at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() at System.Web.HttpApplication.<>c__DisplayClass285_0.b__0() at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step) at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously
@hshazli your workers just hit 210 and your min was 200. Do you think increasing your min workers Would help? I’m not an expert in this but you could try it.
@afinzel but this one error other message in other error message i get the worker hit 300 or above what is the min value i should use and how to select the proper one,
also what about this line serverEndpoint: Unspecified/----.redis.cache.windows.net:6380 is that normal?
@hshazli Yep, that's normal. In your case, it seems to be a _very_ overloaded thread pool and it wasn't able to pull bytes off the queue fast enough during a spike. It's possible network blips resulting in spikes of incoming traffic contributed to that.
@afinzel I believe your issue is different (#1120), but there's a fix in queue in #1374 there. Does yours happen after a network blip of some sort (even AWS failing over)? That's the situation there, the connection's hung in a bad state afterwards.
@NickCraver, I didn't think so but I see that changed is merged, so I will get the latest version and retest. Thanks.
edit: It looks like it isn't on Nuget yet.
hi, i using redis over a month efficiently but i just got this error yesterday.
there is not seemable crash or failover, but its dark to getting meaningful.
logged:
Timeout performing ZSCAN (5000ms),
next: ZSCAN xxx, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 18, in-pipe: 0, out-pipe: 0,
serverEndpoint: xx.euc1.cache.amazonaws.com:6379,
mgr: 10 of 10 available,
IOCP: (Busy=0,Free=1000,Min=2,Max=1000),
WORKER: (Busy=37,Free=32730,Min=2,Max=32767), v: 2.1.0.1
infos:
Endpoint: AWS ElastiCache Redis
.NET Core App 2.2.107
AWS configuration of Redis says "Encryption in-transit: No"
The metrics on the server seemed normal.
i hope that would helpful, thanks
We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?
.NET Framework Web Projects
In
web.config, either explicitly setting<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />or, something less than 4.5 in
<httpRuntime>(alsoweb.config):<httpRuntime targetFramework="4.x" />.NET Framework Non-web Projects
In
app.config, something less than 4.5 in<supportedRuntime>:<configuration> <startup> <supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/> </startup> </configuration>To clarify: it doesn't matter what your
.csproj(or other) target framework is for this case. These.configflags define runtime quirk behavior and alter how threads are handled - resulting in some of our should-be-dedicated bits being stolen and our queue indefinitely hung in a bad state.If this matches anyone here, it'd be _hugely_ helpful to know. @mgravell is working on a workaround for this scenario.
@NickCraver, Just wanted to clarify here:
Is the following fine in my App.config?
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2" />
According to this Microsoft documentation, any .Net Framework version from 4.0-4.8 should use "v4.0"
Your comment, which had
<supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>
seemed to imply that we could have a value greater than v4.0 for "version"
Hi there,
Also getting similar error. This is running on a azure function. Built using .net core 2.2 using version 2.1.39
System.Exception: Error fetching customer tags for devices 'xxxx'. ---> StackExchange.Redis.RedisTimeoutException: Timeout performing EXISTS (5000ms), next: EXISTS xxxxxxx, inst: 1, qu: 0, qs: 16, aw: False, rs: ReadAsync, ws: Idle, in: 850, serverEndpoint: tttttt:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxx, IOCP: (Busy=16,Free=984,Min=6,Max=1000), WORKER: (Busy=0,Free=32767,Min=6,Max=32767), v: 2.1.39.39788 (Please take a look at this article for some common client-side issues that can cause timeouts:
https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImplT in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.KeyExists(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 668
Any help appreciated.
Cheers,
Vini
I get below errors where try to to get or set a big data (about 70 MegaByte). any idea? can i increase redis timeout?
I use redis on windows.
StackExchange.Redis version: 2.1.58
what framework exactly? .NETFramework 4.6.1
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.1" />
is there TLS involved? no
what is the main "rs" value in the error log? TryParseResult
was the server actually jammed at the time? no
Timeout performing GET (5000ms), next: GET 8a5d01ea-6017-4110-a1f7-42f63e309ad3, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 81920, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 0 of10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Timeout performing GET (5000ms), next: GET 9b181d9d-98dc-488a-998c-aded191c2e5d, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 61027, in-pipe: 83549, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 3 of10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Timeout performing GET (5000ms), next: GET 2db678e5-4a12-44be-98fd-173cd64fa0cd, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 147456, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 2 of 10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=2,Free=998,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take alook at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
This issue lives on - 2 years later. @CDargis was definitely onto something with: "We only experience this when running many task / with (Task.WhenAll())" . In my case we see this when doing 1,000s of GETs while using Parallel.ForEach. Setting MaxDegreeOfParallelism = Environment.ProcessorCount was a big help - but does not fix everything. Has anyone else found a better solution?
Active issue for 2 years now.
Found a fix here: Change GetDatabase to return a IDatabaseAsync instead of a IDatabase and problem disappears.
Found a fix here: Change GetDatabase to return a IDatabaseAsync instead of a IDatabase and problem disappears.
@mgot90
We are currently struggling with this. We have a process using the cache and redis transactions. When scaling the process to handle many concurrent requests we hit the timeouts almost immediately. Can you provide more info on the fix that you discovered? The only fix that we have found is to limit concurrent requests to redis, this defeats the purpose of the multiplexing and we take a huge hit with performance.
Having the same exception without any obvious reason.
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5641ms elapsed, timeout is 5000ms), command=GET, next: GET <MyKey>, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/<myredis>.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0009DFC8C24D, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=3,Free=8188,Min=50,Max=8191), Local-CPU: unavailable, v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
I set both IncludePerformanceCountersInExceptions and IncludeDetailInExceptions to true, but still no CPU information.
@jeffras I'm just another dev trying to fix the same problem. Here's what we did:
In the end, we still have this happening every now and then. We're using Azure for our Redis provider. I haven't ruled out that our pricing tier is also related: https://azure.microsoft.com/en-us/pricing/details/cache/
Not sure if it's useful to you, but here is the Azure best practices as well: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices
Did you guys have luck or try anything different?
@mgot90 Thanks for the info, did you modify the stack exchange code for IDatabaseAsync? I did see it in the source but didn't want to maintain a branch to accomplish this. We still have the problem. We ended up having to wrap our code in a lock and it went away, but our throughput clearly suffers.
We've been dealing with these kind of errors since months (if not even years). Three months ago we've decided to reimplement our caching layer with following design assumptions:
I'm aware that this approach might not be ideal for everyone and has some performance footprint (as you need to check the size) but at least may shed some more light.
Our setup:
Most helpful comment
We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most).
I've noticed the errors come in three flavors. Here are some representative samples:
RedisTimeoutException:RedisConnectionException: No connection is available to service this operation:RedisConnectionException: SocketFailure:We have hundreds of containers in production all talking to the same ElastiCache cluster, but these connection errors only occur on one host at a time. I.e., it's not a general Redis outage, it's something localized to a specific container.