Stackexchange.redis: System.Timeout exception .netcore

Created on 18 Mar 2017  ·  15Comments  ·  Source: StackExchange/StackExchange.Redis

Hi,

I'm running .net core app 1.1 and with latest StackExchange.Redis package and when I run my regular load test on the API, I've been getting a lot of these...

fail: Microsoft.AspNetCore.Server.Kestrel[13]
      Connection id "0HL3E84SBU8HC": An unhandled exception was thrown by the application.
System.TimeoutException: Timeout performing GET AccessToken06a49c0e-ba25-4734-bdfb-a323010d895b, inst: 8, queue: 12, qu: 0, qs: 12, qc: 0, wr: 0, wq: 0, in: 1386, ar: 0, clientName: WEB01, serverEndpoint: 127.0.0.1:6379, keyHashSlot: 2769 (Please take a look at this article for some common client-side issues that can cause timeouts: https://github.com/StackExchange/StackExchange.Redis/tree/master/Docs/Timeouts.md)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags)
   at Game.Data.Repositories.CacheRepository.GetObject[T](String key) in 
  • Both API and Redis are running on the same machine.
  • This happens only when I try to simulate more then 20ish concurrent clients on the API.
  • Load on the server is basically non-existent.
  • More then 10gigs of memory is available
  • Everything is running on windows 2012 R2 x64
  • Redis is running as a windows service with vanilla config
  • At the time of the test there are probably just couple of objects in redis database.
timeout

Most helpful comment

In the test case provided (the synchronous test case), this seems to be due to starvation of synchronous Monitor.Wait invocations, eventually causing no thread pool thread to be created for completion.

In a slightly modified test case:

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

namespace ConsoleApp3
{
    public class Program
    {
        private static IDatabase _db;
        public static void Main(string[] args)
        {
            Init();
            var key = Guid.NewGuid().ToString();
            _db.StringSet(key, Guid.NewGuid().ToString());

            try
            {
                Parallel.For(0, 10000, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, i =>
                {
                    var start = DateTime.Now;
                    _db.StringGetAsync(key).Wait();

                    Console.WriteLine($"Elapsed: {(DateTime.Now - start).TotalMilliseconds}ms");
                });
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }

        private static void Init()
        {
            var redis = ConnectionMultiplexer.Connect("localhost");
            _db = redis.GetDatabase();
        }
    }
}

we however get very random slowdowns (as shown in https://github.com/StackExchange/StackExchange.Redis/issues/599 as well) - ranging from 10ms to >1000ms(!), where the desktop CLR just runs <1ms at all times.

This seems to be related to the handling of ReadAsync in PhysicalConnection.cs - changing to synchronous reads (there's only one connection in this simple test case) and changing ISocketCallback.StartReading to run the synchronous BeginReading call in a task, we get performance comparable to the desktop CLR in this test case, again pointing to thread pool starvation in handling ReadAsync completion - presumably waiting for the Task.Wait result (which, indeed, is not recommended by typical .NET conventions) in this contrived sample.

All 15 comments

Here's a very simple console app that behaves exactly the same

https://gist.github.com/talfirevic/16050d9b036e3fa072260588a3c8abba

I have the same problem.

@talfirevic I actually worked around this by setting the min threads via
set COMPlus_ThreadPool_ForceMinWorkerThreads=1000

@bcuff I see. If that actually helped you then I'd suggest doing this via runtimeconfig.json.
IIRC dotnetCore team did expose those settings on the API level.

Found it... https://github.com/dotnet/cli/blob/rel/1.0.0/Documentation/specs/runtime-configuration-file.md

But... This looks like a band-aid type of solution for this issue. Having minimum of 1000 of threads feels like only postponing the same issue to a point when minimum of xxx threads is not enough. Also, having _a lot_ of threads brings it's own penalty.

I feel like there's more elegant solution to this. Maybe... I haven't had any time to actually look into the source.

Nonetheless thanks for the suggestion. :)

yup. it's a total band-aid. I would rather the deadlock just not be possible.

We're (@citizenfx) having the same issue, and even with 2000 worker threads we just deadlock all threads after only 5 minutes of running our application... will this get fixed at all?

In the test case provided (the synchronous test case), this seems to be due to starvation of synchronous Monitor.Wait invocations, eventually causing no thread pool thread to be created for completion.

In a slightly modified test case:

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

namespace ConsoleApp3
{
    public class Program
    {
        private static IDatabase _db;
        public static void Main(string[] args)
        {
            Init();
            var key = Guid.NewGuid().ToString();
            _db.StringSet(key, Guid.NewGuid().ToString());

            try
            {
                Parallel.For(0, 10000, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, i =>
                {
                    var start = DateTime.Now;
                    _db.StringGetAsync(key).Wait();

                    Console.WriteLine($"Elapsed: {(DateTime.Now - start).TotalMilliseconds}ms");
                });
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }

        private static void Init()
        {
            var redis = ConnectionMultiplexer.Connect("localhost");
            _db = redis.GetDatabase();
        }
    }
}

we however get very random slowdowns (as shown in https://github.com/StackExchange/StackExchange.Redis/issues/599 as well) - ranging from 10ms to >1000ms(!), where the desktop CLR just runs <1ms at all times.

This seems to be related to the handling of ReadAsync in PhysicalConnection.cs - changing to synchronous reads (there's only one connection in this simple test case) and changing ISocketCallback.StartReading to run the synchronous BeginReading call in a task, we get performance comparable to the desktop CLR in this test case, again pointing to thread pool starvation in handling ReadAsync completion - presumably waiting for the Task.Wait result (which, indeed, is not recommended by typical .NET conventions) in this contrived sample.

Okay, we can confirm making reads on the CoreCLR non-async improves the performance issues we were having in production.

See our patch.

EDIT: nope, this _still_ breaks as badly.

Same here,
I'm developing a high scale infrastructure for our new system. Tech stack is ASP.NET Core, I always get timeout errors when I try to send apprx ~50 concurrent requests. Then I did little research, I couldn't find any solution. So I decided to change my way, normally Redis db and connection should be singleton in a project, however in my solution, I made the connection scoped and I created every new redis db instances for every new request, then setting timeout duration in redis-cli solved my problem.
But again, IT IS Bad Practice and costly.
I'm looking for a new solution.

@NickCraver have you seen any of these timeouts / issues while running on asp.net core? All of these issues I'm coming across seems like the same issue. We have two different asp.net core apps using redis for caching / message bus and both are crapping out under load

@niemyjski yeah I can confirm the error is the same as I initially reported. But that is to be expected as Microsoft.Extensions.Caching.Redis.Core is basically a wrapper around this library, and stack trace points to the same place, as reported in this issue. Could be worth taking a decent look as I've seen a lot of these timeout related issues, which look the same.

I wouldn't even say it's crapping "under load", something with concurrency is creating issues here as simple parallel.for will make it explode (code sample is in the gist I made in earlier comments)

Folks, we are using Microsoft.Extensions.Caching.Redis and getting the time outs too frequently. Do we have a workaround or a fix that we can apply to get around this issue?

1.2.7 alpha still has this problem, anyone on this?

Some info that will help:
What timeout are people using? And which version of .NET Core are you on?

Closing; please see #871

Was this page helpful?
0 / 5 - 0 ratings