StackExchange.Redis: ServerSelectionStrategy set to standalone for a clustered cache

Created on 16 Jun 2020 · 15 Comments · Source: StackExchange/StackExchange.Redis

Hi @mgravell,
I think I might have found an issue where, for a clustered cache, the mux can get into a bad state in which requests keep failing until the mux is torn down or a reconnect occurs.
While debugging, I observed that the likely cause is the mux-level ServerSelectionStrategy being standalone while the server endpoints have been detected as cluster. Server selection looks for AnyConnected, which fails to return any endpoint because the endpoint's ServerType (cluster) doesn't match ServerSelectionStrategy.ServerType (standalone).
During my test, a task faulted due to a connection drop while the ping was being sent. The connection was restored and reconfigure was called again, but it didn't proceed because of the re-entrancy condition in ReconfigureAsync:
(Reconfiguration was already in progress due to: connection failed, attempted to run for: connection failed)
I see that in ReconfigureAsync, if the task is faulted, clusterCount ends up as 0, which causes ServerSelectionStrategy to be set to standalone:
https://github.com/StackExchange/StackExchange.Redis/blob/master/src/StackExchange.Redis/ConnectionMultiplexer.cs#L1762
https://github.com/StackExchange/StackExchange.Redis/blob/master/src/StackExchange.Redis/ConnectionMultiplexer.cs#L1852 (clustercount is 0)
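For readers who want to follow the reasoning, here is a rough, language-neutral model of the theory above, written in Python. It is not the library's C# code: the names `Endpoint`, `any_connected` and `strategy_after_reconfigure` are hypothetical stand-ins, and the only behavior modeled is what this post describes (a faulted detection task contributes nothing to the cluster count, and selection requires the endpoint type to match the strategy type).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ServerType(Enum):
    STANDALONE = "standalone"
    CLUSTER = "cluster"


@dataclass
class Endpoint:
    # Hypothetical stand-in for a server endpoint tracked by the mux.
    name: str
    server_type: ServerType
    connected: bool = True


def any_connected(endpoints: List[Endpoint], wanted: ServerType) -> Optional[Endpoint]:
    # Return the first connected endpoint whose type matches the strategy's type.
    for endpoint in endpoints:
        if endpoint.connected and endpoint.server_type is wanted:
            return endpoint
    return None


def strategy_after_reconfigure(detected: List[Optional[ServerType]]) -> ServerType:
    # None models a faulted detection task: it is counted neither as cluster
    # nor as standalone, so the cluster count can end up at 0.
    cluster_count = sum(1 for t in detected if t is ServerType.CLUSTER)
    return ServerType.CLUSTER if cluster_count > 0 else ServerType.STANDALONE


# Both nodes were previously detected as cluster and are connected again.
endpoints = [
    Endpoint("node-a", ServerType.CLUSTER),
    Endpoint("node-b", ServerType.CLUSTER),
]

# The reconfigure pass ran while the connection was down, so every task faulted.
strategy = strategy_after_reconfigure([None, None])
selected = any_connected(endpoints, strategy)
```

In this model `strategy` ends up as standalone and `selected` is `None` even though both endpoints are connected, which matches the "no connection is available" symptom.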
Before proposing a fix, I wanted to check your opinion on this theory.

Thanks.

All 15 comments

Sounds interesting; it sounds like you were going to try a PR? If so, fine; otherwise I can add this to my backlog to look at (second in the queue, just under the latest .NET Core failing).

I reported the same problem a few months back, and here is the PR I submitted to fix it. Because of the recent COVID-19 situation I was not able to give it time, and it was marked as closed. We might want to reconsider it.

https://github.com/StackExchange/StackExchange.Redis/pull/1444

@dceravigupta @mgravell
I am proposing a fix that skips setting the strategy when the server type is undetermined. The default will still be standalone, but during a reconnect, if the task that determines the server type is faulted, the strategy won't be reset to standalone.
https://github.com/StackExchange/StackExchange.Redis/pull/1532
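To make the idea concrete, here is a minimal sketch of the proposed rule in Python. This is an illustrative model only and not the code in the PR: `next_strategy` is a hypothetical name, and `None` is used to represent a detection task that faulted.

```python
from enum import Enum
from typing import Iterable, Optional


class ServerType(Enum):
    STANDALONE = "standalone"
    CLUSTER = "cluster"


def next_strategy(
    current: ServerType, detected: Iterable[Optional[ServerType]]
) -> ServerType:
    # Drop results from faulted detection tasks (modeled as None).
    known = [t for t in detected if t is not None]
    if not known:
        # Server type undetermined: keep whatever strategy is already set.
        return current
    if any(t is ServerType.CLUSTER for t in known):
        return ServerType.CLUSTER
    return ServerType.STANDALONE


# First connect with nothing detected yet: the default stays standalone.
initial = next_strategy(ServerType.STANDALONE, [])

# Reconnect where every detection task faulted: cluster is preserved.
after_fault = next_strategy(ServerType.CLUSTER, [None, None])

# Normal reconfigure where the nodes report cluster.
after_detect = next_strategy(ServerType.STANDALONE, [ServerType.CLUSTER, None])
```

The design choice is that an undetermined result is treated as "no information" rather than as evidence of standalone, so a transient fault cannot downgrade a strategy that was already established.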

ping

Hi @mgravell, I have a pull request here, #1532. Can you review it? Thanks.

Hoping for a decision soon from @mgravell. Thank you.

Hi, @mgravell

We have been facing this problem for a couple of months. It is a really big problem when it occurs in production. All you can do is give up, log what you lost, and retry. It's annoying and frustrating.

Could you please speed up the review of the pull request and publish a new version with this fix? We would very much appreciate your help and contribution.

Did a bit more investigation. My observation is that the "StackExchange.Redis.RedisConnectionException: No connection is available to service this operation" exception did not happen with NuGet versions 1.2.3 and 1.1.603.
We performed a major upgrade to NuGet version 2.0.601 in June. That's when we started seeing this behavior in production.
I thought I would put it here in case it helps move the resolution forward.
I am sure 2.x brought in a bunch of much sought-after enhancements.
I hope this helps a bit.

Hi @mgravell, did you get a chance to review the PR, and do you think something more needs to be done here to fix this issue? Thanks.

I will try and have a look in the week, thanks

Hi @mgravell, we are another team at Microsoft also experiencing some of this and very much looking forward to the fix :)
Did you have a chance to look at the PR? 😇

@dnl-blkv ^^^

Thank you, @NickCraver, @mgravell

Hi @NickCraver @mgravell,

When can we expect a new version released on NuGet?

Hi @NickCraver @mgravell,

We really need this issue fixed in production. Is there a release schedule?
