We are working on building out a second data center for our applications, and although there seem to be plenty of options we aren't quite sure how we want to handle Redis and Sidekiq. Your opinions and experience would be appreciated.
Thanks!
Sidekiq's HA story is Redis's HA story, for better or worse. I don't have any suggestions beyond what the Redis docs will cover.
Redis itself was reliable enough that we were able to run it in a simple primary/replica setup with 100% uptime for the 2.5 years I managed it.
I agree that both Sentinel and Clustering are overly complex. With a plain Redis primary/replica setup, do you depend on Sidekiq to know which Redis is the primary and which is the replica? Or did you manually fail over your application with a configuration change and a redeploy? If I'm misinterpreting your message above, please let me know.
I also concur that redis rarely fails, so we find that all this research is for the smallest of cases.
The plan was to use DNS failover. To promote a replica, we'd take down the primary, issue slaveof no one to the replica, and update the primary hostname to point to the replica's IP. Once the 30-second DNS cache expired, every client would be pointing to the new primary. We had a script that automated all the steps; the only manual step was deciding when to run it. As I said, we tested the script but never actually had to use it.
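For anyone who wants to picture that sequence, here is a rough sketch of the steps as an ordered plan. It is not the script we ran: the function name, the hostnames, and the DNS step are placeholders I made up for illustration, and the DNS update is left as a description because it depends entirely on your provider. It only builds the list of steps; it does not touch any server.

```python
def failover_plan(primary_host, replica_host, replica_ip, dns_name, ttl_seconds=30):
    """Return the DNS failover steps, in order, as human-readable strings.

    All arguments are hypothetical examples; nothing here is executed.
    """
    return [
        # 1. Take the old primary out of service so it stops accepting writes.
        f"redis-cli -h {primary_host} shutdown",
        # 2. Promote the replica to a standalone primary.
        f"redis-cli -h {replica_host} slaveof no one",
        # 3. Repoint the primary hostname (provider-specific, so only described).
        f"update DNS record {dns_name} -> {replica_ip}",
        # 4. Clients pick up the new address once their cached entry expires.
        f"wait {ttl_seconds}s for the DNS cache to expire",
    ]


if __name__ == "__main__":
    steps = failover_plan(
        "redis-a.internal", "redis-b.internal", "10.0.0.12", "redis-primary.internal"
    )
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {step}")
```

The order matters: stopping the old primary before promoting the replica avoids having two writable instances while DNS is still pointing at the old one.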
Honestly this is why we use Redis Labs. They make all of this 'just work' with their special sauce version of redis HA. They have a locally installable version.
+++ for Redis Labs. It's very cheap and super performant, and we absolutely crush it with Sidekiq threads (routinely over 5000 concurrent) without a single second of downtime in over 3 years. We've processed 15 billion+ Sidekiq jobs with no issues and their HA is awesome!
Thanks for the responses, everyone. We're currently testing an active/passive setup across data centers, where the passive data center is made up of read-only replicas with a slave-priority of zero to prevent unintended failovers. This would require a manual failover (similar to changing a DNS pointer, but more complicated). At this point many of our applications depend heavily on Sentinel, so using DNS as a pointer to our master Redis instance would take some major reconfiguration.
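As a rough illustration of what the passive replicas' settings might look like, here is a small sketch that renders the relevant redis.conf lines. The helper name and the hostname are invented for the example, and this is not our actual configuration. Newer Redis versions spell these directives replicaof / replica-read-only / replica-priority, while older ones use the slave-* names, so the sketch can emit either.

```python
def passive_replica_conf(primary_host, primary_port=6379, legacy_names=False):
    """Render redis.conf lines for a read-only replica that Sentinel will not promote.

    primary_host is a hypothetical address; legacy_names selects the older
    slave-* directive spellings instead of the replica-* ones.
    """
    follow = "slaveof" if legacy_names else "replicaof"
    prefix = "slave" if legacy_names else "replica"
    return "\n".join([
        f"{follow} {primary_host} {primary_port}",  # replicate from the active site
        f"{prefix}-read-only yes",                  # reject writes on the replica
        f"{prefix}-priority 0",                     # priority 0: never promoted by Sentinel
    ])


if __name__ == "__main__":
    print(passive_replica_conf("redis-primary.dc1.internal", legacy_names=True))
```

With a priority of zero, Sentinel skips these instances when it picks a promotion candidate, which is why bringing the passive site live stays a deliberate manual step.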
In talking with @mikegee, he noted that sidekiq may have problems with timeouts while writing/reading across network partitions, so that will be something we test heavily.
I also appreciate the great reviews of the Redis Labs product. We'll have to look into that.