Sidekiq: [Question] Do you have any advice for application/workers spanning multiple data centers?

Created on 31 May 2016 · 6 Comments · Source: mperham/sidekiq

We are building out a second data center for our applications, and although there seem to be plenty of options, we aren't quite sure how we want to handle Redis and Sidekiq. Your opinions and experience would be appreciated.

Thanks!

All 6 comments

Sidekiq's HA story is Redis's HA story, for better or worse. I don't have any suggestions beyond what the Redis docs will cover.

  1. Sidekiq doesn't keep a lot of data in Redis long-term; the data is "consumed" once the jobs have been processed. When I ran Sidekiq in production, our Redis was about 6MB in size with empty queues. However, depending on your job throughput, you might see many MB/sec in jobs created/destroyed.
  2. Redis does offer asynchronous replication for use between data centers.
  3. You can turn off disk persistence to minimize the risk of disk failure, instead relying on replication.
  4. I found Redis Sentinel and Cluster overly complex for our needs. In general I don't trust distributed solutions using clusters and consensus - they are extremely difficult to get correct in all error cases and often cause more trouble than they aim to solve.

Redis itself was reliable enough that we were able to run it in a simple primary/replica setup with 100% uptime for the 2.5 years I managed it.

I agree that both Sentinel and Cluster are overly complex. With a plain Redis primary/replica setup, do you depend on Sidekiq to know which Redis is the primary and which is the replica? Or did you manually fail over your application using a configuration change and a redeploy of code? If I'm misinterpreting your message above, please let me know.

I also concur that Redis rarely fails, so we find that all this research covers only the rarest of cases.

The plan was to use DNS failover. If we wanted to promote a replica, we'd take down the primary, issue SLAVEOF NO ONE to the replica, and update the primary hostname to point at the replica's IP; once the 30-second DNS cache expired, everyone should be pointing to the new primary. We had a script that automated all the steps; the only manual step was deciding when to run it. As I said, we tested the script but never actually had to use it.
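The failover steps described above can be sketched roughly as follows. This is a hypothetical outline, not the script the commenter actually ran: the `failover_plan` and `run_plan` names, the hostnames, and the IP address are all assumptions made for illustration, and nothing here talks to a real Redis server or DNS provider. It only lays out the order of operations so the plan can be printed and reviewed as a dry run.

```ruby
# Hypothetical sketch of the DNS failover steps; names and hosts are illustrative.
Step = Struct.new(:name, :description)

# Builds the ordered list of steps. Only the SLAVEOF NO ONE line is a real
# Redis command; the other descriptions stand in for whatever tooling you use.
def failover_plan(primary_host:, replica_host:, replica_ip:, dns_name:, dns_ttl: 30)
  [
    Step.new(:stop_primary,    "take Redis down on #{primary_host}"),
    Step.new(:promote_replica, "redis-cli -h #{replica_host} SLAVEOF NO ONE"),
    Step.new(:repoint_dns,     "point #{dns_name} at #{replica_ip}"),
    Step.new(:wait_for_ttl,    "wait #{dns_ttl}s for cached DNS entries to expire")
  ]
end

# Hands each step to a caller-supplied block, so the same plan can be
# printed first and only later wired up to real actions.
def run_plan(plan)
  plan.each { |step| yield step }
end

plan = failover_plan(
  primary_host: "redis-a.example.internal",
  replica_host: "redis-b.example.internal",
  replica_ip:   "10.0.0.2",
  dns_name:     "redis-primary.example.internal"
)
run_plan(plan) { |step| puts "#{step.name}: #{step.description}" }
```

Keeping the plan as data separate from its execution matches the workflow in the comment: the steps are automated, and a human decides when to run them.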

Honestly this is why we use Redis Labs. They make all of this 'just work' with their special sauce version of redis HA. They have a locally installable version.

+++ for Redis Labs. It's very cheap and super performant, and we absolutely crush it with Sidekiq threads (over 5000 concurrent routinely) without a single second of downtime in 3+ years. We've processed 15 billion+ Sidekiq jobs with no issues, and their HA is awesome!

Thanks for the responses, everyone. We're currently testing an active/passive setup across data centers where the passive data center consists of read-only replicas with slave-priority set to zero to prevent unintended failovers. This would require some manual failover (similar to changing a DNS pointer, but more complicated). At this point, many of our applications depend heavily on Sentinel, so using DNS as a pointer to our master Redis instance would take some major reconfiguration.
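For readers who also rely on Sentinel, the Sidekiq side of such a setup might look roughly like the sketch below. It only builds an options hash: the `mymaster` group name, the sentinel hostnames, and the `sentinel_redis_options` helper are assumptions for illustration. The `sentinels:` and `role:` keys follow the redis-rb client's Sentinel options; Sidekiq versions built on a different Redis client may expect a different shape, so check the documentation for your version.

```ruby
# Hypothetical helper: builds Redis connection options that let the client
# ask Sentinel for the current master instead of hard-coding a host.
def sentinel_redis_options(master_name:, sentinel_hosts:, port: 26379)
  {
    url: "redis://#{master_name}",
    role: :master,
    sentinels: sentinel_hosts.map { |host| { host: host, port: port } }
  }
end

options = sentinel_redis_options(
  master_name: "mymaster",
  sentinel_hosts: ["sentinel-1.example.internal", "sentinel-2.example.internal"]
)
puts options.inspect

# In a Sidekiq initializer, a hash like this would be assigned to config.redis
# inside the Sidekiq.configure_server and Sidekiq.configure_client blocks.
# The replicas in the passive data center would carry `slave-priority 0` in
# their redis.conf so that Sentinel never promotes them automatically.
```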

In talking with @mikegee, he noted that Sidekiq may have problems with timeouts while reading/writing across network partitions, so that will be something we test heavily.

I also appreciate the great reviews of the Redis Labs product. We'll have to look into that.
