Charts: redis-ha fails to work with redis 3 or 4 (default is 2.8)

Created on 19 Oct 2017 · 14 comments · Source: helm/charts

Is this a request for help?:
Yes.


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug Report

Version of Helm and Kubernetes:

$ helm version
Client: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Which chart:

stable/redis-ha

What happened:

redis-ha fails if the redis version is 3 or later.

$ helm install --set replicas.master=1 --set replicas.slave=2 --set redis_image=launcher.gcr.io/google/redis3 --name=redis-test stable/redis-ha
NAME:   redis-test
LAST DEPLOYED: Thu Oct 19 16:21:09 2017
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                 CLUSTER-IP     EXTERNAL-IP  PORT(S)    AGE
redis-sentinel       100.71.25.154  <none>       26379/TCP  0s
redis-test-redis-ha  100.66.244.56  <none>       6379/TCP   0s

==> v1beta1/Deployment
NAME                          DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
redis-test-redis-ha           2        2        2           0          0s
redis-test-redis-ha-sentinel  3        3        3           0          0s

==> v1beta1/StatefulSet
NAME                        DESIRED  CURRENT  AGE
redis-test-redis-ha-master  1        1        0s


NOTES:
Redis cluster can be accessed via port 6379 on the following DNS name from within your cluster:
redis-test-redis-ha.default.svc.cluster.local

To connect to your Redis server:

1. Run a Redis pod that you can use as a client:

   kubectl exec -it redis-test-redis-ha-master-0 bash

2. Connect using the Redis CLI:

  redis-cli -h redis-test-redis-ha.default.svc.cluster.local

$ kubectl get pods
NAME                                        READY     STATUS             RESTARTS   AGE
redis-test-redis-ha-1260145999-2km73        1/1       Running            0          9s
redis-test-redis-ha-1260145999-52wzc        1/1       Running            0          9s
**redis-test-redis-ha-master-0                1/2       CrashLoopBackOff   1          9s**
redis-test-redis-ha-sentinel-439561-497l3   1/1       Running            0          9s
redis-test-redis-ha-sentinel-439561-8gp0c   1/1       Running            0          9s
redis-test-redis-ha-sentinel-439561-tmktm   1/1       Running            0          9s

$ kubectl logs redis-test-redis-ha-master-0 sentinel
1:C 19 Oct 20:21:15.318 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 19 Oct 20:21:15.319 # Creating Server TCP listening socket *:6379: bind: Address already in use
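
The log above suggests the sentinel container fell back to running a plain redis-server with the default config and tried to bind *:6379, which the master container in the same pod already owns, instead of coming up in sentinel mode on 26379. One way to check this (just a suggestion, reusing the pod and container names from the listing above):

$ kubectl get pod redis-test-redis-ha-master-0 -o jsonpath='{.spec.containers[?(@.name=="sentinel")].command} {.spec.containers[?(@.name=="sentinel")].args}{"\n"}'
$ kubectl exec redis-test-redis-ha-master-0 -c sentinel -- redis-cli -p 26379 ping

If the second command cannot connect (or the container is crash-looping too fast to exec into), the container is not actually running as a sentinel.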

What you expected to happen:
It should work as it does with redis 2.8.

How to reproduce it (as minimally and precisely as possible):
Run the commands pasted above.

Anything else we need to know:
no

lifecycle/rotten


All 14 comments

@esvirskiy The redis-ha chart uses a special redis image that can run in master, slave, or sentinel mode depending on the environment variables passed in.

You can use this image to bootstrap a redis-ha cluster with redis 3.2:

docker pull quay.io/smile/redis:3.2.8

Dockerfile: https://github.com/smileisak/docker-images/tree/master/redis/3.2.8-alpine
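
For context, the trick behind such an image is an entrypoint that picks the redis mode from environment variables. The sketch below is illustrative only; the variable names (MASTER, SENTINEL) and config paths are assumptions, and the real script lives in the Dockerfile repository linked above.

#!/bin/sh
# Illustrative entrypoint: choose the redis mode from environment variables.
if [ -n "${SENTINEL}" ]; then
  exec redis-sentinel /etc/redis/sentinel.conf    # sentinel mode, port 26379
elif [ -n "${MASTER}" ]; then
  exec redis-server /etc/redis/master.conf        # master mode, port 6379
else
  exec redis-server /etc/redis/slave.conf --slaveof "${REDIS_MASTER_HOST}" 6379   # slave mode
fi

A stock redis:3 or redis:4 image has no such dispatcher, which is why pointing redis_image at one of them ends with the sentinel container trying to bind the same port 6379 that the master container already holds, as in the report above.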

@smileisak Thanks so much! Is 3.2.11 available?

@esvirskiy I added support for redis 4.0.2:

docker pull quay.io/smile/redis:4.0.2

This image is based on alpine:edge:
https://github.com/smileisak/docker-images/blob/redis-4.0.2/redis/alpine/Dockerfile

I also tested this image with the redis-ha chart and it works fine!

helm install --set replicas.master=1 --set replicas.slave=2 --set redis_image=quay.io/smile/redis:4.0.2 --name=redis-test stable/redis-ha

NAME                                            READY     STATUS    RESTARTS   AGE
redis-test-redis-ha-293707591-ljt3t             1/1       Running   1          10m  
redis-test-redis-ha-293707591-sd4f0             1/1       Running   1          10m
redis-test-redis-ha-master-0                    2/2       Running   0          10m
redis-test-redis-ha-sentinel-3261088857-70khz   1/1       Running   0          10m
redis-test-redis-ha-sentinel-3261088857-fgscm   1/1       Running   0          10m
redis-test-redis-ha-sentinel-3261088857-kwmsl   1/1       Running   0          10m

I hope that you are happy :dancing_men:
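
If you want to double-check a deployment like this, two read-only probes can help (assuming the chart registers the master group under the default name mymaster; pod names are taken from the listing above):

$ kubectl exec redis-test-redis-ha-sentinel-3261088857-70khz -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
$ kubectl exec redis-test-redis-ha-293707591-ljt3t -- redis-cli ping

The first should print the current master's IP and port; the second should answer PONG.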

@smileisak Thanks 👍

Testing this now.

Why put one sentinel into the "master" StatefulSet?

@lqzmforer The sentinel container shares the master's pod because it needs to know how to find the master; subsequent sentinels just ask the first sentinel. Because all containers in a pod share a network namespace, the sentinel can simply look at $(hostname -i):6379.
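
In other words, the first sentinel needs no service discovery at all. A minimal sketch of how its config could be generated inside the master pod (the group name mymaster and the timeout are assumptions, not the chart's exact values):

MASTER_IP="$(hostname -i)"    # pod IP == master IP, shared network namespace
cat > /tmp/sentinel.conf <<EOF
port 26379
sentinel monitor mymaster ${MASTER_IP} 6379 2
sentinel down-after-milliseconds mymaster 5000
EOF
exec redis-sentinel /tmp/sentinel.conf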

@smileisak could you take a look at this as well? https://github.com/kubernetes/charts/pull/2529

Basically, I want to create two releases, but the service name is already taken.

@tuananh Just fix conflicts ;)

@smileisak thank you for your kind reply.
But here is another question: how do you deal with the situation where the master's pod is killed (or goes down)? The killed master pod will be restarted by the k8s replication controller (or StatefulSet), but with a changed pod IP, so its master container cannot be demoted to the "slave" role by the sentinels.
When I deployed a redis HA cluster using redis 3.0, this question confused me a lot.

Hope I was clear.

@lqzmforer here is another scenario:
When I kill the master pod from the k8s dashboard, a failover occurs: one of the slaves is promoted to master and all the other slaves follow it. Meanwhile, a new master container is created by the k8s replication controller, and it comes up as a master as well. How can I solve this (two different masters)? I'm using the image "quay.io/smile/redis:4.0.2".

@tal130 That's the problem!
If there is only one master and one slave per shard, data loss can occur despite the auto-recovery mechanism.
The root cause is that the restarted master pod's IP has changed, but the sentinels cannot detect it, even through the k8s service name.

For now, I'm using a liveness probe to check the monitored master's status.
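
The probe itself isn't shown above; one possible shape for it, purely as a sketch (assuming the default master group name mymaster), is an exec check on the sentinel container that fails when the tracked master stops answering, so Kubernetes restarts the container and sentinel re-resolves the master address:

#!/bin/sh
# Sketch of an exec liveness probe: fail when the master that sentinel is
# tracking no longer answers PING.
set -e
master_ip="$(redis-cli -p 26379 sentinel get-master-addr-by-name mymaster | head -n 1 | tr -d '\r')"
redis-cli -h "${master_ip}" -p 6379 ping | grep -q PONG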

@smileisak how can we solve the two scenarios above?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
