Some middleware applications are not ready to scale either up or down.
It is important to have a replication controller that, once defined, can't have its number of replicas changed.
Right now, nothing modifies the replica count of a replication controller at rest except direct user action. Any auto-scaling features added in the future could be turned on or off for a given replication controller.
That said, wouldn't such an application also have issues with a pod going away and getting automatically replaced as well? A fixed number doesn't guarantee the same pods run forever. Pods can die, become unhealthy, or get evacuated from a node, and the resulting destroy/recreate looks a lot like scale down/up from an application perspective.
Also, during a rolling deployment, an old replication controller's pods are removed and a new replication controller's pods are added. Again, that looks a lot like a scale down/up from the application's perspective.
Some middleware applications require that you specify the number of nodes that will be available, along with their IPs/ports; ActiveMQ is one example. All the nodes in the cluster need to be aware of each other, and this is done via static configuration. The same goes for HornetQ. Once you have added nodes to a cluster, you cannot remove them, or messages will be lost.
As for your comment about pods going away: the expectation is that the same volume (persistent store) gets mounted back, restoring the pod to the state it was in before.
Pod evacuation will look just like bringing the pod down and then up again: the pod will be started on another node, with the persistent store mounted back.
A static replica count will be a common requirement for several of these applications, so I recommend that this feature be added.
I believe the best approach is to enforce a policy against changing the replica count rather than creating a special RC that only handles a static number of replicas.
What you're describing is the nominal service, which controls membership functions. It would be more appropriate to track this "unique identity" with that.
Closing based on the assumption that we would never implement this feature.
As far as I can tell, the RC is the only object that will spawn a new pod when an old one crashes.
The issue with using an RC is that it also enables scaling, while there are services that need recovery but not scaling.
It is true that a human action is needed in order to scale up or down, but the fact that the up and down arrows are displayed implies that the service can handle it. It sends the wrong message to end users.
So what would be the best way to define a pod that gets automatically recreated when it crashes, but can't be scaled?
it might be nice to have a scaling-disabled annotation on an RC which the UI and CLI would use to reject scaling requests (or use to preclude showing the arrows). @smarterclayton @jwforres ?
(or it could be indicated on the image or imagestream and inherited by the RC based on the image the podtemplate is referencing)
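As a sketch of that idea, the RC could carry a purely advisory annotation that UIs and CLIs check before offering scale controls. The annotation key below is hypothetical, not an existing OpenShift or Kubernetes convention, and (per the discussion in this thread) the API server would not enforce it:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: activemq
  annotations:
    # Hypothetical, advisory-only marker. UIs/CLIs could hide the scaling
    # arrows or warn before honoring a scale request; the API would NOT
    # block a user who scales anyway.
    scaling.alpha.example.com/disabled: "true"
spec:
  replicas: 3        # the static cluster size the application was configured for
  selector:
    app: activemq
  template:
    metadata:
      labels:
        app: activemq
    spec:
      containers:
      - name: activemq
        image: activemq   # illustrative image name
```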
We've established we would not block scaling at the API; it would only be a client recommending that it not be scaled. At best, UIs can indicate this item _desires_ not to be scaled outside of a range, but UIs cannot prevent mutation if the user confirms.
@smarterclayton so the bottom line is: if you use an RC, your service should be scalable? I can agree with that. Then for all non-scalable services one could use a basic pod definition, but what if the pod crashes? A manual action is mandatory.
An example: a PostgreSQL database server with persistent storage. One wants to have it online 24/7, so if the pod crashes a new pod should be spawned, but scaling from one pod to two or more could corrupt your data.
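For concreteness, that PostgreSQL case would look roughly like the manifest below: an RC with `replicas: 1` plus a persistent volume claim. Names and image are illustrative, and note that nothing here stops a user from changing `replicas` later:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: postgresql
spec:
  replicas: 1          # gives crash recovery, but a user can still edit this count
  selector:
    app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - name: postgresql
        image: postgres       # illustrative image name
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgresql-data   # illustrative claim name
```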
There are a number of controllers today. Replication controllers don't even guarantee that only one pod exists when scale = 1. An RC creates a new pod as soon as the old pod _starts_ terminating, which means if you are expecting RCs to give you "only one" then you're already broken. Use a daemon set and label your nodes to make the node special, then ensure you don't have duplicate nodes. You can kind of fake the "only one" today by using a persistent volume that has built-in locking (AWS, GCE, Ceph block), because the new pod won't be able to start until the old pod gives up the volume (a crude form of fencing).

The longer term solution is called PetSets (working name) which will give you more guarantees that you only get exactly N of something. However, even in that case you'll need to understand the guarantees around disk storage.
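The crude fencing described above can be sketched as a pod template referencing a volume type that only attaches read-write to a single node at a time, such as a GCE persistent disk. The image and disk names are illustrative:

```yaml
# Pod template fragment. A GCE PD attaches read-write to only one node,
# so a replacement pod scheduled elsewhere cannot start using the disk
# until the old pod's node detaches it — a crude form of fencing.
spec:
  containers:
  - name: app
    image: example/app        # illustrative image name
    volumeMounts:
    - name: state
      mountPath: /data
  volumes:
  - name: state
    gcePersistentDisk:
      pdName: app-state-disk  # illustrative disk name
      fsType: ext4
      readOnly: false
```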
@smarterclayton thanks for the information.
Is there a privilege that can be revoked to prevent users from being able to scale?
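One possible shape for that, sketched against the Kubernetes RBAC model (which postdates parts of this thread), is a role that allows reading replication controllers but grants no verbs on the `scale` subresource. The namespace and role name are illustrative, and there is an important caveat noted in the comments:

```yaml
# Role that permits viewing RCs but omits the "scale" subresource, so the
# dedicated scale endpoint (e.g. `kubectl scale`) is denied for bound users.
# Caveat: anyone who is separately allowed to update the RC object itself
# can still edit spec.replicas directly, so pair this with read-only access.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: myproject        # illustrative namespace
  name: rc-viewer-no-scale
rules:
- apiGroups: [""]
  resources: ["replicationcontrollers"]
  verbs: ["get", "list", "watch"]
# Intentionally no rule for resources: ["replicationcontrollers/scale"]
```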