Allow StatefulSet to use the other two restart policies: "OnFailure" and "Never".
This enables deploying "run-once + strong identity" clustered apps with StatefulSet.
In machine learning, to train a model in a distributed way, we define a set of ordered stateful container workers, e.g. "worker-0", "worker-1", ..., "worker-n". (They use numbers as suffixes, and StatefulSet can manage these consecutive ordinals for us.)
When training begins, these containers talk to each other via their DNS records. (StatefulSet creates the DNS records for us.)
When the training is done, all the workers save their results to their persistent volumes and exit 0. (StatefulSet manages their PVs for us.)
Then all the stateful Pods stay stopped in the Succeeded state, and we can still check their logs.
This is a natural fit for StatefulSet, and it would be really helpful to allow the "OnFailure" and "Never" RestartPolicy values in StatefulSet.
The original stateful-apps proposal says:
"StatefulSets: Running pods which need strong identity and storage".
The word "running" here is a verb meaning to run something, not to keep something always in a running state.
BTW, it also says,
"StatefulSets are considered easy to use for deploying clustered software for common cases."
A run-once stateful cluster is also a common case.
The original design of StatefulSet only allows the RestartPolicy "Always".
If we allow the RestartPolicy values "OnFailure" and "Never", the Pods in a StatefulSet can stay in their final states once their tasks are done, instead of restarting over and over again.
After implementing this proposal, the default restart policy is still "Always", so existing StatefulSets are unaffected. If users want RestartPolicy "OnFailure" or "Never" in a StatefulSet, they have to set the value explicitly, which means they clearly know what they want for their apps.
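To make the opt-in concrete, here is a minimal sketch of what a spec under this proposal could look like, built with the Go API types. The names, labels, and image are illustrative, and today's API validation still rejects anything other than "Always" in a StatefulSet's Pod template:

package main

import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// workerSet builds a hypothetical StatefulSet whose Pod template explicitly
// opts in to RestartPolicy "OnFailure"; under this proposal, leaving the
// field unset would still default to "Always".
func workerSet(replicas int32) *appsv1.StatefulSet {
	labels := map[string]string{"app": "worker"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "worker"},
		Spec: appsv1.StatefulSetSpec{
			Replicas:    &replicas,
			ServiceName: "worker", // headless Service that provides the DNS records
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: v1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: v1.PodSpec{
					// The explicit opt-in this proposal would allow.
					RestartPolicy: v1.RestartPolicyOnFailure,
					Containers: []v1.Container{{
						Name:  "trainer",
						Image: "example/trainer:latest",
					}},
				},
			},
		},
	}
}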
Modify the isRunningAndReady method in the StatefulSet controller by adding something like this:
// Determine whether the Pod counts as alive based on its restart policy.
phase := pod.Status.Phase
switch pod.Spec.RestartPolicy {
case v1.RestartPolicyNever:
	// Never: every phase counts as alive; the Pod may terminate for good.
case v1.RestartPolicyOnFailure:
	// OnFailure: both Running and Succeeded count as alive.
	if phase != v1.PodRunning && phase != v1.PodSucceeded {
		return false
	}
default:
	// Always (the current behavior): only Running counts as alive.
	if phase != v1.PodRunning {
		return false
	}
}
I made a PR implementing this at kubernetes/kubernetes#42892, and @kow3ns suggested having a discussion in the community before supplying a PR.
This seems to be an enhancement to stateful-apps. Should I make a PR directly against "stateful-apps", or open a new proposal, e.g. "stateful-apps-restartpolicy"? What do you think?
/cc @smarterclayton @bprashanth @kargakis @kow3ns @thockin
I don't feel comfortable adding run-once semantics in a controller that is designed to run long-running processes. @soltysh @erictune is this a use case for indexed Jobs?
https://github.com/kubernetes/kubernetes/issues/14188
cc: @kubernetes/sig-apps-feature-requests
@soltysh @erictune is this a use case for indexed Jobs?
That's what it looks like to me. IndexedJobs will be limited in comparison with StatefulSet, especially on the storage side, but stable identity is a key requirement for this use case.
@kargakis @soltysh
I'm glad this has been designed before; this is the first time I've heard of Indexed Job.
The --restart=OnFailure flag and the $INDEX environment variable are fantastic and useful!
I found that kubernetes/kubernetes#14188 has been shelved for over a year. Is anyone working on or planning to implement this? I would like to help implement it.
Are the workers (1) cooperating to solve a single problem instance, e.g. train a single model? Or are they (2) doing parallel work, e.g. train and evaluate a model 20 times in parallel, each with different values of a hyperparameter?
If it is (1), then I'd suggest you just use a StatefulSet, and have an outer control loop delete the StatefulSet once the problem is done.
If it is (2), then the best current option is to generate lots of Jobs using a script.
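For case (1), a minimal sketch of such an outer control loop, using today's client-go, might look like the following. The namespace, StatefulSet name, and the trainingDone() completion check are all assumptions for illustration; note that under RestartPolicy "Always" the Pods never reach the Succeeded phase, so completion has to be detected externally:

package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// trainingDone is a hypothetical completion check, e.g. testing for a
// sentinel file that worker-0 writes to shared storage when training ends.
func trainingDone() bool {
	// ... completion detection goes here ...
	return false
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Poll until the training reports completion.
	for !trainingDone() {
		time.Sleep(30 * time.Second)
	}
	// Tear down the StatefulSet; PVCs created from volumeClaimTemplates
	// are left behind, so the saved results survive.
	err = cs.AppsV1().StatefulSets("default").Delete(
		context.TODO(), "worker", metav1.DeleteOptions{})
	if err != nil {
		panic(err)
	}
}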
@erictune
It is (1). The workers cooperate to train a single model.
The workers mainly need two parameters: the whole cluster structure and their own index.
The whole cluster structure: this parameter is the same for all the workers.
The index: this parameter is different for each worker. Each worker needs the index number to know which part to take, e.g. --index=$INDEX.
For a distributed TensorFlow case, the commands look like this (copied from the docs at https://www.tensorflow.org/deploy/distributed).
There are two types of Indexed Job or StatefulSet, "ps" and "worker":
# On ps0.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=ps --task_index=0
# On ps1.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=ps --task_index=1
# On worker0.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=worker --task_index=0
# On worker1.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=worker --task_index=1
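With StatefulSet's stable hostnames, each worker could derive its own --task_index from its ordinal suffix instead of being handed an $INDEX variable. A minimal sketch (the "worker" name is illustrative):

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Inside a StatefulSet Pod, the hostname is e.g. "worker-3".
	host, err := os.Hostname()
	if err != nil {
		panic(err)
	}
	// The ordinal after the last "-" is this worker's index.
	idx, err := strconv.Atoi(host[strings.LastIndex(host, "-")+1:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("--job_name=worker --task_index=%d\n", idx)
}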
Your advice works, except for one thing: the logs.
The problem is that when the containers exit 0, they are restarted immediately and their logs are lost. If the container images are provided by end users, I cannot easily change their entrypoints or commands to redirect stdout to a file, so I have to build a logging system that collects container logs live.
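Such live log collection is possible with client-go, though it is exactly the kind of extra machinery this proposal would make unnecessary. A minimal sketch, with the namespace and Pod name made up:

package main

import (
	"context"
	"io"
	"os"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Follow one worker's logs and copy them somewhere durable (stdout
	// here) before the container restarts and the logs disappear.
	req := cs.CoreV1().Pods("default").GetLogs("worker-0",
		&v1.PodLogOptions{Follow: true})
	stream, err := req.Stream(context.TODO())
	if err != nil {
		panic(err)
	}
	defer stream.Close()
	if _, err := io.Copy(os.Stdout, stream); err != nil {
		panic(err)
	}
}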
I know there are many workarounds for building the above cluster, and I have actually already built one.
Through this proposal, I want to discuss with the community whether we can make this easier. :)
I do want to reiterate from the previous discussion that

- ReplicaSets are meant to make sure a specific number of Pods are running at a particular time (https://kubernetes.io/docs/user-guide/replicasets/#when-to-use-a-replicaset)
- DaemonSets are meant to make sure that all (or some) Nodes run a copy of a Pod (https://kubernetes.io/docs/admin/daemons/#what-is-a-daemonset)
- StatefulSets are meant to make sure that a specific number of unique copies of a Pod are running (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/stateful-apps.md)

Right now the restart policy of all three controllers reflects the design intention and is consistent. If we change the semantics of StatefulSet, we make it inconsistent, and, as we evolve the controller to better suit its design intentions, we have to always consider the implications with respect to the "StatefulJob" use case of dealing with completing processes. In the 1.7 time frame, this will make status tracking and container updates more difficult for users to reason about and for developers to implement.
StatefulSet was never intended to cover all use cases, and IMO, if there is a use case for ordered StatefulJobs that cannot be met with StatefulSet, this should be implemented as an extension whose implementation and evolution is specifically focused on supporting that use case.
Yes, probably TPR is what you want to use :) Although TPRs are not ideal today.
Yeah, "StatefulJobs" is how we would bucket this. Our controller types (Deployment, RS, StatefulSet) are not for "jobs".
Agree. After the discussion here, I think the "Indexed Job" or "StatefulJob" may be the best solution.
@ohmystack with the Indexed Job being tracked in https://github.com/kubernetes/kubernetes/issues/14188, are you ok with closing this in favor of it?
@soltysh OK. I'll close this. Let's track the Indexed Job.