The new health indicator groups feature in Spring Boot 2.2 allows the creation of arbitrary health indicator groups. It would be nice if Spring Boot provided a default for those when running on Kubernetes.
A programmatic callback (customizer?) conditioned on running on Kubernetes could be provided. If no liveness and readiness groups exist, we could create a liveness group with "ping" only and a readiness group with all the rest. That doesn't help with customizing the roles and the details, but perhaps that can be configured via properties.
After a quick discussion with the team, it seems we need to think about this more.
First, we need to consider making that feature available without actuator on the classpath - that would change our previous plan about leveraging health indicator groups if we decide to do so.
We also need to reconsider the actual checks made by each probe.
- `readiness` probes should tell whether the application is ready to receive traffic. In our case, we should react to `ApplicationReadyEvent` when the application starts
- `liveness` probes:
- `startup` probes:

In all cases, it could be useful to document/point to relevant documentation; each probe needs to be configured with a different spec (failure threshold, period).
@bclozel I am a Kubernetes reviewer and saw this issue on your side...
Can we work together on defining the best strategy for each probe?
For instance there is more than just setting `ReadinessProbe` to true when you are ready - it can be used to give your application some breathing space when you process your queue.
The `LivenessProbe` should fail when your application requires a reset - I have seen cases where that probe relies on a different Spring Context than the application, and continues to work even when the latter crashed after an OOM...
Please let me know how I can help document and implement these. Thanks!
Hi,
I just want to give some input on how we implemented our readiness and liveness probes.
It's pretty much similar to what @bclozel commented above.
**Readiness**:
We have a `ReadinessEndpoint` class that simply keeps a boolean value and also takes `HealthIndicator` beans for readiness. (Built on spring-boot 2.1, so not using indicator groups yet.)
This class is an application context event listener that receives our `ApplicationReadinessEvent`.
The initial value for this readiness bean indicates `NOT_READY` because we do NOT want traffic until the application is ready for serving.
Once the application starts up and has bootstrapped the necessary things, the user fires an `ApplicationReadinessEvent` with `value=READY`, and readiness then starts returning `value=READY`.
The application needs to decide when to issue the readiness event (`value=READY`) because being ready to serve traffic is up to the application to decide.
Another place that issues the readiness event is our graceful shutdown logic.
When the graceful shutdown logic is initiated (e.g. by receiving `ContextClosedEvent`), the first thing we do is fire an `ApplicationReadinessEvent` with `value=NOT_READY`. This stops the pod from receiving any more requests while shutdown is in progress.
**Liveness**:
Similar to the readiness class, we have a simple boolean to indicate `LIVE`/`NOT_LIVE`.
The differences from readiness are that the initial value is set to `LIVE` and that no event is issued to change the liveness status right after the application is bootstrapped.
There are also considerations for the initial delay and frequency (period) of the readiness/liveness probes in the k8s config. Once the initial delay has passed, we check readiness more often than liveness. Also, this frequency (period) may affect the duration of graceful shutdown.
Currently, our readiness and liveness are implemented as actuator `Endpoint`s, but it does not have to be that way.
As long as there is a boolean value to keep the state and it receives application context events, it can be a plain service bean. Then, if Actuator is available, put it into a `HealthIndicator` (`HealthContributor`) and make it part of the respective readiness/liveness health groups.
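For illustration, here is a minimal sketch of the pattern described above. `ApplicationReadinessEvent` and the holder class are application-specific names taken from this comment, not Spring Boot API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.ApplicationEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

/** Application-defined event used to flip the readiness flag (hypothetical name from this comment). */
class ApplicationReadinessEvent extends ApplicationEvent {

    private final boolean ready;

    ApplicationReadinessEvent(Object source, boolean ready) {
        super(source);
        this.ready = ready;
    }

    boolean isReady() {
        return this.ready;
    }
}

/** Holds the readiness flag and exposes it as a HealthIndicator when Actuator is present. */
@Component
class ReadinessStateHolder implements ApplicationListener<ApplicationReadinessEvent>, HealthIndicator {

    // NOT_READY until the application explicitly publishes a READY event
    private final AtomicBoolean ready = new AtomicBoolean(false);

    @Override
    public void onApplicationEvent(ApplicationReadinessEvent event) {
        this.ready.set(event.isReady());
    }

    @Override
    public Health health() {
        return this.ready.get() ? Health.up().build() : Health.outOfService().build();
    }
}
```

Publishing `new ApplicationReadinessEvent(this, true)` once warm-up completes (and `false` from the graceful shutdown path) would match the flow described above.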
> The application needs to decide when to issue the readiness event (`value=READY`) because being ready to serve traffic is up to the application to decide.

I'm intrigued by this, @ttddyy. Thanks for sharing your thoughts. I had hoped that performing work during application context refresh and in application and command-line runners, together with the subsequent `ApplicationReadyEvent`, would be sufficient for indicating that the application was ready to start handling traffic. What's missing from the current events and startup sequencing that led to you issuing a separate event?
Hi @wilkinsona,
When the application becomes ready to serve traffic is not necessarily tied to the ApplicationContext lifecycle.
If an application is well behaved and fits into the Spring lifecycle, application developers would put initialization logic into a `[Command|Application]Runner`; however, that is not something we can enforce.
Also, an application receiving `ApplicationReadyEvent` doesn't always mean it is ready to serve traffic. It rather means it is ready to perform application logic. The ready-to-serve flag might depend on external resources. An application may connect to a cache cluster upon `ApplicationReadyEvent`, form the cluster and warm up the local cache, and only then become ready to serve traffic.
So, we think it is the application's responsibility to decide when to flip the ready flag.
From the spring-boot perspective, I think it is OK to set _ready=true_ at `ApplicationReadyEvent` by default. But there needs to be a way to disable that default and let users manually flip the ready flag, so that the application can determine the timing.
Netflix reports similar requirements and something similar is built into Eureka.
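For reference, the availability API that Spring Boot eventually shipped (2.3+) supports this "application flips its own ready flag" idea directly. A minimal sketch using the 2.3 class names (`AvailabilityChangeEvent`, `ReadinessState`), which did not exist at the time of this comment; the cache service name is made up:

```java
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;

@Component
public class CacheClusterService {

    private final ApplicationEventPublisher eventPublisher;

    public CacheClusterService(ApplicationEventPublisher eventPublisher) {
        this.eventPublisher = eventPublisher;
    }

    public void reformClusterAndWarmUp() {
        // take this instance out of rotation while the cache cluster is re-formed
        AvailabilityChangeEvent.publish(this.eventPublisher, this, ReadinessState.REFUSING_TRAFFIC);

        // ... application-specific cluster join and cache warm-up work ...

        // only then tell the platform we can accept traffic again
        AvailabilityChangeEvent.publish(this.eventPublisher, this, ReadinessState.ACCEPTING_TRAFFIC);
    }
}
```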
If I can add to the debate...
`Readiness` isn't a definite state, and devs could use it to give some time off to the application instance to avoid overfilling queues and to prevent aggravating states where processing time and traffic snowball into timeouts.
`Liveness` should fail 100% and immediately when the application cannot recover and requires a kill. I don't know if your implementation ensures that.
Once the `Startup` probe reaches GA, every probe will have a clear and separate meaning:
- `Startup=true`: my application has started and you can verify other probes.
- `Readiness=true`: my application is functioning properly, give me traffic.
- `Readiness=false`: my application cannot handle more traffic at the moment, please remove me from the load balancer pool.
- `Liveness=false`: my application is dead, please kill the container.
I don't see the need for another event in the lifecycle; nothing forces a user to implement that new event either. We could recommend that users write a custom health contribution (`HealthContributor`) so that whatever housekeeping needs to be done at the start of an app feeds into the health endpoint. That is an existing mechanism that seems ideal for this use case.
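A minimal sketch of that existing mechanism, assuming the startup housekeeping runs in an `ApplicationRunner`; the class name is illustrative, not an existing Spring Boot type:

```java
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class StartupHousekeepingHealth implements ApplicationRunner, HealthIndicator {

    private final AtomicBoolean done = new AtomicBoolean(false);

    @Override
    public void run(ApplicationArguments args) {
        // pre-load caches, verify migrations, etc. (application-specific housekeeping)
        this.done.set(true);
    }

    @Override
    public Health health() {
        // feeds into the health endpoint (and whichever group this indicator is assigned to)
        return this.done.get() ? Health.up().build() : Health.outOfService().build();
    }
}
```

Since `HealthIndicator` is a `HealthContributor`, the same bean can be included in whichever health group the platform probes.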
Nothing beats feedback and actual experience, so we're going to implement a first version of this with:
- a `liveness` probe. As an opt-in feature, it should use the `ping` endpoint provided by Actuator. Besides a working server, Spring Boot doesn't know enough about internal application state to have opinions about this.
- a `readiness` probe with an `ApplicationReadyHealthIndicator` that by default looks for `ApplicationReadyEvent` and `ContextClosedEvent` to change the state of the probe. Developers will be able to create their own instance and configure the event types to look for.
- the `readiness` health indicator should not be part of the default global health status; if it were, there would be no way to differentiate an application that is not accepting traffic from an unhealthy application (some platforms might just kill the app as a result).

This first step heavily relies on the existing Actuator infrastructure; the only missing piece is whether we can easily exclude this new indicator from the default group.
With our current understanding, this approach has a few advantages:

- it lives under the `"/actuator"` URL namespace, so it won't clash with other endpoints

There are some issues as well:

- the probes don't share 100% of the main application infrastructure. So the main app might fail and the probe would still keep working

After experimenting with this and getting feedback from the community, we will improve/reconsider this approach. We could make it independent of Actuator. This requires more design, more infrastructure (MVC, WebFlux, Jersey, etc.) and a separate URL path which might clash with existing routes.
@matthyx we're also wondering about the following:

- are there conventions around the actual URL paths for those probes (`/probes`?)
- does k8s route external requests to probes or should they be protected from Internet traffic?
- you mentioned `Readiness` as a way to get some breathing space for the application and "avoid overfilling queues". Are you thinking about messaging queues, HTTP server connection queues, threadpools, all of the above? We would like some pointers to other libraries' docs about the features they provide for this.

@bclozel thanks for taking this subject seriously, I will reply first with the answers I know, and then after some research come back to the other points :-)
> don't share 100% of the main application infrastructure. So the main app might fail and the probe would still keep working

OK, this should be clearly stated in the documentation, and you should encourage users to implement their own `liveness` probe in the main application loop instead. Too many times I have seen OOM-killed application contexts with the probe still working.
> are there conventions around the actual URL paths for those probes

No, but I can look around and report if I find something.

> Does k8s route external requests to probes or should they be protected from Internet traffic?

No, but depending on the ingress used you could block them. Or it might be possible to bind the probes to a different port that is not exposed to the outside world via the service/ingress.

> Are you thinking about messaging queues, HTTP server connection queues, threadpools, all of the above?

Yes, absolutely. I think the feature is underused today because most people think that `readiness` is a definite state (which it's not). I hope that the `startup` probe being enabled by default in 1.18 (still beta though) will help clear this confusion.
I will do some research about probe usages and report their "default" paths, features and their implementations.
Nice one: https://github.com/nodeshift/kube-probe
Gitlab implementation (ruby on rails): https://gitlab.com/gitlab-org/gitlab-foss/-/blob/6a9d7c009e4e5975a89bcc3e458da4b3ec484bd1/spec/requests/health_controller_spec.rb
It is very important that all 3 probes do not depend on external dependencies, as correctly stated here.
After several rounds of draft implementations, the team decided to go with the following.
First, "Liveness" and "Readiness" are promoted to first-class application states. Their current state can be queried with the `ApplicationStateProvider`, and Spring Boot also provides ways to track changes or update them using Spring application events. Spring Boot uses the state of the application context and the startup phases to update these states in regular applications.
If Actuator is on the classpath, we then provide additional `HealthIndicator`s and create dedicated health groups at `"/actuator/health/liveness"` and `"/actuator/health/readiness"`. Developers can configure additional indicators under those health groups if they wish to.
In general, we're also adding more guidance and opinions about Kubernetes probes and their meaning.
Awesome @bclozel! Some questions though:

- what about `startup`? Kubernetes 1.18 is soon out and will have it enabled by default...

Thanks for the nice work 👍
Hey @matthyx
Snapshot documentation should be up soon (I'll ping you with a link to it), you'll get a chance to have a more complete picture with the docs and guidance. I would be glad to get your feedback on that.
> startupProbe is a way to reuse an existing probe
True, but it doesn't necessarily have to...one key point is that it's no longer launched once it has succeeded.
I will try using it as a "smoke test" probe which could trigger a heavier test that only makes sense at application startup - and that's too heavy to run periodically like the liveness one.
@matthyx See https://docs.spring.io/spring-boot/docs/2.3.x-SNAPSHOT/reference/html/spring-boot-features.html#boot-features-kubernetes-application-state
As for the startup sequence, here's how probes should behave:
| Startup phase | LivenessState | ReadinessState |
|-----------------------------------------|---------------|----------------|
| Start the app | broken | busy |
| App. context is starting | broken | busy |
| App. context started OK, web port is UP | live | busy |
| Startup tasks are running | live | busy |
| App is ready for client requests | live | ready |
Looks good! I saw the other part regarding HTTP probes as well, which is good because `exec` probes should be avoided when possible.
Last week I saw some regressions in containerd causing zombies created after each probe exec...
Thank you @bclozel and the team for implementing this.
It seems I can replace our current readiness/liveness implementation with this new one.
I took a close look and here is my feedback on the current implementation.
**Graceful shutdown and readiness health indicator**

I think there is a problem in the ordering of graceful shutdown and of flipping the readiness probe on `ContextClosedEvent`.
`ServletWebServerApplicationContext#doClose` first performs the graceful shutdown (`webServer.shutDownGracefully()`), then it calls `super.doClose()`, which is `AbstractApplicationContext#doClose`, and that publishes the `ContextClosedEvent`.
Therefore, while graceful shutdown is in progress, the readiness is still `READY`, which brings traffic to the pod.
Readiness needs to be `BUSY` before the graceful shutdown happens. Ideally, once it becomes `BUSY`, it needs to wait for the k8s readiness probe period so that k8s picks up the latest readiness state, and only then proceed with the graceful shutdown.
**Default health indicators for readiness group**

It seems that by default the `liveness`/`readiness` health groups will solely have the `livenessProbe`/`readinessProbe` bean, based on `ProbesHealthEndpointGroupsRegistrar`.
For readiness, I think many use cases are to include all health indicators except `livenessProbe` (or maybe even include `livenessProbe`).
To do so, with the current implementation, it is required to set these properties:
```
management.endpoint.health.group.readiness.include=*
management.endpoint.health.group.readiness.exclude=livenessProbe
```
What do you think about including all health indicators except `livenessProbe` in the `readiness` group by default?
**Documentation**

Can you include the liveness/readiness state behavior table from the comment above in the documentation?
It is very informative and intuitive to understand.
Thanks,
> Therefore, while graceful shutdown is in progress, the readiness is still `READY`, which brings traffic to the pod.

I don't believe this is the case. Once `SIGTERM` has been sent to the process (which is what will trigger the graceful shutdown), the liveness and readiness probes are no longer called and their responses become irrelevant.
> What do you think about including all health indicators except `livenessProbe` in the `readiness` group by default?
We do not think this should be the default behaviour as it's dangerous for readiness checks to include external services. If one of those services is shared by multiple instances of the app and it goes down, every instance will indicate that it is not ready to receive traffic. This may trigger auto-scaling and the creation of more instances which is likely to only make the problem worse.
If you know that an external service is not shared among multiple instances, you can safely opt in by including its indicator in the readiness group.
> I don't believe this is the case. Once SIGTERM has been sent to the process (which is what will trigger the graceful shutdown), the liveness and readiness probes are no longer called and their responses become irrelevant.

In a standard k8s topology you route requests to a pod via the service. Are you suggesting that a SIGTERM triggers the kube-controller and propagates the information to the service not to route traffic to its pods? We tested this at scale 1-1.5 years back and I'm pretty certain that we found k8s does not work like this. Perhaps something has changed?
Under the hood, services are an abstraction on top of iptables routing rules. These rules are updated by a kube-controller when an event occurs. I don't think SIGTERM would trigger this event.
Please let me know if (or where) I'm mistaken.
I think @wilkinsona was referring to when the controller kills the pod, for instance during a rolling update. In that case I'm almost sure the probes aren't called anymore, but I can check the relevant code and update here if you want.
Sure, that makes sense. Please feel free to check. We're just concerned about handling things in a robust way for all scenarios, and for that I think you'd need knowledge of the readiness probes. Our current logic for graceful shutdown is this:
- Send NOT_READY event - this signals Kubernetes that it should no longer route connections to the service web container
- Shut down all known ExecutorServices - includes anything inherited from ExecutorConfigurationSupport in the spring container
- Sleep for the `readinessProbeTimeout` duration - waits for the readiness probe to go into a `NOT_READY` state in Kubernetes
- Pause the tomcat connection pool - will cut off any client connections at the TCP layer and allow any task in the Connector pool to complete
- Wait for the remainder of the specified `gracefulShutdownTimeout` for the pools to shut down
- After this spring can continue to shut down
We keep our shared framework as in line with spring as possible. More or less, spring sets the direction and we blindly follow. This is to reduce our surface area as much as possible. That's why @ttddyy was asking: the changes you are making are going to impact us, since we'll align and make it work. Right now we have mechanisms integrated directly into the readiness/liveness probes. This has worked extremely well. In this thread I get the feeling that you all are saying this is a bad pattern, so we are trying to understand why that is.
> Are you suggesting that a SIGTERM triggers the kube-controller and propagates the information to the service not to route traffic to its pods? We tested this at scale 1-1.5 years back and I'm pretty certain that we found k8s does not work like this. Perhaps something has changed?

It's not `SIGTERM` that triggers this, but the general shutdown processing that Kubernetes orchestrates. This shutdown processing happens in parallel, so there's a window during which traffic will be routed to a pod that has also begun its shutdown processing. This eventual consistency is unfortunate, but my understanding is that the K8S team deem it to be necessary due to the distributed nature of the various components that are involved. The size of the window is both undefined and unaffected by any probe responses.
To avoid requests being routed to a pod that has already received `SIGTERM` and has already begun shutting down, the recommendation is to configure a sleep in a pre-stop hook. This sleep should be long enough for new requests to stop being routed to the pod, and its duration will vary from deployment to deployment. Times of 5-10 seconds seem to be quite common from what I have seen, so that's probably a good starting point. Once the pre-stop hook has completed, `SIGTERM` will be sent to the container and graceful shutdown will begin, allowing any remaining in-flight requests to complete.
This blog post describes things quite well, albeit with some slight differences as it's talking about Nginx.
Thanks @wilkinsona. My concern is not related to the upgrade flow. Sorry if I wasn't clear about that. I understand that, irrespective of any probes, that will work correctly. My concern is around SIGTERM; more specifically, we have flows that require us to restart a container within a pod. For that to work, graceful shutdown needs to communicate to k8s (or an ingress controller) to stop sending traffic to it.
A common scenario for us is key rotation. We need to have the ability to rotate secrets on all of our services. In order to achieve this we have mechanisms built in to restart the containers within a pod by killing the jvm (kill
Any thoughts on that?
Thanks for the additional details.
@bclozel has made a change (not yet pushed to master) that publishes the `ReadinessStateChangedEvent` prior to graceful shutdown commencing. This will result in the readiness probe indicating that the application is not ready before graceful shutdown begins. You could listen for this event and, when it's received, perform any logic that you want before the graceful shutdown proceeds. You'd probably want to include a mechanism that knows the source of the shutdown so that the logic is performed only when it's necessary.
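A minimal sketch of such a listener, written against the availability API as it eventually shipped (`AvailabilityChangeEvent<ReadinessState>`; this thread uses the earlier `ReadinessStateChangedEvent` name). Whether a blocking listener actually delays the rest of the shutdown depends on where the event is published, so treat it as a starting point; the drain window is an assumed, deployment-specific value:

```java
import java.time.Duration;

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class TrafficDrainListener {

    // how long the readiness probe needs to observe the new state (deployment-specific assumption)
    private static final Duration READINESS_PROBE_WINDOW = Duration.ofSeconds(10);

    @EventListener
    public void onReadinessChange(AvailabilityChangeEvent<ReadinessState> event) {
        if (event.getState() == ReadinessState.REFUSING_TRAFFIC) {
            // stop handing out new work from queues/executors here (application-specific)
            try {
                // give Kubernetes time to observe the readiness change before shutdown proceeds
                Thread.sleep(READINESS_PROBE_WINDOW.toMillis());
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```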
> All that is needed here is to update our k8s secret (or other secret store) and simply restart the jvm.
This sounds a little risky to me. What happens if the restart takes longer than expected and the liveness probe fails because the process is down?
> This sounds a little risky to me. What happens if the restart takes longer than expected and the liveness probe fails because the process is down?

We've seen wonky behavior in general, but not from your scenario. Our `initialDelaySeconds` was carefully crafted to avoid this. BUT, I have seen issues where the startup just hangs. I don't think this is spring, but there is something funky going on. In that case we simply restart again. The outage of one container is fine as we do this in a rolling fashion and we (should be) n+1 in all areas to avoid it impacting us.
It's great that you have the `ReadinessStateChangedEvent`; all we need is an associated timeout to ensure it doesn't shut down too early, and I think we are good to go!
This is looking super good overall. I think Spring Boot is getting one of the most mature health handling implementations that exist to date!
I hope you plan to communicate this in articles, workshops and conferences!
Thanks for the discussion and implementation. @wilkinsona @scottmf @bclozel
The updated implementation looks great.
This allows us to support both pod delete and restart.
One last thing: I think it would also be better for `ReadinessStateChangedEvent` to have a `cause`.
As @matthyx noted in the comment above about using readiness for draining queues, it is possible to use readiness for things other than graceful shutdown.
In that case, a listener for `ReadinessStateChangedEvent` needs to differentiate what caused the event so that it can behave differently (e.g. shut down the task executor for graceful shutdown vs. simply finish up all tasks in the executor for draining).
In the meantime, I've updated this, and HTTP probes are now activated if the Kubernetes `CloudPlatform` is detected or if the `management.health.probes.enabled` property is set to `true`.
`LivenessProbeHealthIndicator`, `ReadinessProbeHealthIndicator` and the related health groups are not enabled by default for all applications. They're generally useful, but if we enabled them everywhere, here's what would happen: `LivenessProbeHealthIndicator` and `ReadinessProbeHealthIndicator` would show up in the default group at `/actuator/health`. Many platforms look at this endpoint to figure out whether an application is broken or not. During startup (and especially with `ApplicationRunner` tasks), the readiness probe will report `OUT_OF_SERVICE`, which could trick platforms into restarting the app indefinitely if the configured timeout is too short. In short, we'd be adding a new facet to this that would make the upgrade experience not great... We can definitely consider flipping that default in a future release with more breaking changes.
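For completeness, the same Kubernetes detection is available to application code. A minimal sketch, assuming Spring Boot 2.3+ where `CloudPlatform.KUBERNETES` exists; the helper class name is made up:

```java
import org.springframework.boot.cloud.CloudPlatform;
import org.springframework.core.env.Environment;

public final class KubernetesDetection {

    private KubernetesDetection() {
    }

    // Mirrors the detection that activates the probe health groups: CloudPlatform
    // inspects the Environment (e.g. the Kubernetes service host/port variables).
    public static boolean runningOnKubernetes(Environment environment) {
        return CloudPlatform.KUBERNETES.isActive(environment);
    }
}
```

Beans can also be registered conditionally with `@ConditionalOnCloudPlatform(CloudPlatform.KUBERNETES)`.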
@ttddyy
> One last thing: I think it would also be better for `ReadinessStateChangedEvent` to have a `cause`.
> As @matthyx noted in the comment above about using readiness for draining queues, it is possible to use readiness for things other than graceful shutdown. In that case, a listener for `ReadinessStateChangedEvent` needs to differentiate what caused the event so that it can behave differently (e.g. shut down the task executor for graceful shutdown vs. simply finish up all tasks in the executor for draining).

We've made a few more refinements that will be available in the next release and that make it easier to add your own state types and/or events. To get the last event that actually caused the update you can use the `getLastChangeEvent` method. For example:
```java
@Component
public class MyComponent {

    private final ApplicationAvailability availability;

    public MyComponent(ApplicationAvailability availability) {
        this.availability = availability;
    }

    public void someMethod() {
        AvailabilityChangeEvent event = this.availability.getLastChangeEvent(ReadinessState.class);
        // check the event source or use instanceof if a custom subclass is in use
    }
}
```
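If you need to tell the causes apart (the graceful-shutdown vs. queue-draining case above), one option with this refined API is an application-defined state type rather than inspecting event sources. A minimal sketch, assuming Spring Boot 2.3+; `TrafficDrainState` and `DrainCoordinator` are made-up names:

```java
import org.springframework.boot.availability.ApplicationAvailability;
import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.AvailabilityState;
import org.springframework.context.ApplicationEventPublisher;

// Application-defined availability state, alongside the built-in liveness/readiness ones
enum TrafficDrainState implements AvailabilityState {
    DRAINING_FOR_SHUTDOWN,
    DRAINING_FOR_BACKPRESSURE,
    NOT_DRAINING
}

class DrainCoordinator {

    private final ApplicationEventPublisher publisher;
    private final ApplicationAvailability availability;

    DrainCoordinator(ApplicationEventPublisher publisher, ApplicationAvailability availability) {
        this.publisher = publisher;
        this.availability = availability;
    }

    void startShutdownDrain() {
        AvailabilityChangeEvent.publish(this.publisher, this, TrafficDrainState.DRAINING_FOR_SHUTDOWN);
    }

    boolean isShutdownDrain() {
        // the last change event tells you which custom state (and source) triggered the drain
        AvailabilityChangeEvent<TrafficDrainState> event = this.availability.getLastChangeEvent(TrafficDrainState.class);
        return event != null && event.getState() == TrafficDrainState.DRAINING_FOR_SHUTDOWN;
    }
}
```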