Spring-boot: Kubernetes readiness probe endpoint returning 404

Created on 25 Jul 2020 · 16 Comments · Source: spring-projects/spring-boot

There appears to be some change in behaviour for the Kubernetes-oriented readiness group endpoint on 2.3.2 compared to 2.3.1.

For a service that has no external dependencies (and only readinessState in the health group), the /actuator/health/readiness endpoint is returning a 404.

Configuration we are using:

management.server.port=9083
management.health.probes.enabled=true
management.endpoints.enabled-by-default=false
management.endpoint.info.enabled=true
management.endpoint.health.enabled=true
management.endpoint.health.show-details=always
management.endpoint.health.group.liveness.include=livenessState,diskSpace,refreshScope
management.endpoint.health.group.readiness.include=readinessState
management.endpoint.health.group.liveness.show-details=always
management.endpoint.health.group.readiness.show-details=always
management.endpoints.web.exposure.include=health

Expected Behaviour
We expect this to just return 200 with { "status": "UP" }

Actual Behaviour

$ http http://localhost:9083/actuator/health/readiness
HTTP/1.1 404 Not Found

Full health call:

$ http http://localhost:9083/actuator/health
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: application/json
Date: Sat, 25 Jul 2020 06:27:55 GMT
Transfer-Encoding: chunked
{
    "components": {
        "discoveryComposite": {
            "components": {
                "discoveryClient": {
                    "description": "Discovery Client not initialized",
                    "status": "UNKNOWN"
                }
            },
            "description": "Discovery Client not initialized",
            "status": "UNKNOWN"
        },
        "diskSpace": {
            "details": {
                "exists": true,
                "free": 287311962112,
                "threshold": 10485760,
                "total": 499963174912
            },
            "status": "UP"
        },
        "livenessStateProbeIndicator": {
            "status": "UP"
        },
        "ping": {
            "status": "UP"
        },
        "reactiveDiscoveryClients": {
            "components": {
                "Simple Reactive Discovery Client": {
                    "description": "Discovery Client not initialized",
                    "status": "UNKNOWN"
                }
            },
            "description": "Discovery Client not initialized",
            "status": "UNKNOWN"
        },
        "readinessStateProbeIndicator": {
            "status": "UP"
        },
        "refreshScope": {
            "status": "UP"
        }
    },
    "groups": [
        "liveness",
        "readiness"
    ],
    "status": "UP"
}

This may relate to #22107.

regression

All 16 comments

After a bit more digging: I'm not sure why, or whether it was intended, but readinessState appears to have become readinessStateProbeIndicator (and likewise livenessState has become livenessStateProbeIndicator). The old configuration therefore no longer included the indicator at all, leaving the readiness group empty.

The following configuration seems to work as expected:

management.endpoint.health.group.liveness.include=livenessStateProbeIndicator,diskSpace,refreshScope
management.endpoint.health.group.readiness.include=readinessStateProbeIndicator

Yes, this is an unintended side effect of #22107. The workaround you mention is the right one in the meantime.

Thanks for raising this issue!

No problem - feel free to re-title it as appropriate.

Unfortunately this is a silently breaking change for many people. They probably won't realise that the probe state is no longer included in the group status alongside, say, db, redis etc., because including a non-existent indicator in a group doesn't seem to fail startup :(

I've tagged this issue as a regression.

I'm really sorry for letting in that one.

Does this cover the fact that they are listed under groups at /health, but then don't actually exist?

I have exactly the same issue as @OrangeDog. On my container with management.endpoint.health.probes.enabled=true:

  • When executing GET /actuator/health:
    { "status": "UP", "groups": [ "liveness", "readiness" ] }

  • When executing GET /actuator/health/liveness:
    404 Not Found

I agree this is potentially confusing, but doesn't seem to be the main problem here?

I wonder whether the /actuator/health endpoint behaved differently under 2.3.1 if a group had no configured components, i.e. whether it filtered such groups out of groups: []?

I guess this is a matter of design - the group exists but has no (valid) components, therefore its status is indeterminate, therefore the implementation returns a 404? It certainly can't return 200 OK....

Would we

  • want to be aware the groups exist, so we know we can add components to them with include ?
  • or have them disappear from the top level endpoint so we don't even know they are there?

Instead of referencing readinessStateProbeIndicator and livenessStateProbeIndicator, I think you need to set the management.health.livenessstate.enabled and management.health.readinessstate.enabled properties introduced in Spring Boot 2.3.2. You can then use the readinessState and livenessState references.

When the management.health.[readiness|liveness]state.enabled properties are false (the default), AvailabilityProbesAutoConfiguration creates the readinessStateProbeIndicator and livenessStateProbeIndicator beans, which must be referenced by their full bean names, [readiness|liveness]StateProbeIndicator.

On the other hand, when the properties are enabled, AvailabilityHealthContributorAutoConfiguration creates [readiness|liveness]StateHealthIndicator beans, which can be referenced as [readiness|liveness]State.

The problem is in AvailabilityProbesHealthEndpointGroups, created by AvailabilityProbesHealthEndpointGroupsPostProcessor, which creates the readiness and liveness groups with [readiness|liveness]State as their members.
So if the [readiness|liveness]State indicators are not available, the groups are created but the HealthIndicator beans they reference do not exist.
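
As a sketch of the alternative this comment describes, the original configuration could keep the short indicator names if the two state indicators are enabled explicitly. The property names are the ones quoted in this comment for Spring Boot 2.3.2; the combination below is an assumption and was not tested in this thread.

```properties
# Assumption: Spring Boot 2.3.2, property names as quoted in the comment above.
# Enable the state indicators so that the short names resolve.
management.health.livenessstate.enabled=true
management.health.readinessstate.enabled=true
# Group definitions taken unchanged from the original report.
management.endpoint.health.group.liveness.include=livenessState,diskSpace,refreshScope
management.endpoint.health.group.readiness.include=readinessState
```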

want to be aware the groups exist, so we know we can add components to them with include ?

The API response is supposed to be for consumers of the API, not documenting configuration options for the developer. Like the rest of the actuator system, only endpoints that are currently available should be listed as available.

When management.health.[readiness|liveness]state.enabled properties are set to false (by default)

FYI, surprisingly enough, Spring Boot decided to name the readiness state property management.health.readynessstate.enabled, with a y, in the 2.3.2.RELEASE version (the most recent release at this date).


See the reference: https://docs.spring.io/spring-boot/docs/2.3.2.RELEASE/reference/html/appendix-application-properties.html#actuator-properties

@antoinegrappin no, that's just a documentation error. The property is spelled readiness.

@OrangeDog indeed, I confirm after tests.

This issue is now fixed in the 2.3.3 and 2.4.0 SNAPSHOTs.

I've carefully read the comments on this issue regarding the following surprising behavior: getting a 404 status on a configured health group when no indicator is present. In this particular case the 404 is arguably wrong, but that is because we're dealing with a regression. Some of you suggested that

  1. a missing indicator in a group should fail the application at startup, or
  2. an empty group should disappear from the list of groups on the main endpoint.

The first alternative sounds nice, especially for detecting bad configurations. But it's also likely to fail in perfectly valid cases. Your application could configure a group management.endpoint.health.group.custom.include=ping,redis and fail in a test environment where no redis instance is available. Because Spring Boot reacts to the environment, it's expected to behave differently and adapt to the situation.
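
To make the scenario concrete, a group like the one mentioned above could be declared as follows. The group name custom is the hypothetical one from this comment; the show-details line is added only for illustration and reuses the property pattern from the original report.

```properties
# Hypothetical group from the comment above. Per this thread, including an
# indicator name that is not registered in a given environment does not
# fail startup; the group just has fewer members there.
management.endpoint.health.group.custom.include=ping,redis
management.endpoint.health.group.custom.show-details=always
```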

The second alternative is debatable. Right now our health groups support is auto-configured with the configuration properties and does not look into the application context to check for the existence of health indicators. We seem to all agree that a 404 response status is right in this case. Removing the group information would, in my opinion, make things less consistent as we wouldn't know that a group has been configured. After all, a health group is just a way to wrap several indicators under the same name and customize its global health status - but health indicators are still dynamic.

After discussing this briefly with the team, we didn't think that this needs to be changed. Note that this behavior has existed since the introduction of the health groups feature. If you can make a stronger case for changing this, please create a dedicated issue and explain how this behavior is inconsistent or could lead to issues.

Thanks!

Thanks @bclozel - fix is working fine in 2.3.3 after removing the workaround to the probe names I mentioned above :-)

@chadlwilson can you share your configurations in 2.3.3? I am finding the same issue there..

@salaboy If your application runs on Kubernetes, you don't need any specific configuration.
If it doesn't, you need to enable the probes with the following:

management.endpoint.health.probes.enabled=true 
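
For an application running outside Kubernetes on 2.3.3, a minimal configuration along these lines should be enough to expose the probe groups. This is a sketch that combines the property above with the exposure setting from the original report; it has not been verified against a running application.

```properties
# Enable the liveness and readiness health groups outside Kubernetes.
management.endpoint.health.probes.enabled=true
# Expose the health endpoint over HTTP (setting taken from the original report).
management.endpoints.web.exposure.include=health
```

With this in place, /actuator/health/liveness and /actuator/health/readiness would be expected to respond instead of returning 404.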