Compose: depends_on Should Obey {{ State.Healthcheck.Status }} Before Launching Services

Created on 21 Jul 2016 · 25 comments · Source: docker/compose

Is it possible for the depends_on parameter to wait for a service to be in a "Healthy" state before starting services, if a healthcheck exists for the container? At the moment, in 1.8.0-rc2, the depends_on parameter only verifies that the container is in a running state (i.e. that {{ .State.Status }} is "running"). This would allow for better dependency management in docker-compose.

area/config kind/enhancement

Most helpful comment

Is anyone working on this issue?

All 25 comments

Yes, we should probably do this. It will involve:

  • Adding support to docker-py for both setting a container's health check options and querying its health state
  • Adding support for those options (command, interval, retries, timeout) to the Compose file, probably under a health_check key (see the sketch below)
  • Updating the logic of depends_on to query containers' health state before considering a dependency to be ready

There are some important questions to answer, e.g.

  • What qualifies a _service_ (i.e. multiple containers) as healthy? Is it healthy if and only if _all_ its containers are healthy?
  • Should containers not join networks until they are healthy?
  • etc
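As a rough illustration of that proposal, a service definition with such a block might look like the sketch below. The option names follow the engine's healthcheck options (the check command appears as test, as in the engine API); the health_check key, the image, and the check command are placeholders taken from the suggestion above, not an agreed schema.

services:
  web:
    image: example/web
    # Hypothetical key and placeholder check command -- nothing here is part of
    # the Compose schema yet; it only mirrors the engine's healthcheck options.
    health_check:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 10s
      timeout: 5s
      retries: 3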

I like the idea of not joining a network until healthy as well; that would allow containers to be auto-joined via the service DNS record. Is it possible to also have a mechanism where containers leave the service DNS record if their healthcheck is failing? I think that part might be a question for Docker Engine.

Yes, that's the problem with implementing it in Compose: Compose isn't a long-running process, so a container's state can't be monitored after start-up. I think that suggests we shouldn't look at a container's health before connecting it to the network.

I think that the health state of a service is separate from whether it is connected to the network. A service might need to connect to other services before it becomes healthy.

The health check should also be done at the service level, not the container level. It makes more sense now to think in terms of services than containers, especially with Docker 1.12 (where the service has become a first-class citizen). A consumer of the service should not have to care whether the service is made up of one, two, or dozens of containers. What counts is that the service as a whole is considered ready to accept requests.
This means that each service would need some centralised way for its containers to report their status. The DNS resolver is already used to report the IP of each service; it could probably be used to expose the health status as well.

Quoting @dnephin in https://github.com/docker/compose/pull/3815#issuecomment-237610466:

I haven't tried out healthchecks yet. Is there an API call we can use to do "wait on container state == HEALTHY"?

If that exists, I think it would make sense to use it for depends_on. I'd rather not build polling into Compose to handle it.

I agree, so next step here would be to open a PR against Engine to implement that functionality if it doesn't already exist. It's important to get it into Engine first and as early as possible, because we have a policy of supporting at least two versions of Engine. If we can get the API endpoint into Engine 1.13, then we can get healthcheck support into Compose 1.10.

What about using the event system to do that?

You can filter based on event="health_status: healthy"
https://docs.docker.com/engine/reference/api/docker_remote_api_v1.24/#/monitor-docker-s-events

I tweaked service.py and parallel.py and was able to make one container wait for another until it is healthy.

Basically, every container that has dependencies on other containers (as I see it, dependencies are inferred from links, volumes_from, depends_on, etc. via the get_dependency_names() method, line 519 in service.py) will wait until those containers are healthy.

Regarding the API: docker-compose uses docker-py, and the health check can be performed as follows (in service.py):

    # Assumes `from docker.errors import NotFound` at the top of service.py.
    def is_healthy(self):
        try:
            state = self.client.inspect_container(self.name)["State"]
        except NotFound:
            return False
        health = state.get("Health")
        if health is None:
            # No healthcheck is defined for this container (or the daemon API
            # predates healthchecks), so treat it as healthy to preserve the
            # old "wait for running" behaviour.
            return True
        return health["Status"] == "healthy"

Then in parallel.py I just added one more check before firing the producer for an object. The iteration over pending now looks like this:

    for obj in pending:
        deps = get_deps(obj)

        if any(dep in state.failed for dep in deps):
            log.debug('{} has upstream errors - not processing'.format(obj))
            results.put((obj, None, UpstreamError()))
            state.failed.add(obj)
        elif all(
            dep not in objects or dep in state.finished
            for dep in deps
        ) and all(
            # This is the new check for healthy status --> is dep always an instance of the Service class?
            dep.is_healthy() for dep in deps
        ):
            log.debug('Starting producer thread for {}'.format(obj))
            t = Thread(target=producer, args=(obj, func, results))
            t.daemon = True
            t.start()
            state.started.add(obj)

Can any of the maintainers comment on this? It works with the up command, and down works without issue as well. However, might it break something else?

tests/acceptance/cli_test.py::CLITestCase::test_down <- tests/integration/testcases.py

This test case hangs, though.

I don't think we need to use the healthcheck for volumes_from. It only requires the container to be running.

I think it would be good to only use healthchecks for depends_on, and leave links with the old method (only waiting on the container to start). That way there is a way to add dependencies without incurring the extra time waiting on healthchecks. Some applications may handle dependencies more gracefully, and we shouldn't make their startup time slower.

Sounds like a good idea @dnephin

Is anyone working on this issue?

Any progress on this?

@Hronom looks like a WIP pull request here: https://github.com/docker/compose/pull/4163

This issue should be re-opened, as it is not working:

version: "3"

services:

  rabbitmq:
    image: rabbitmq:management
    ports:
      - "5672:5672"
      - "15672:15672"
    healthcheck:
      test: ["CMD", "rabbitmqctl", "status"]

  queues:
    image: rabbitmq:management
    depends_on:
      - rabbitmq
    command: >
      bash -c "rabbitmqadmin --host rabbitmq declare queue name=my-queue"

When executing:

docker-compose up queues

this reports that the RabbitMQ server is not started. The rabbitmq container's health status is "starting" when I look at it; only after a few seconds does the container become "healthy".

@lucasvc healthchecks will not help you ensure the start-up order of your system; that is a design decision. System components should be implemented so that they can reconnect/retry if something is not up yet (or has died). Otherwise, the system will be prone to cascading failures.

P.S. The issue will not be re-opened; the feature is behaving as expected.

@earthquakesan I must be missing something; the OP states:

Is it possible for the depends_on parameter to wait for a service to be in a "Healthy" state before starting services, if a healthcheck exists for the container?

What is the feature doing for us now?

Also, to be clear, we're not using this feature to keep our applications waiting on their dependencies. I have a docker-compose.yml defining an app under test, a database, and a "tester" container that tests the app.

The app under test will handle waiting for and reconnecting to the database; that's no problem. The issue is the "tester" container: I'd like for it to just run go test (I'm using Go) once the app under test is ready, and this healthcheck feature seems well-suited to that.

It seems unnecessary for a tester container, which is just running go test, to also maintain a healthy connection with the app under test. I wish I could run these tests from the host (read: not inside a container), but strangeness with our current CI system necessitates putting everything into containers.

Hope this makes sense.

Please refer to the docs on how to declare health check dependencies:
https://docs.docker.com/compose/compose-file/compose-file-v2/#depends_on
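With the 2.1 file format described there, the rabbitmq example above can be written roughly as follows (a sketch based on the linked docs; the ports are omitted for brevity):

version: "2.1"

services:

  rabbitmq:
    image: rabbitmq:management
    healthcheck:
      test: ["CMD", "rabbitmqctl", "status"]

  queues:
    image: rabbitmq:management
    depends_on:
      # condition: service_healthy tells Compose to wait for the healthcheck
      # to pass before starting this service.
      rabbitmq:
        condition: service_healthy
    command: >
      bash -c "rabbitmqadmin --host rabbitmq declare queue name=my-queue"

With this file, docker-compose up queues should wait for the rabbitmq container to report healthy before running the command.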

It's unfortunate they pulled the condition feature from depends_on for 3.0. It seems the official recommendation is wait-for.

@ags799 I've been following the discussions here and the timeline was as follows:

  1. depends_on waited for a Docker container to start (Compose file versions 1 and 2)
  2. HEALTHCHECKS and "condition" were introduced (Compose file version 2.1). AFAIK, condition is only supported in 2.1.
  3. Rollback to (1) (Compose file version 3)
  4. The feature is deprecated in the latest Docker Swarm (quoting the docs: "The depends_on option is ignored when deploying a stack in swarm mode with a version 3 Compose file.")

As the most up-to-date (and, arguably, best) way to deploy a Compose file is with the docker stack deploy command, the feature is effectively deprecated.

Here is a better "wait-for-it" version. The code is licensed under MIT, so feel free to reuse it.

@shin- about

Please refer to the docs on how to declare health check dependencies:
https://docs.docker.com/compose/compose-file/compose-file-v2/#depends_on

Why should I maintain two healthchecks, when Docker provides me a way to know whether the service is healthy?
DRY tells me to use the Docker feature, but between containers I think this is not doable.

I'm not sure what you mean; there's no need to maintain two healthchecks.

Yes, the wait-for script and the healthcheck. Also, wait-for is insufficient for many scenarios, e.g. Postgres DB init: we use pg_isready, which not only tests the port but also checks that the DB is actually there.
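For instance, a Postgres healthcheck along these lines (a sketch; the image tag, user, and timings are placeholders):

version: "2.1"

services:

  db:
    image: postgres:9.6
    healthcheck:
      # pg_isready verifies the server is accepting connections, not just
      # that the port is open.
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

A service that declares depends_on with condition: service_healthy on db would then wait until pg_isready succeeds.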

Waiting for a container to become ready does seem like it should be native functionality. Including a wait-for-it script seems like an acceptable hack in lieu of the feature, but not a refined solution to the problem.

It would need something like Kubernetes readinessProbes, but as "waitForItProbes", which doesn't really solve the problem: if container A dies or restarts after signalling "ready" to container B, container B will crash anyway because A is not available.

I suggest closing this issue, because the only way to "do" this is to have "wait-for-it.sh" or a similar setup.
