Compose: The error "ERROR: network network_main_net id ... has active endpoints" isn't particularly useful and blocks `docker-compose down`

Created on 20 Jul 2018 · 16 Comments · Source: docker/compose

Description of the issue

The error "ERROR: network network_main_net id ... has active endpoints" isn't particularly useful and blocks docker-compose down.

Details

I have multiple docker-compose.yml files that look like this:


version: '2.3'
services:
  someservice:
    networks:
    - network_main_net
    ...

networks:
  network_main_net:
    name: network_main_net

So far, this is the only way I know of to share a network across multiple docker-compose.yml groups without some hackery involving a special network-owning folder that needs to be brought up first.

However, for all of those containers, docker-compose down && docker-compose up -d is broken:

# docker-compose down && docker-compose up -d
Stopping mail ... done
Removing mail ... done
Removing network network_main_net
ERROR: network network_main_net id 6aaf7ec207e435452261aa3f408813c1197d5a5080288d70e4886373427c9fe7 has active endpoints

Since the error isn't actually useful to me and certainly not something I would want it to abort over (of course there are still active endpoints - that's the point of sharing the network!) I suggest that it is converted into a warning.
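As a stopgap, the "warning instead of error" behaviour could be approximated outside of Compose. The following is only a sketch: `compose_down_tolerant` and the `COMPOSE` override variable are hypothetical names, and matching on the message text is an assumption that may break between versions. It also downgrades the whole run to success whenever that message appears, so other failures in the same run would be hidden.

```shell
# Hypothetical wrapper around `docker-compose down`.
# COMPOSE is an assumed override for the compose command (handy for testing).
compose_down_tolerant() {
    local output status
    # Capture stdout and stderr together and remember the exit status.
    if output="$(${COMPOSE:-docker-compose} down 2>&1)"; then
        status=0
    else
        status=$?
    fi
    printf '%s\n' "$output"

    if [ "$status" -ne 0 ]; then
        case "$output" in
            *"has active endpoints"*)
                # Another project still uses the network: warn and succeed.
                echo "WARNING: network still in use by another project, leaving it in place" >&2
                return 0
                ;;
        esac
    fi
    return "$status"
}
```

With this, `compose_down_tolerant && docker-compose up -d` would continue past the shared-network failure, while any other failure still aborts.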

Context information (for bug reports)

# docker-compose version
docker-compose version 1.21.2, build a133471
docker-py version: 3.3.0
CPython version: 3.6.5
OpenSSL version: OpenSSL 1.0.1t  3 May 2016
# docker version
Client:
 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.10.1
 Git commit:    7390fc6
 Built: Wed Apr 18 01:23:11 2018
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   7390fc6
  Built:    Wed Feb 28 17:46:05 2018
  OS/Arch:  linux/amd64
  Experimental: false

Steps to reproduce the issue

  1. Use a shared network declared like networks.some_name.name: some_name (used by a container as services.myservice.networks: ["some_name"] as shown above)
  2. Set up 2+ docker-compose.yml groups that all share this same network name
  3. Do docker-compose up -d for all the groups sharing this network
  4. Do docker-compose down on one of them

Observed result

error message that isn't particularly surprising or relevant, and exit code non-zero

Expected result

warning (or no network-related output at all) and exit code 0

Stacktrace / full error message

Removing network network_main_net
ERROR: network network_main_net id 6aaf7ec207e435452261aa3f408813c1197d5a5080288d70e4886373427c9fe7 has active endpoints

Additional information

Ubuntu 18.04 LTS

kind/question

All 16 comments

Hi @JonasT

Thanks for the feedback; however, this is working as intended. If you don't intend for this network to be managed (i.e. for Compose to attempt removal when using down, and error out when it's unable to do so), you should declare it as external. Note that the use-case you mention is possible but not officially supported, and we've actually recommended against doing this in the past.

Moreover, if all you're looking to do is stop / remove containers, docker-compose stop && docker-compose rm is a more appropriate command.

HTH!
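A minimal sketch of that alternative, for anyone who wants a drop-in replacement for the `down && up -d` idiom. The helper name `recreate_containers` and the `COMPOSE` override variable are hypothetical. It stops and removes the project's containers and starts them again, and never asks Compose to remove a network.

```shell
# Hypothetical helper: recreate a project's containers without touching networks.
# COMPOSE is an assumed override for the compose command (handy for testing).
recreate_containers() {
    ${COMPOSE:-docker-compose} stop &&
    ${COMPOSE:-docker-compose} rm --force &&   # --force skips the confirmation prompt
    ${COMPOSE:-docker-compose} up -d
}
```

Run it from the folder that holds the docker-compose.yml, the same way you would run `docker-compose down`.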

@shin- marking as external doesn't work because then the network won't be created if it's not there, which leads to fatal errors on deployment (see #4179 ). Can you share a docker-compose.yml example that actually provides this functionality, where docker-compose down still works? Because I don't know how to write one, which is the reason I am suggesting this change.

If this use case isn't properly supported right now and you don't want to address this by removing this error, I suggest that #4179 really should be reopened.

Does what I write make sense? Basically I'm stuck between "docker-compose down doesn't work" and "need special startup script for strict order or nothing will launch after container cleanup" and both kind of suck...

Edit: "we've actually recommended against doing this in the past" -> so what do you actually recommend to put multiple docker-compose.yml container groups into the same network? I am not aware of any other approach, but I'd be very interested in any solution to this.

The problem you're running into is quite specific to how you're using compose.

A compose file describes the objects/resources that are part of your compose project (stack); the objects that have to be created when deploying your project, and have to be _removed_ when deleting. To make this work, all objects/resources are namespaced by default (prefixed with projectname_) to prevent conflicts (ie preventing two stacks defining a service, network, or volume with the same name to be managing the same objects).

In your case, you're overriding those defaults, and explicitly define the same network to be created in multiple stacks, effectively creating a conflict situation; both stacks now "own" that network, and both will (attempt to) manage it; _create_ the network when deploying, and _remove_ the network when deleting the stack.

Deleting is where the error occurs; when deleting the stack, you're requesting compose to remove all objects/resources that are managed/owned by a project, and due to the conflict, removing "project A" fails if "project B" is still running, and vice-versa.

_Should_ it produce an error? _Yes_; because it failed to perform the actions you requested it to do; it failed to remove one or more objects.

Note that the conflict situation is actually there as well when _deploying_, but goes undetected; services for two separate projects get attached to the same network, and now are able to communicate freely; while this may be desirable in your situation, it may be very undesirable in other situations (for example, a "staging" instance of the stack connecting to the "production" database).

If you want a resource to be shared between projects, the _correct_ approach is to mark a resource as "external"; doing so, you explicitly indicate that the project is _given access_ to a resource, but doesn't own/manage it. External resources have to be created before deploying the stack (which can be slightly inconvenient), but with the advantage that responsibility for creating and managing that resource lies in a single place (multiple stacks should not attempt to manage the same resource).

You're just reiterating the limitations of the concept... your "advantage"/"correct approach" simply means that start order suddenly matters in a frustratingly opaque way that breaks deployments if the administrator is not deeply aware of hidden dependencies.

What I did was simply use the best solution to this I'm aware of - suggested to me, by the way, I didn't make it up myself - to try to define a shared network properly. I made this ticket because it doesn't work nicely. That should change.

The whole concept of "one folder needs to own the resource" is simply bad in some scenarios. It may be good for some use cases, but it simply isn't for others. Is that so hard to understand? Why can't docker-compose support those other use cases better as well? Why does the basic need for this need to be discussed over and over, when it's obvious that those use cases aren't well supported, and people have them?

It's just frustrating to constantly get back into concept discussions. The use case is super simple, why people need it is also super simple. I suggested one way, there are also others of course. Quite frankly, I don't care how this is done, this was just one way this could be improved since I don't see many other ideas floating around, just constant responses of "but we can't do that either". (Sorry, I don't mean to attack personally - just explaining why I'm getting frustrated over here!)

Edit: also see #4179 which was closed by the way suggesting this works fine. It doesn't. Hence this ticket here

which can be slightly inconvenient

This to me is the core of the problem. Breaking fully automated startup is not a slight inconvenience, but a major problem. I shouldn't need a custom shell script to hard code dependencies to make stuff create & launch properly, this is what docker & docker-compose are already for. It's their entire purpose not to require me to do this to get my deployment to work!

Edit: at least that's my opinion on things, hopefully should explain to you why I keep bringing up those tickets

@JonasT I definitely understand that some use cases call for several independent Compose files sharing a single network. It's obviously not the use case Compose is maximally designed to support, but we agree that it should work at least to a reasonable degree. That said, there's definitely a few things we seem to be missing about your use-case, namely

  1. If you don't intend to remove networks with docker-compose down, why not simply use docker-compose rm --force --stop ? It's the exact same thing except it ignores networks, which seems to be exactly what you want to begin with.
  2. What is so odious about having some amount of scripting to supplement docker-compose? This is a recurring theme for issues on this tracker, but anything slightly complex will need it at one point or another anyway. And in your case, all you really need is a one liner: docker network inspect network_main_net || docker network create network_main_net.

Any of these two feel more reasonable to me than to expect Compose to not report legitimate errors, i.e. ignoring a failure to remove a network when running a command specifically designed to remove networks. If there are legitimate issues with the solutions I proposed, I'm happy to iterate more on this.
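The one-liner from point 2 can be wrapped in a small helper so that it stays quiet when the network already exists. This is a sketch only: `ensure_network` is a hypothetical name, and the check-then-create sequence is not atomic, so two scripts running at the same moment could still race.

```shell
# Hypothetical helper: create the shared network only if it is missing.
ensure_network() {
    local network_name="$1"
    # `inspect` exits non-zero when the network does not exist.
    docker network inspect "$network_name" >/dev/null 2>&1 ||
        docker network create "$network_name"
}
```

Calling `ensure_network network_main_net` before `docker-compose up -d` would then work regardless of which project is started first, assuming the compose files declare the network as external.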

why not simply use docker-compose rm --force --stop ?

I know 3 servers of 3 different admins (excluding mine) who use docker-compose for everything. All of them use such a shared network. All of them use startup order hacks to work around this, and all of those admins use docker-compose down && docker-compose up -d as an idiom. Of course they could relearn it, but are they going to? Or will they all just stick to their startup script hacks? You have to keep in mind unless someone reads this ticket in depth, people won't even understand why this is an issue in the first place.

but anything slightly complex will need it at one point or another anyway

I can again only speak for myself and the three servers I am vaguely involved in run by others. None of them have any custom startup scripting for anything except for this issue. The usual setup I'm seeing is a central folder like /srv and every sub folder is a service, and a boot script simply does docker-compose up -d in each folder - and every single one of those setups has a custom hack for the shared network problem, without which it'd be an absolutely trivial start process. Maybe it's just my environment, but I do think this issue is much more prevalent than you think in more complex real world setups - or at least that's the only impression I can get from my perspective.

(For some more technical detail, most of those setups I've seen use jwilder's docker-gen container - for that, it's simply a basic underlying concept that different docker-compose groups are independent, yet can still be centrally addressed by the proxy - for which you need a shared network to all of them. There is simply no other way of setting this up I'm aware of.)
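For illustration, the kind of boot script described above might look roughly like this. It is a sketch, not anyone's actual setup: `start_all_projects`, `SERVICES_ROOT`, `SHARED_NETWORK` and `COMPOSE` are hypothetical names, /srv and network_main_net are taken from this thread, and it assumes every compose file declares the network as external.

```shell
# Hypothetical boot script body: shared-network workaround plus a plain
# "up -d in every folder" loop.
start_all_projects() {
    local services_root="${SERVICES_ROOT:-/srv}"
    local shared_network="${SHARED_NETWORK:-network_main_net}"
    local project_dir

    # The workaround: create the network up front so start order stops mattering.
    docker network inspect "$shared_network" >/dev/null 2>&1 ||
        docker network create "$shared_network" || return 1

    for project_dir in "$services_root"/*/; do
        # Skip folders that hold no compose file.
        [ -f "${project_dir}docker-compose.yml" ] || continue
        (cd "$project_dir" && ${COMPOSE:-docker-compose} up -d) ||
            echo "WARNING: failed to start $project_dir" >&2
    done
}
```

Everything apart from the two `docker network` lines is the trivial start process; those two lines are the extra step that every setup has to carry.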

Any of these two feel more reasonable to me than to expect Compose to not report legitimate errors, i.e. ignoring a failure to remove a network when running a command specifically designed to remove networks.

Then why not add a simple shared flag? Then docker-compose would know it doesn't need to get rid of it. The main issue remains that all of the proposed solutions are ugly hacks around the underlying problem that docker-compose doesn't have a proper notion of a shared network - which is what #4179 suggested to add.

Just to add some more detail on the unfortunate human politics side of dc down && dc up -d:

One of those setups I'm vaguely involved in is a smaller non-profit association with lots of independent groups of people in computer science areas, and multiple people helping out with the server voluntarily. This works fine until you add in really odd changes like "we've changed things so docker-compose down doesn't work, to improve the startup script a little" - nobody understands this, so this is simply not going to fly. It's just such an impractical reason too, in the end a startup script hack is less pain for all the assisting voluntary admins that don't need to touch it in their daily work. But that is why it all circles back to #4179 not being solved, and this error message being a problem - unless you don't want to support this use case at all, and force people to continue with hackish startup scripts.

I'm not sure what to say; there is a solution, but your answer seems to be "I don't want that solution, it forces people I work with to learn a new thing, I want you to break the thing we use currently so that it works for our setup." I want to engage with this honestly, but it seems to me you've placed this artificial roadblock of "human politics" that's probably not that hard to solve. If we can get monkeys to learn sign language, getting a handful of educated individuals to learn a new subcommand should be doable. Or, you know, just write a script that does the thing for them.

The main issue remains that all of the proposed solutions are ugly hacks around the underlying problem that docker-compose doesn't have a proper notion of a shared network

The concept of a shared network exists, but it involves using the external keyword and recognizing that one part of the stack has to own that resource. In the typical example you cite, that's probably the Compose file that declares the proxy, since it's the one that always needs to be present for the system to function. It makes sense in practice too; you won't be able to docker-compose up anything else until the proxy is up and running (creating the network), and you'll be able to docker-compose down any part of the stack without getting error messages. That's not hackish, it's just good ops hygiene.

your answer seems to be "I don't want that solution, it forces people I work with to learn a new thing

Not really, and it seems to me you missed the point of most of what I wanted to express, but given where we're at I don't think we're getting anywhere, so I'll stop.

Edit 2: edited for brevity.

Edit: a better suggestion to fix the design, to leave this discussion on a more constructive note:

# cat docker-compose.yml
version: '2.3'
services:
  someservice:
    networks:
    - network_main_net
    image: ubuntu

networks:
  network_main_net:
    name: network_main_net
    shared: true        # or maybe "global" would be a nice name for this option
# docker-compose down
Stopping someservice ... done
Removing someservice ... done
Removing network network_main_net
INFO: not deleting network network_main_net id 6aaf7ec207e435452261aa3f408813c1197d5a5080288d70e4886373427c9fe7 since it's a shared resource with active endpoints
# echo $?
0

Feel free not to respond / not to do anything with it. But maybe it'll give you some ideas to ponder, and revisit this some day if you feel like it.

An implementation for the 2.4 schema: https://github.com/JonasT/compose/commit/f2be29f2c859367ee1935ffe56ba143044a84249 Consider the change CC0/public domain. I'm aware the hard-coded docker.errors.APIError .explanation string check is ugly, there is probably a nicer way to do that

Edit: by the way, this patch shouldn't "break" any regular deployments since it adds an option / opt-in. all it does is provide something that works _without_ nasty caveats (seen from an administrator side, admittedly)

I've given some more thought to this and to how the hard-coded error check above could be avoided. However, from the discussion above, I'm not sure whether there's any interest in having me attempt to improve the situation. If there is, I'd be happy to share more ideas :+1:

For what it's worth, the underlying issue certainly hasn't magically disappeared for me, or for any of the other admins I know who have hackish workarounds in place to avoid it. Wish it had :cry: lol

Edit: I could also explain, if that is desired, why regarding this issue, I think most in @shin- 's last response isn't very helpful to me - however, I wasn't sure it would get us anywhere. But if someone wants to know, I have gained some more distance to this topic and I think I could write it down in hopefully an objective enough manner

TL;DR: this is bugging me enough that I'm willing to fix it (or discuss it further, but only if there is legitimate interest), if someone is interested in letting me

Is there really no interest in improving this even if I'm offering to provide a pull request for it? (not the removal of the error message of course which you reasonably pointed out is hackish, but some sane solution)

@JonasT I definitely understand that some use cases call for several independent Compose files sharing a single network. It's obviously not the use case Compose is maximally designed to support, but we agree that it should work at least to a reasonable degree. That said, there's definitely a few things we seem to be missing about your use-case, namely

1. If you don't intend to remove networks with `docker-compose down`, why not simply use `docker-compose rm --force --stop` ? It's the exact same thing except it ignores networks, which seems to be exactly what you want to begin with.

2. What is so odious about having some amount of scripting to supplement `docker-compose`? This is a recurring theme for issues on this tracker, but anything slightly complex will need it at one point or another anyway. And in your case, all you really need is a one liner: `docker network inspect network_main_net || docker network create network_main_net`.

Any of these two feel more reasonable to me than to expect Compose to not report legitimate errors, i.e. ignoring a failure to remove a network when running a command specifically designed to remove networks. If there are legitimate issues with the solutions I proposed, I'm happy to iterate more on this.

Both of your suggestions aren't real solutions to the problem. The first one doesn't remove the networks anymore, but that is a nice feature which nobody wants to lose. Your second suggestion is a solution, but not a docker-compose-like solution. If I need to create my own networks, then I can just as well create my Docker containers myself ...

Just to re-explain my reasoning, since this problem hasn't gone away:

I'm not sure what to say; there is a solution, but your answer seems to be "I don't want that solution, it forces people I work with to learn a new thing, I want you to break the thing we use currently so that it works for our setup." I want to engage with this honestly, but it seems to me you've placed this artificial roadblock of "human politics" that's probably not that hard to solve.

No, my argument is that the solution is a really bad fit for anyone I have asked. And the "artificial roadblock" is I can't make anybody reasonably use it when they consider it really bad. I was trying to explain this nicely, sorry if that just made it harder to follow.

The concept of a shared network exists, but it involves using the external keyword and recognizing that one part of the stack has to own that resource

The ownership is not the problem, but the implicit startup order this suddenly requires, which is 1. not how docker-compose usually works, 2. leading to random failures when just attempting to start everything in any order or alphabetically, which is how many people launch things, 3. easily solvable if docker-compose just understood shared networks properly. If this ownership, as it is now, makes startup brittle by introducing an ordering requirement, and that causes notable issues for everyone I have seen try this workaround, then maybe it is worth fixing...? (And the fix IMHO really wouldn't mean this ownership concept is removed, just that docker-compose understands the impact on startup order. After all, depends_on doesn't magically remove the concept of separate containers either.)

I am still very interested in implementing a fix if somebody just lets me.
