I've been going round and round trying to get my swarm working.
I thought I was getting close, but now when I run docker-compose up I get the following error:
... lots of --verbose logging ...
compose.service._containers_have_diverged: oceania_nginx-conf_1 has diverged: e986cd8a43c1825f73d62af8ac5fd0c467d185f8e06d1cd89394ff11fcad86ed != e48efc42c1e29a8ba5d862593cd23da42004d996959411d34bcc106629144c09
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- (u'oceania_oceania')
compose.network.ensure: Creating network "oceania_oceania" with driver "overlay"
compose.cli.verbose_proxy.proxy_callable: docker create_network <- (ipam=None, driver='overlay', options=None, name=u'oceania_oceania')
ERROR: compose.cli.main.main: 500 Internal Server Error: pool configuration failed because of Unexpected response code: 413 (Value exceeds 524288 byte limit)
It seems to fail at the step that creates the overlay network.
docker-machine restart fixed it (I restarted both the master and the consul node in my swarm; not sure which one fixed it).
+1
I was able to get past the error by restarting the Docker daemon on my 2 swarm nodes. I have a 3-node swarm cluster and have gotten into this situation a couple of times when trying to shut down all containers (including consul, swarm-agents, swarm-manager, etc.) and then restart the whole swarm cluster. I ran into the error when trying to add back the overlay network, so I have a feeling this is a docker/swarm issue more than a compose-specific issue.
$ docker -H=localhost:3375 network ls
NETWORK ID          NAME                               DRIVER
9007b2cf63d3        m-test/docker_gwbridge          bridge              
3403cc796588        m-test/none                     null                
06a45003eabd        m-test/host                     host                
13c25afaa91d        m-test/bridge                   bridge              
2f45ca35c557        m-test-node01/docker_gwbridge   bridge              
61e79bf4d1e8        m-test-node01/none              null                
16bda52fa5a9        m-test-node02/bridge            bridge              
31ab4d2ffb70        m-test-node01/bridge            bridge              
b2d6e694577f        m-test-node01/host              host                
bee6c146e4f2        m-test-node02/docker_gwbridge   bridge              
bbd020f7f203        m-test-node02/none              null                
980e47bf7ea5        m-test-node02/host              host                
$ docker -H=localhost:3375 network create -d overlay frontend
Error response from daemon: 500 Internal Server Error: pool configuration failed because of Unexpected response code: 413 (Value exceeds 524288 byte limit)
I agree, this is a swarm issue.
I experience this without swarm, just using overlay networks.
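That would fit: even without swarm, overlay networks in this Docker release line are backed by the daemon's configured key-value cluster store, so the same store limit can be hit either way. A quick sanity check of where the daemon is pointed (field names as shown by docker info in the 1.11/1.12 era):

$ docker info | grep -i cluster

Look for the Cluster Store line; if it points at a Consul agent, the 413 above is coming from that store rather than from compose or swarm itself.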
I can also confirm restarting fixes this for 1.2.0 and 1.11.0.
⚠️ Use at your own risk! It may take a while and you'll have downtime on your nodes ⚠️
docker-machine ls -q | xargs -I {} docker-machine ssh {} sudo /etc/init.d/docker restart
We're also hitting this. After digging into it a bit, it looks like the 413 (request entity too large) response code is generated by Consul when Docker tries to put a large key into the store.
Specifically, this line in Consul: https://github.com/hashicorp/consul/blob/6f0a3b9bf5c7fd0673213d451bbc9e66f7a9cad9/command/agent/kvs_endpoint.go#L188
Consul rejects requests larger than 512*1024 bytes, treating them as possible abuse of the store.
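You can reproduce the 413 outside of Docker by putting an oversized value straight into Consul's KV HTTP API (sketch only; it assumes a Consul agent reachable on localhost:8500, and limit-test is just a throwaway key name):

$ head -c 600000 /dev/zero | curl -s -o /dev/null -w '%{http_code}\n' \
    -X PUT --data-binary @- http://localhost:8500/v1/kv/limit-test
413

Anything over 524288 bytes gets rejected with the same "Value exceeds 524288 byte limit" message seen in the error above.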
After restarting the Docker daemons, the network can be created fine, so it would appear that after a restart the requests to the store for creating the overlay network are a lot smaller.
So possibly the overlay network state grows during use and restarting cleans it up in some way, or it gets into a bad state and tries to put bad data into the store. I haven't yet found a reproducible set of actions that triggers this, but when it does happen the only solution at the moment seems to be restarting the docker daemon, which is not really an option for our production cluster.
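One way to check whether it really is the key size creeping up is to list libnetwork's keys in Consul and measure each value. Rough sketch only: it assumes a Consul agent on localhost:8500 and the docker/network/v1.0/ prefix that libnetwork uses when Consul is the cluster store; adjust both for your setup, and it needs jq installed.

$ curl -s 'http://localhost:8500/v1/kv/docker/network/v1.0/?keys' | jq -r '.[]' |
    while read -r key; do
      printf '%10d  %s\n' "$(curl -s "http://localhost:8500/v1/kv/$key?raw" | wc -c)" "$key"
    done | sort -rn | head

If any value is anywhere near 524288 bytes, that would be the key Docker fails to write when creating the network.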
In summary, it looks to me like:
We are using docker 1.12 with old swarm (not swarm mode currently).