Machine: "subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: operation not supported"

Created on 6 Jan 2016 · 22 Comments · Source: docker/machine

I've run into a problem running docker-compose on a swarm that was set up following the "Get started with multi-host networking" article on docker.com, except that I used the generic driver:

docker-machine create --driver generic --generic-ip-address $HOST1 --generic-ssh-user root --swarm --swarm-discovery="consul://$(docker-machine ip consul):8500" --engine-opt="cluster-store=consul://$(docker-machine ip consul):8500" --engine-opt="cluster-advertise=eth0:2376" node-b

Everything seems fine for each of the hosts:

Configuring swarm...
Checking connection to Docker...
Docker is up and running!

However, running this command (using only images listed on Docker Hub) fails:

docker-compose --x-networking --x-network-driver=overlay up -d

ERROR: Cannot start container 4f55c34c5687bc810aaafd58f22d0a60a118d353bc4209993881265e25d171a8: subnet sandbox join failed for "10.0.0.0/24": error creating vxlan interface: operation not supported
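
To see whether the failure comes from the host kernel rather than from Compose or Swarm, one option is to try creating a VXLAN interface by hand on the node. The sketch below is an assumption about how such a check might look, not something from the original report: the function name `probe_vxlan`, the interface name `vxlan-probe0`, the VNI 4242 and the `IP_CMD` override are all made up for illustration. It assumes the iproute2 `ip` tool and root privileges.

```shell
# Hypothetical helper: try to create (and immediately remove) a throwaway
# VXLAN interface. IP_CMD may name a different binary or wrapper
# (a single command name); it defaults to "ip".
probe_vxlan() {
    probe_ip="${IP_CMD:-ip}"
    probe_if="vxlan-probe0"   # arbitrary name, assumed to be unused
    if "$probe_ip" link add "$probe_if" type vxlan id 4242 dstport 4789 2>/dev/null; then
        # Creation worked, so clean up the test interface again.
        "$probe_ip" link delete "$probe_if"
        echo "vxlan: supported"
    else
        echo "vxlan: not supported (or insufficient privileges)"
        return 1
    fi
}
```

Running `probe_vxlan` as root on each node and getting a non-zero result would suggest that the kernel itself cannot create VXLAN interfaces, independent of Docker.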

drivegeneric

orbitalmedia

All 22 comments

What version of the kernel have you got on the host?
see https://github.com/docker/docker/issues/14145

dgageot on 6 Jan 2016

4.1.5-x86_64-linode61

orbitalmedia on 7 Jan 2016

Same here, but I'm not using compose, just simply Docker and the overlay network driver. Kernel:
Linux vagrant-ubuntu-trusty-64 4.2.0-23-generic #28~14.04.1-Ubuntu SMP Thu Dec 31 13:40:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

After restarting the VM, things work.

matevarga on 13 Jan 2016

And same here on Linode: it is a kernel issue. Even though Linode reports that the box runs 4.x, it only works if you install the official distribution kernel (>= 3.16) AND set the Linode to boot from GRUB 2.
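
A rough way to sort a host into the cases discussed here (older than 3.16, recent but Linode-supplied, or something else) is to look at the release string that `uname -r` prints. This is only a sketch: the function names are invented, and it assumes Linode kernels can be recognised by the `-linode` suffix seen in the release strings quoted in this thread.

```shell
# Hypothetical helpers for classifying a kernel release string such as the
# output of `uname -r`.

# kernel_at_least RELEASE MAJOR MINOR: succeeds if RELEASE >= MAJOR.MINOR
kernel_at_least() {
    kal_major=${1%%.*}               # text before the first dot, e.g. "4"
    kal_rest=${1#*.}                 # text after the first dot
    kal_minor=${kal_rest%%[!0-9]*}   # leading digits of the rest, e.g. "1"
    [ "$kal_major" -gt "$2" ] ||
        { [ "$kal_major" -eq "$2" ] && [ "$kal_minor" -ge "$3" ]; }
}

# is_linode_kernel RELEASE: succeeds if the name carries the "-linode" suffix
# (an assumption based on the release strings quoted in this thread)
is_linode_kernel() {
    case "$1" in
        *-linode*) return 0 ;;
        *)         return 1 ;;
    esac
}

# describe_kernel RELEASE: print a one-line classification
describe_kernel() {
    if ! kernel_at_least "$1" 3 16; then
        echo "$1: older than 3.16"
    elif is_linode_kernel "$1"; then
        echo "$1: recent, but Linode-supplied"
    else
        echo "$1: recent distribution or custom kernel"
    fi
}
```

On a node, `describe_kernel "$(uname -r)"` gives the classification for the kernel that is actually running.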

matevarga on 15 Jan 2016

👍5

Closing. This is a kernel issue.

dgageot on 12 Feb 2016

Sorry guys, I have the same issue here, but I don't understand the reason. Could somebody give a bit more detail?

gabrielhao on 24 Feb 2016

You probably have a kernel that's too old or it's not supported.

matevarga on 24 Feb 2016

Well, this is where I get confused. The kernel version is not old, but I still get this error.
The kernel version is 4.4.0-x86_64-linode63 and the OS is Ubuntu 14.04.
Here is the docker info output:

Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 37
Server Version: 1.10.2
Storage Driver: devicemapper
 Pool Name: docker-8:0-65539-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.264 GB
 Data Space Total: 107.4 GB
 Data Space Available: 21.93 GB
 Metadata Space Used: 3.138 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.144 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: overlay bridge null host
Kernel Version: 4.4.0-x86_64-linode63
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 991.7 MiB
Name: localhost
ID: S4LH:BUBO:ZCVX:QDGD:UWAI:GIFD:DL2I:KGP2:HDDL:WLQ3:WPO4:ZB34
WARNING: No swap limit support
Cluster store: consul://xxx.xxx.xxx:8500/network
Cluster advertise: xxx.xxx.xxx:2375

gabrielhao on 24 Feb 2016

Yep, Linode's 4.4 kernel has this problem. Install a signed kernel (like lts-wily) from the Ubuntu repo, make sure that Linode actually boots that kernel (somewhere in the VM settings), then you're good to go.

apt-get install -y linux-signed-generic-lts-wily

Dashboard -> Edit profile -> Kernel -> Grub2
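
After the reboot it may be worth confirming that the Linode really booted the kernel installed from the Ubuntu repository. The following is a sketch, not an official procedure: the function name is invented, and it assumes Ubuntu's `/boot/vmlinuz-<release>` naming and that a Linode-supplied kernel has no matching image on the disk.

```shell
# Hypothetical check: does the running kernel match an image in /boot?
# booted_from_boot_dir RELEASE [BOOT_DIR]
booted_from_boot_dir() {
    bfb_dir="${2:-/boot}"
    if [ -e "$bfb_dir/vmlinuz-$1" ]; then
        echo "running kernel $1 has an image in $bfb_dir"
    else
        echo "running kernel $1 has no image in $bfb_dir"
        return 1
    fi
}
```

Called as `booted_from_boot_dir "$(uname -r)"`, a failure would hint that the profile is still set to a Linode kernel rather than GRUB 2.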

matevarga on 24 Feb 2016

👍5 🎉3

Thanks a lot. This really solved the problem. :)
But please allow me to ask: what is the reason behind this? Does Linode's 4.4 kernel differ from the signed kernel in some way that causes this problem?
I've read some issues on GitHub; normally this is caused by an old kernel (< 3.16), so why does 4.4 still have it?

gabrielhao on 25 Feb 2016

I don't know, unfortunately.

matevarga on 25 Feb 2016

Apologies if it's a silly question, but I also have this problem. What entries are needed in /etc/apt/sources.list for the apt-get install to work? I haven't had any luck with the things I've tried.

PhilLogan on 4 Mar 2016

never mind - found out about trusty/trusty-updates repositories
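
For anyone else stuck at this step, a small check like the one below can tell whether a given suite is active in the apt sources. It is only a sketch: the function name is invented, and the example lines in the comment assume the default archive.ubuntu.com mirror and the `main` component, which may differ on a particular image.

```shell
# Hypothetical helper: suite_enabled SUITE [FILE]
# Succeeds if FILE contains an active (uncommented) "deb" line for SUITE.
# Example of lines it would accept (mirror URL is an assumption):
#   deb http://archive.ubuntu.com/ubuntu trusty main
#   deb http://archive.ubuntu.com/ubuntu trusty-updates main
suite_enabled() {
    grep -Eq "^[[:space:]]*deb[[:space:]]+[^#]*[[:space:]]$1[[:space:]]" \
        "${2:-/etc/apt/sources.list}"
}
```

For example, `suite_enabled trusty-updates || echo "trusty-updates is not enabled"` reports a missing entry before retrying `apt-get update` and the install.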

PhilLogan on 7 Mar 2016

For those who may experience this issue even with recent kernels:
We had the same problem with the kernels provided by OVH (with all the right drivers built in), which don't have CONFIG_VXLAN set.
So check the kernel config if you have it, and recompile the kernel, making sure that CONFIG_VXLAN and CONFIG_VETH are enabled, either built in or as modules.
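
Checking the config can be scripted along these lines. This is a sketch under assumptions: the function names are invented, and it assumes the build config is available as a plain-text file such as `/boot/config-$(uname -r)`, which not every provider's kernel ships.

```shell
# Hypothetical helper: kernel_option_enabled OPTION [CONFIG_FILE]
# Succeeds if OPTION is built in (=y) or built as a module (=m).
kernel_option_enabled() {
    grep -Eq "^$1=(y|m)\$" "${2:-/boot/config-$(uname -r)}"
}

# check_overlay_options [CONFIG_FILE]
# Reports the two options named above and fails if either is missing.
check_overlay_options() {
    coo_status=0
    for coo_opt in CONFIG_VXLAN CONFIG_VETH; do
        if kernel_option_enabled "$coo_opt" "$1"; then
            echo "$coo_opt: enabled"
        else
            echo "$coo_opt: missing"
            coo_status=1
        fi
    done
    return "$coo_status"
}
```

Running `check_overlay_options` with no argument reads the config of the running kernel. Where only `/proc/config.gz` exists, decompressing it to a temporary file and passing that path should work the same way.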

WydD on 12 Jul 2016

👍8

Thanks @matevarga. Same here today on Linode with CentOS 7. I didn't have to install the kernel myself though, just rebuilt the machines and set them to GRUB 2 in Linode's panel before first starting them.

zmirc on 22 Oct 2017

I've been struggling with Docker Swarm in Linode for about 2 days, so here are my instructions on how to solve it for anyone else who arrives here.

Here are the instructions with screenshots, because I think it's quite easy to get lost in the procedure and I hope others can avoid all the struggle.

You start with a standard Linode. Click on the "edit" link in the "Linode Profile":

[screenshot: selection_001]

In the settings, there is a dropdown to select the kernel. By default it has a Linode kernel (that's what causes the problem with Docker Swarm):

[screenshot: selection_002]

Select the kernel GRUB 2:

[screenshot: selection_003]

Save the changes:

[screenshot: selection_004]

Your new profile will say (GRUB 2) at the end. You can now restart your Linode. There's no need to install anything, re-deploy, etc.:

[screenshot: selection_005]

After rebooting, it should work.

tiangolo on 4 Nov 2017

👍9 ❤7 🎉5

Thanks for the find, @tiangolo. Worked perfectly.

johnhidey on 6 Dec 2017

👍1

Thanks @tiangolo, worked perfectly for me.

easyguyme on 14 Dec 2017

👍1

In case anyone follows @tiangolo's instructions on an older Linode and is left staring at the GRUB prompt from Lish, see: https://www.linode.com/docs/tools-reference/custom-kernels-distros/run-a-distribution-supplied-kernel/#older-distributions

bard on 15 Dec 2017

❤1 👍1

I am facing the https://www.linode.com/docs/platform/manager/how-to-change-your-linodes-kernel/#no-upstream-kernel-installed problem :-(

Ubuntu Server 16.04 LTS

sudo apt update
apt list -a linux-image-generic
sudo apt install linux-image-generic grub2
ls /boot
sudo vim /etc/default/grub
sudo update-grub

GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="console=ttyS0,19200n8 net.ifnames=0"

GRUB_TERMINAL=serial
GRUB_DISABLE_OS_PROBER=true
GRUB_SERIAL_COMMAND="serial --speed=19200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_DISABLE_LINUX_UUID=true
GRUB_GFXPAYLOAD_LINUX=text
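
Before rebooting into GRUB 2, a quick sanity check of the serial console settings listed above can help avoid a Linode that boots without usable Lish output. This is a sketch only: the function name is invented, and it checks just the two settings from this comment that mention the serial line.

```shell
# Hypothetical check: check_serial_console [GRUB_DEFAULTS_FILE]
# Verifies that GRUB_TERMINAL is serial and that the kernel command line
# contains console=ttyS0, as in the settings listed above.
check_serial_console() {
    csc_file="${1:-/etc/default/grub}"
    csc_status=0
    if ! grep -Eq '^GRUB_TERMINAL="?serial' "$csc_file"; then
        echo "GRUB_TERMINAL is not serial"
        csc_status=1
    fi
    if ! grep -Eq '^GRUB_CMDLINE_LINUX=.*console=ttyS0' "$csc_file"; then
        echo "no console=ttyS0 in GRUB_CMDLINE_LINUX"
        csc_status=1
    fi
    if [ "$csc_status" -eq 0 ]; then
        echo "serial console settings found"
    fi
    return "$csc_status"
}
```

Running `check_serial_console` after editing `/etc/default/grub` and before `sudo update-grub` catches a missing line while the machine is still reachable.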

References:
https://askubuntu.com/questions/879888/how-do-i-update-kernel-to-the-latest-mainline-version
https://askubuntu.com/questions/119080/how-to-update-kernel-to-the-latest-mainline-version-without-any-distro-upgrade

zx1986 on 6 Sep 2018

Just noting that 10.0.0.0/24 is an invalid subnet under the old classful subnetting rules (RFC 950). The first valid subnet within the 10.0.0.0/8 (Class A) network, sliced with a /24 subnet mask, is 10.0.1.0/24. You have to throw away the top and bottom values on the subnet side, just as you do for the top and bottom on the host side of the bitmask. For the same reason, 10.255.255.0/24 is also invalid.

For any given subnet mask there are 2^x - 2 subnets and 2^x - 2 hosts

...where x is the number of bits on that side of the mask. So for a /24 inside 10.0.0.0/8 that's 16 subnet bits and 8 host bits, making 65534 subnets and 254 hosts. Note the "- 2" part of that calculation on the subnet side. It means that you have to throw away (you can't issue) those two values, since they already mean something to the network layer of TCP/IP in this case.

This should make sense to anyone who already knows that you similarly can't bind the 10.x.y.0 and 10.x.y.255 addresses of a /24, since they already mean something.