kubeadm version:
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:38:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.3", GitCommit:"0480917b552be33e2dba47386e51decb1a211df6", GitTreeState:"clean", BuildDate:"2017-05-10T15:48:59Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Digitalocean 2Gb droplet (LON)
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.7.0
VERSION_ID=1353.7.0
BUILD_ID=2017-04-26-2154
PRETTY_NAME="Container Linux by CoreOS 1353.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
uname -a: Linux coreos-2gb-lon1-01 4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz GenuineIntel GNU/Linux
$ sudo docker info
Containers: 22
Running: 5
Paused: 0
Stopped: 17
Images: 7
Server Version: 1.12.6
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host overlay bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp selinux
Kernel Version: 4.9.24-coreos
Operating System: Container Linux by CoreOS 1353.7.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.958 GiB
Name: coreos-2gb-lon1-01
ID: B7FN:NKVG:SIOW:U43E:5E3R:H45Q:QR2O:M3CV:CZDX:RESY:OEHR:KAXE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
When attempting kubeadm init, the process stalls at "waiting for the control plane to become ready". The etcd pause container is crashlooping with the following error:
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"write /proc/self/task/1/attr/exec: invalid argument\"".
SELinux is set to Permissive and the docker daemon has --selinux-enabled.
The following error appears in journalctl each time the container crashes:
May 12 08:27:33 coreos-2gb-lon1-01 kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
EDIT: This is coincidental, as it also happens without spc_t (where the container launches successfully).
Expected: the etcd pause container should not crashloop.
docker info shows selinux in the Security Options field.
docker run --name=k8s_etcd_test --detach=true --security-opt "label=type:spc_t" --security-opt "seccomp=unconfined" gcr.io/google_containers/pause-amd64:3.0
Labels are set on the pod here: https://github.com/kubernetes/kubernetes/blob/2c05234674d0d59d24ac322e07c5f6c9267c1e5f/cmd/kubeadm/app/master/manifests.go#L128
I don't know enough about SELinux to suggest a fix, but perhaps adding the labels only when SELinux is enforcing would work?
CoreOS uses a wrapper script (/usr/lib/coreos/dockerd) to set --selinux-enabled by default. Running the docker daemon without --selinux-enabled, or starting the container without the type:spc_t label, resolves the issue. Switching SELinux from enforcing to permissive does not.
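To confirm which of these configurations a host is in, the check below looks for selinux in the Security Options line of docker info. It is only a sketch: the function name docker_selinux_enabled is made up for illustration, and it assumes the single-line Security Options format that Docker 1.12.6 prints in the output above (other Docker versions may format this field differently).

```shell
#!/bin/sh
# Report whether `docker info` output lists selinux under "Security Options".
# The text is read from stdin so the check can be tried without a daemon.
# Assumption: the field is printed on one line, as in the 1.12.6 output above.
docker_selinux_enabled() {
  grep -i '^ *Security Options:' | grep -qw 'selinux'
}

# Sample lines copied from the docker info output in this report.
sample='Default Runtime: runc
Security Options: seccomp selinux
Kernel Version: 4.9.24-coreos'

if printf '%s\n' "$sample" | docker_selinux_enabled; then
  echo "docker daemon has SELinux labeling enabled"
else
  echo "docker daemon does not have SELinux labeling enabled"
fi
```

On a live host the same function could be fed with `sudo docker info 2>/dev/null | docker_selinux_enabled`.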
Run docker without --selinux-enabled by bypassing the CoreOS wrapper script:
cp /usr/lib/systemd/system/docker.service /etc/systemd/system/docker.service
sed -i -e 's/lib\/coreos/bin/g' /etc/systemd/system/docker.service
systemctl daemon-reload && systemctl restart docker
Is this a dupe of https://github.com/kubernetes/kubeadm/issues/215?
Currently we say that you should do setenforce 0 until that issue is fixed.
@luxas Nope. This happens even when you use setenforce 0.
core@coreos-2gb-lon1-01 ~ $ getenforce
Permissive
core@coreos-2gb-lon1-01 ~ $ sudo setenforce 0
core@coreos-2gb-lon1-01 ~ $ getenforce
Permissive
core@coreos-2gb-lon1-01 ~ $ docker run --name=k8s_etcd_test --detach=true --security-opt "label=type:spc_t" --security-opt "seccomp=unconfined" gcr.io/google_containers/pause-amd64:3.0
c479f9a4cc1dd910b9982797a2e9a555c4f54e2da5ec1e488fe44f1a7a6fe971
docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/self/task/1/attr/exec: invalid argument\\\"\"\n".
Related? https://github.com/coreos/bugs/issues/1580 https://github.com/moby/moby/issues/20798
I'm not sure if the SELinux error that I observed is the cause of the pause container failure or just coincidental.
EDIT: This is a coincidental error message as it happens even if you omit the spc_t option (and the container starts fine).
You'd likely need to update selinux policy...
@philips is there someone that could investigate selinux policy on your end?
@mikesimons Thanks for the report! I'm not familiar with SELinux, can someone else chime in?
cc @dgoodwin @aaronlevy @coeki @rhatdan
Most likely the container platform does not understand what an spc_t type is, or something along those lines.
If this system has audit running, see if you can get this output:
grep spc_t /var/log/audit/audit.log*
Otherwise, look for it in /var/log/messages or dmesg:
dmesg | grep spc_t
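The lookup order described above could be scripted roughly as follows. The function name search_label and its optional second argument (an alternative audit log directory) are illustrative assumptions, not part of any existing tool, and dmesg may need root on some systems.

```shell
#!/bin/sh
# search_label LABEL [AUDIT_DIR]
# Look for LABEL in the audit log if one exists, otherwise fall back to
# /var/log/messages, and finally to the kernel ring buffer via dmesg.
# AUDIT_DIR defaults to /var/log/audit; the parameter is a hypothetical
# convenience for hosts that keep audit logs elsewhere.
search_label() {
  label=$1
  audit_dir=${2:-/var/log/audit}
  if ls "$audit_dir"/audit.log* >/dev/null 2>&1; then
    grep -h -- "$label" "$audit_dir"/audit.log*
  elif [ -r /var/log/messages ]; then
    grep -- "$label" /var/log/messages
  else
    dmesg | grep -- "$label"
  fi
}
```

Calling `search_label spc_t` would then try each source in turn; an empty result with a non-zero exit status means the label was not found in the first source that exists.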
@rhatdan I can't get CoreOS to give me an audit log, but there are some audit-related entries in journalctl. I can find no logs containing spc_t at all: /var/log/audit is never created, journalctl has the logs below, and dmesg looks much like journalctl minus some of the service logs.
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered blocking state
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered disabled state
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: device vethcb6c73b entered promiscuous mode
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: audit: type=1700 audit(1496924613.743:115): dev=vethcb6c73b prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Jun 08 12:23:33 coreos-512mb-ams2-01 audit: ANOM_PROMISCUOUS dev=vethcb6c73b prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-udevd[1795]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-udevd[1795]: Could not generate persistent MAC address for veth63b8448: No such file or directory
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-udevd[1797]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-udevd[1797]: Could not generate persistent MAC address for vethcb6c73b: No such file or directory
Jun 08 12:23:33 coreos-512mb-ams2-01 audit[1139]: SYSCALL arch=c000003e syscall=44 success=yes exit=40 a0=a a1=c820c4e180 a2=28 a3=0 items=0 ppid=1 pid=1139 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=system_u:system_r:kernel_t:s0 key=(null)
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: audit: type=1300 audit(1496924613.743:115): arch=c000003e syscall=44 success=yes exit=40 a0=a a1=c820c4e180 a2=28 a3=0 items=0 ppid=1 pid=1139 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=system_u:system_r:kernel_t:s0 key=(null)
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:33 coreos-512mb-ams2-01 audit: PROCTITLE proctitle=646F636B657264002D2D686F73743D66643A2F2F002D2D636F6E7461696E6572643D2F7661722F72756E2F646F636B65722F6C6962636F6E7461696E6572642F646F636B65722D636F6E7461696E6572642E736F636B002D2D73656C696E75782D656E61626C6564
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: audit: type=1327 audit(1496924613.743:115): proctitle=646F636B657264002D2D686F73743D66643A2F2F002D2D636F6E7461696E6572643D2F7661722F72756E2F646F636B65722F6C6962636F6E7461696E6572642F646F636B65722D636F6E7461696E6572642E736F636B002D2D73656C696E75782D656E61626C6564
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:33 coreos-512mb-ams2-01 kernel: IPv6: ADDRCONF(NETDEV_UP): vethcb6c73b: link is not ready
Jun 08 12:23:33 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: eth0: renamed from veth63b8448
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethcb6c73b: link becomes ready
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered blocking state
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered forwarding state
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-networkd[933]: vethcb6c73b: Gained carrier
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-networkd[933]: docker0: Gained carrier
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 containerd[1135]: time="2017-06-08T12:23:34.187403052Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/self/task/1/attr/exec: invalid argument\\\"\"\n" id=462632de47599e329f9b7cad69d53222a54e75475fc8302d5542ee16aa1d0640
Jun 08 12:23:34 coreos-512mb-ams2-01 dockerd[1139]: time="2017-06-08T12:23:34.190281718Z" level=error msg="Create container failed with error: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"write /proc/self/task/1/attr/exec: invalid argument\\\\\\\"\\\"\\n\""
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered disabled state
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-networkd[933]: vethcb6c73b: Lost carrier
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: veth63b8448: renamed from eth0
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered disabled state
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-networkd[933]: vethcb6c73b: Removing non-existent address: fe80::c8a2:fbff:fe5c:a5c2/64 (valid forever)
Jun 08 12:23:34 coreos-512mb-ams2-01 audit: ANOM_PROMISCUOUS dev=vethcb6c73b prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-udevd[1853]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:34 coreos-512mb-ams2-01 audit[1139]: SYSCALL arch=c000003e syscall=44 success=yes exit=32 a0=a a1=c820bfc3e0 a2=20 a3=0 items=0 ppid=1 pid=1139 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=system_u:system_r:kernel_t:s0 key=(null)
Jun 08 12:23:34 coreos-512mb-ams2-01 audit: PROCTITLE proctitle=646F636B657264002D2D686F73743D66643A2F2F002D2D636F6E7461696E6572643D2F7661722F72756E2F646F636B65722F6C6962636F6E7461696E6572642F646F636B65722D636F6E7461696E6572642E736F636B002D2D73656C696E75782D656E61626C6564
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-udevd[1853]: link_config: could not get ethtool features for veth63b8448
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-udevd[1853]: Could not set offload features of veth63b8448: No such device
Jun 08 12:23:34 coreos-512mb-ams2-01 dockerd[1139]: time="2017-06-08T12:23:34.372221601Z" level=error msg="Handler for POST /v1.24/containers/462632de47599e329f9b7cad69d53222a54e75475fc8302d5542ee16aa1d0640/start returned error: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"write /proc/self/task/1/attr/exec: invalid argument\\\\\\\"\\\"\\n\""
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: device vethcb6c73b left promiscuous mode
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: audit: type=1700 audit(1496924614.232:116): dev=vethcb6c73b prom=0 old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: docker0: port 1(vethcb6c73b) entered disabled state
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: audit: type=1300 audit(1496924614.232:116): arch=c000003e syscall=44 success=yes exit=32 a0=a a1=c820bfc3e0 a2=20 a3=0 items=0 ppid=1 pid=1139 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=system_u:system_r:kernel_t:s0 key=(null)
Jun 08 12:23:34 coreos-512mb-ams2-01 kernel: audit: type=1327 audit(1496924614.232:116): proctitle=646F636B657264002D2D686F73743D66643A2F2F002D2D636F6E7461696E6572643D2F7661722F72756E2F646F636B65722F6C6962636F6E7461696E6572642F646F636B65722D636F6E7461696E6572642E736F636B002D2D73656C696E75782D656E61626C6564
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-networkd[933]: docker0: Lost carrier
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Network configuration changed, trying to establish connection.
Jun 08 12:23:34 coreos-512mb-ams2-01 systemd-timesyncd[888]: Synchronized to time server 80.100.130.235:123 (2.coreos.pool.ntp.org).
Jun 08 12:23:39 coreos-512mb-ams2-01 sudo[1861]: core : TTY=pts/0 ; PWD=/var ; USER=root ; COMMAND=/bin/journalctl
Jun 08 12:23:39 coreos-512mb-ams2-01 sudo[1861]: pam_unix(sudo:session): session opened for user root by core(uid=0)
Jun 08 12:23:39 coreos-512mb-ams2-01 sudo[1861]: pam_systemd(sudo:session): Cannot create session: Already running in a session
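The PROCTITLE values in the audit records above are the hex-encoded command line of the audited process, with NUL bytes between the arguments. A small helper can make them readable. This is a sketch in POSIX sh rather than an official tool, and decode_proctitle is a name made up here:

```shell
#!/bin/sh
# Decode a hex-encoded audit PROCTITLE value into a readable command line.
# NUL bytes separate the arguments in the record; they are printed as spaces.
decode_proctitle() {
  hex=$1
  while [ "${#hex}" -ge 2 ]; do
    rest=${hex#??}          # everything after the first two hex digits
    byte=${hex%"$rest"}     # the first two hex digits
    hex=$rest
    if [ "$byte" = "00" ]; then
      printf ' '
    else
      # Convert the hex pair to octal, then let printf %b emit that byte.
      printf '%b' "\\0$(printf '%03o' "0x$byte")"
    fi
  done
  printf '\n'
}

# Value copied from the dockerd PROCTITLE records in the log above.
decode_proctitle 646F636B657264002D2D686F73743D66643A2F2F002D2D636F6E7461696E6572643D2F7661722F72756E2F646F636B65722F6C6962636F6E7461696E6572642F646F636B65722D636F6E7461696E6572642E736F636B002D2D73656C696E75782D656E61626C6564
# prints: dockerd --host=fd:// --containerd=/var/run/docker/libcontainerd/docker-containerd.sock --selinux-enabled
```

The decoded command line ends in --selinux-enabled, which matches the wrapper-script behaviour described earlier in this report.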
This looks like the command is trying to execute containers as "\n", and this is not a valid label. I have no idea how good CoreOS is at supporting SELinux. So I am not really able to help.
@mikesimons Audit logs are off by default. Our selinux page has instructions for turning them on.
It's true that we don't have an spc_t type and it seems likely we should include one.
However, it seems like there should be a much easier fix: just deleting that label.
All etcd needs to do is write to a hostPath volume mount, and my understanding was that as of https://github.com/kubernetes/kubernetes/pull/33663 (included in k8s 1.5.0+), all host bindmounts would be relabelled by default.
Assuming it works, that would be more secure and should also work on more SELinux configurations.
Threw up a PR https://github.com/kubernetes/kubernetes/pull/49328