What happened: I'm seeing the same issue as described in #1149. Here is what I'm seeing:
```
+ kind create cluster --wait 30m --image kindest/node:v1.14.9@sha256:bdd3731588fa3ce8f66c7c22f25351362428964b6bca13048659f68b9e665b72
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.14.9) 🖼
 ✓ Preparing nodes 📦
 ✗ Writing configuration 📜
ERROR: failed to create cluster: failed to get IPs for node: kind-control-plane: file should only be one line, got 2 lines
```
What you expected to happen: The cluster starts up normally.
How to reproduce it (as minimally and precisely as possible): I haven't been able to get it to repro reliably, but once it happens, it's stuck that way.
Anything else we need to know?: The solution proposed in #1149 of blowing away the `.docker` directory doesn't work for us. It occurs during CI, so babysitting the agents and killing them once this happens is not an option. It seems to be a problem with the construction of the base image.
Environment:
- kind version: (use `kind version`): kind v0.6.0 go1.13.4 linux/amd64
- Kubernetes version: (use `kubectl version`): Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Docker version: (use `docker info`):

```
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 14
 Server Version: 18.09.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-1052-gcp
 Operating System: Ubuntu 16.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 58.97GiB
 Name: bk-6dd1738aeec68778a4e320f52a0193781e61e8d2-3vkq
 ID: NWBL:2PDV:FRB2:TXXZ:4LS5:FER2:FOLR:QG5T:4QF2:FEQJ:7ERT:6MTY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
```

- OS (e.g. from `/etc/os-release`):

```
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
```
Hi, in https://github.com/kubernetes-sigs/kind/issues/1149 the issue was that `docker` was aliased to `sudo docker`. I don't think anyone actually proposed blowing away `.docker`, though the author did do this.
Can you share your docker ~~info /~~ docker config?
/assign
Er, also noting that v0.6.0 is not the latest, any bugfixes we may have already made would be in v0.7.0.
What does `docker network inspect bridge` give you?
Thanks for the reply. I noticed that we weren't on latest so I'm currently trying on v0.7.0. Time will tell if that solves this since there's no consistent repro.
My agent doesn't have a docker config; at least, when I run `docker config ls` I get:
```
+ docker config ls
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
```
Not sure if that helps.
Er sorry, the docker daemon config is in a JSON file on the host; the `docker config` command is actually, confusingly, unrelated.
It appears we don't have one, so we're just using the default. Or at least there's none at /etc/docker/daemon.json.
That's surprising.
So this error means that, for some reason, when we ask docker to inspect the container we created with a format string that lists the IPs, we got more output than we expected...
That shouldn't happen 🤔
> That's surprising. So this error means that, for some reason, when we ask docker to inspect the container we created with a format string that lists the IPs, we got more output than we expected... That shouldn't happen 🤔
@BenTheElder should we be able to see the docker inspect error with more verbosity?
https://github.com/kubernetes-sigs/kind/blob/5cf3257f5bb5fe11828b4f310f8882f349753234/pkg/cluster/internal/providers/docker/node.go#L54-L64
@strican can you add a `-v7` flag, for example, to kind in your CI, so that next time we have more data? I'm curious about those extra lines in the output of the command:
`docker inspect -f {{range .NetworkSettings.Networks}}{{.IPAddress}},{{.GlobalIPv6Address}}{{end}}`
@aojea it's NOT a docker inspect error in that the command did succeed and exit 0. The output is just unexpected. There shouldn't be more lines.
More specifically, we pass a format to docker that includes no newlines, so multiple lines should not be possible under normal circumstances.
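For anyone following the thread, here is a minimal, self-contained sketch of that flow — not kind's actual code (see the node.go link above; per the later discussion, the real implementation at the time captured stdout and stderr combined). The `nodeIPs` helper name here is hypothetical:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// nodeIPs is a hypothetical helper mirroring the approach described above:
// ask docker to print the container's IPs via a Go template that contains
// no newlines, so the result should always be exactly one line.
func nodeIPs(container string) (ipv4, ipv6 string, err error) {
	format := `{{range .NetworkSettings.Networks}}{{.IPAddress}},{{.GlobalIPv6Address}}{{end}}`
	out, err := exec.Command("docker", "inspect", "-f", format, container).Output()
	if err != nil {
		return "", "", fmt.Errorf("docker inspect failed: %w", err)
	}
	lines := strings.Split(strings.TrimSpace(string(out)), "\n")
	if len(lines) != 1 {
		// This is the condition behind "file should only be one line, got 2 lines".
		return "", "", fmt.Errorf("file should only be one line, got %d lines", len(lines))
	}
	parts := strings.Split(lines[0], ",")
	if len(parts) != 2 {
		return "", "", fmt.Errorf("unexpected output: %q", lines[0])
	}
	return parts[0], parts[1], nil
}

func main() {
	ipv4, ipv6, err := nodeIPs("kind-control-plane")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("IPv4: %s IPv6: %s\n", ipv4, ipv6)
}
```

Because the template contains no newline, anything beyond a single line in the captured output, for example a warning docker prints alongside the result, trips that error.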
@aojea I've reverted to v0.6.0, added the `-v7` flag, and have been running builds all day. I haven't hit the issue yet, and I don't think we've hit it since updating to v0.7.0. Unfortunately this might be a case of "no repro", but I'll keep trying and let you know if I hit anything. Thanks all for jumping on this.
going to close for now as not reproducible but please /reopen with more information if you spot this again!
FWIW I hit this error when using kind v0.7.0 in a docker:19.03.8-dind image in a GitHub Actions workflow, which is using this docker version:
```
Client:
 Version: 3.0.10+azure
 API version: 1.40
 Go version: go1.12.14
 Git commit: 99c5edceb48d64c1aa5d09b8c9c499d431d98bb9
 Built: Tue Nov 5 00:55:15 2019
 OS/Arch: linux/amd64
 Experimental: false

Server:
 Engine:
  Version: 3.0.10+azure
  API version: 1.40 (minimum version 1.12)
  Go version: go1.12.14
  Git commit: ea84732a77
  Built: Fri Jan 24 20:08:11 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: v1.2.11
  GitCommit: f772c10a585ced6be8f86e8c58c2b998412dd963
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683
```
... installing and running kind in the workflow steps directly, the cluster is created without any issues. There are also no issues when running kind in a dind environment on my PC (Ubuntu 18.04 with docker 19.03.7), so I'm suspecting the issue is with the host's docker version.
Looking over this now, I think CombinedOutputLines was just handy, and perhaps what's actually happening here is docker printing some error to stderr...? I'm going to patch this to use just stdout. Not sure if that's actually the issue (you'd expect the command to fail anyhow?), but sending the PR anyhow.
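To make that distinction concrete, here is a rough sketch, with assumed helper names rather than the actual patch, of combined-output capture versus stdout-only capture:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// combinedOutputLines captures stdout AND stderr interleaved, so a warning
// that docker prints to stderr (e.g. about a problematic config file) shows
// up as an extra line alongside the IP line, even though the command exits 0.
func combinedOutputLines(name string, args ...string) ([]string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return strings.Split(strings.TrimSpace(string(out)), "\n"), err
}

// stdoutLines captures only stdout, so stderr noise cannot pollute the
// parsed result; this is the spirit of the fix described above.
func stdoutLines(name string, args ...string) ([]string, error) {
	out, err := exec.Command(name, args...).Output()
	return strings.Split(strings.TrimSpace(string(out)), "\n"), err
}

func main() {
	format := `{{range .NetworkSettings.Networks}}{{.IPAddress}},{{.GlobalIPv6Address}}{{end}}`
	lines, err := stdoutLines("docker", "inspect", "-f", format, "kind-control-plane")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("lines:", len(lines), lines)
}
```

With the stdout-only variant, anything docker writes to stderr can no longer show up as an extra line in the parsed output.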
@BenTheElder Facing the same error when using kind v0.7.0 with this docker version:
```
Client: Docker Engine - Community
 Version: 19.03.4
 API version: 1.40
 Go version: go1.12.10
 Git commit: 9013bf5
 Built: Thu Oct 17 23:44:48 2019
 OS/Arch: darwin/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 19.03.4
  API version: 1.40 (minimum version 1.12)
  Go version: go1.12.10
  Git commit: 9013bf5
  Built: Thu Oct 17 23:50:38 2019
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: v1.2.10
  GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version: 1.0.0-rc8+dev
  GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683
```
... saw the same error as with kind v0.6.0 installed. Not sure if it's worth reopening, but it would be good to know if I'm doing something wrong.
We've since found that there are circumstances where docker spits out an error on ~all commands (e.g. due to bad ownership of the docker config); you should check that. But we've already filed https://github.com/kubernetes-sigs/kind/pull/1415, which has been merged into master, to stop reading from stderr.
Same here. I am also seeing this error:
```
$ minikube start --driver=docker
😄  minikube v1.12.2 on Ubuntu 20.04
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🔄  Restarting existing docker container for "minikube" ...
🤦  StartHost failed, but will try again: IPs output should only be one line, got 2 lines
🏃  Updating the running docker "minikube" container ...
😿  Failed to start docker container. "minikube start" may fix it: provision: Temporary Error: error getting ip during provisioning: IPs output should only be one line, got 2 lines
💣  error provisioning host: Failed to start host: provision: Temporary Error: error getting ip during provisioning: IPs output should only be one line, got 2 lines
😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose
```
@vyom-soft That is a bug in minikube's forked usage of kind. You should file an issue with minikube.
kind has not had this bug since March / v0.8.0.
https://kind.sigs.k8s.io/docs/user/quick-start/ can guide you to create a similar one-node cluster with kind without this bug.