so far I've not once seen this locally, but we hit it occasionally in CI, maybe a few times a day across many, many runs.
/assign
/lifecycle active
https://github.com/kubernetes-sigs/kind/issues/928#issuecomment-541954010
per the stack trace, this is definitely failing in `LocalCmd.Run()`, so we're seeing a `signal: broken pipe` from `docker exec ...`.
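for context, here's a minimal Go sketch (not kind's actual code; `yes` just stands in for a long-running `docker exec`) of how that exact error string is produced: if our side of the stdout pipe goes away while the child is still writing, the child is killed by SIGPIPE and `cmd.Wait()` reports it as `signal: broken pipe`.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// `yes` writes to stdout forever, standing in for a long-running child.
	cmd := exec.Command("yes")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// read a little, then close our end of the pipe while the child is still writing
	buf := make([]byte, 16)
	_, _ = stdout.Read(buf)
	_ = stdout.Close()
	// the child's next write hits the closed pipe, it dies from SIGPIPE,
	// and Wait surfaces that as "signal: broken pipe"
	fmt.Println(cmd.Wait())
}
```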
currently I've found these exclusively with `kubeadm init`.
however, in this example: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-ipv6/1182717569350504448
you can also see `ERROR: command "docker exec --privileged kind-control-plane tar --hard-dereference -C /var/log -chf - ." failed with error: write |1: broken pipe` from `kind export logs`.
everything else I can find involves `kubeadm init`.
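worth noting the two error strings come from different sides of the pipe: `signal: broken pipe` means the child process itself was killed by SIGPIPE, while `write |1: broken pipe` is how Go formats an EPIPE on the write end of a pipe created with `os.Pipe()` (which names its ends `|0` and `|1`). a tiny, purely illustrative sketch of the latter:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// os.Pipe names the read end "|0" and the write end "|1"
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	// close the read end, then write: the write fails with EPIPE,
	// which prints as `write |1: broken pipe`
	_ = r.Close()
	if _, err := w.Write([]byte("data")); err != nil {
		fmt.Println(err)
	}
}
```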
another example https://github.com/kubernetes/kubernetes/pull/83956#issuecomment-542277813
our docker is fairly old, https://github.com/kubernetes/test-infra/pull/14784
Still trying to get us on newer docker. Couple of pending test-infra PRs.
not much progress today -- infra wg this morning and then mitigating, debugging, and dealing with https://github.com/kubernetes/test-infra/pull/14812
now have https://github.com/kubernetes/test-infra/pull/14820 to experiment with newer docker in a slightly streamlined, kind-CI-specific image.
here's an example with `kubeadm join`:
W1016 15:02:01.370] ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged kind-worker kubeadm join --config /kind/kubeadm.conf --ignore-preflight-errors=all --v=6" failed with error: signal: broken pipe
have kind experimentally on docker 19.03.X on debian buster. will follow up in the morning.
we can see if we continue seeing these flakes under more recent docker...
~everything in kubernetes CI should be on 19.03.X now; we'll have to wait and see if we continue to get these
still an issue with new docker; here's one with `kubeadm join`: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-parallel-ipv6/1185195854433095681
> ~everything in kubernetes CI should be on 19.03.X now
did test-infra move to 19.03 already?
I moved the test-infra images, yes.
ok, thanks for letting me know.
we held off on moving the kubeadm validators and k/k/build/dependencies.yaml because, according to TimSC, not all the popular distros have 19.x in their package managers yet.
but I guess we should add it as "verified" soon.
@neolit123 we're not running kubeadm / docker against that. the _dind image_ is on 19.03 (kubekins-e2e, and kind's KRTE), but the hosts are on whatever the hosts are on, and the kind nodes are on whatever the kind nodes run
too many layers :-)
> too many layers :-)
indeed. :)
just wondering when...
kinder doesn't have kind node images with docker 19.03 yet; maybe that's the switching point for k/k/build/dependencies.yaml and updating the kubeadm validators.
@neolit123 I wouldn't read much into us using 19.03 to host the nodes; someday it might be podman or ignite, so it won't reflect much on qualifying with kubeadm. it's mostly a shot in the dark regarding the stability issues.
I think there's a _small_ chance the root cause of https://github.com/kubernetes-sigs/kind/issues/971 is related here: the go program would get a broken pipe signal if the internal pipe is closed after the internal `io.Copy` hits an error. so far I've not identified a path where we'd be triggering this though (versus the panic).
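roughly the shape of that suspected path, as a standalone sketch (hand-wavy, not kind's actual code): an internal `io.Copy` consuming the child's output hits an error, the pipe then gets closed, and the still-writing child is killed by SIGPIPE.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os/exec"
)

// failingWriter stands in for an output consumer that errors partway through.
type failingWriter struct{ n int }

func (w *failingWriter) Write(p []byte) (int, error) {
	w.n += len(p)
	if w.n > 1024 {
		return 0, errors.New("simulated consumer failure")
	}
	return len(p), nil
}

func main() {
	cmd := exec.Command("yes") // stands in for the long-running child process
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// the internal copy hits an error from the consumer...
	_, copyErr := io.Copy(&failingWriter{}, stdout)
	// ...then the pipe is closed while the child is still writing,
	// so the child is killed by SIGPIPE
	_ = stdout.Close()
	fmt.Println("copy error:", copyErr)
	fmt.Println("wait error:", cmd.Wait()) // signal: broken pipe
}
```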
We haven't had one since the patch for #971 went in. However I want to wait a bit longer before calling this fixed.
we are, however, seeing this with `kind export logs`:
W1021 15:48:19.198] ERROR: command "docker exec --privileged kind-worker sh -c 'tar --hard-dereference -C /var/log -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: write |1: broken pipe
hopefully unrelated; need to investigate.
EDIT: traced the code; we prefer returning the error from the process versus from the reader, so it's likely we're seeing this because the reader errored, which would not surprise me for the current untar routine... filed https://github.com/kubernetes-sigs/kind/pull/992 to debug
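to illustrate that precedence (with a hypothetical `runWithReader` helper, not kind's actual API): when both the consuming reader and the command fail, returning the command's error reports the broken pipe and hides the reader's original failure.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// runWithReader is a hypothetical helper (not kind's API) that streams a
// command's stdout into consume, preferring the command's error over the
// reader's when both fail.
func runWithReader(cmd *exec.Cmd, consume func(io.Reader) error) error {
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	readerErr := consume(stdout)
	if readerErr != nil {
		// stop reading; the still-writing child will now hit a broken pipe
		_ = stdout.Close()
	}
	if waitErr := cmd.Wait(); waitErr != nil {
		// preferring the process error means we report "broken pipe"
		// even though readerErr is the real root cause
		return waitErr
	}
	return readerErr
}

func main() {
	err := runWithReader(exec.Command("yes"), func(r io.Reader) error {
		// simulate an untar-style consumer that fails partway through
		buf := make([]byte, 32)
		_, _ = r.Read(buf)
		return fmt.Errorf("simulated untar failure")
	})
	fmt.Println(err) // prints "signal: broken pipe"; the reader's error is masked
}
```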
we've not had any of these creation failures since the fix for https://github.com/kubernetes-sigs/kind/issues/971 went in, granted it has not been an extremely long amount of time.
tentatively closing, but still monitoring.
will file a new issue for the untar issues; they don't appear to be related.
still haven't identified another one since that fix.
still no signs of this; I think we're in the clear on this one.