Test-infra: kubetest/kind: odd bug caused by enabling `--loglevel=debug` for kind

Created on 22 Feb 2019 · 16 comments · Source: kubernetes/test-infra

i'm seeing something very odd in the kind deployer for kubetest.

this PR:
https://github.com/kubernetes/test-infra/pull/11379/files

is causing:

2019/02/20 16:00:32 kind.go:336: kind.go:IsUp()
2019/02/20 16:00:32 kind.go:457: kind.go:KubectlCommand()
2019/02/20 16:00:32 process.go:153: Running: kubectl get nodes --no-headers
The connection to the server localhost:8080 was refused - did you specify the right host or port?
2019/02/20 16:00:32 process.go:155: Step 'kubectl get nodes --no-headers' finished in 77.854404ms

2019/02/22 05:07:22 main.go:307: Something went wrong: encountered 2 errors: [error starting kubectl --match-server-version=false version: exec: already started error during kubectl --match-server-version=false get nodes -oyaml: exit status 1]

https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kind-master/311
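For context on the second error above: `exec: already started` is what Go's os/exec returns when the same *exec.Cmd is started twice, so some command object is apparently being reused along the way. A minimal illustration of the generic Go behavior (not the actual kubetest code path):

```go
// Minimal illustration of the generic os/exec behavior behind the
// "exec: already started" error above (not the actual kubetest code path):
// calling Start twice on the same *exec.Cmd fails the second time.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command("kubectl", "--match-server-version=false", "version")
	_ = cmd.Start()    // first Start (may fail for unrelated reasons, e.g. kubectl missing)
	err := cmd.Start() // reusing the same Cmd: "exec: already started"
	fmt.Println(err)
	_ = cmd.Wait() // reap the first process if it did start
}
```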

can be reproduced locally using:

./kubetest --provider=skeleton --deployment=kind --kind-binary-version=build --kind-config-path=./../../k8s.io/kubeadm/tests/e2e/kind/single-cp/single-cp.yaml --build=bazel --up --check-version-skew=false --ginkgo-parallel --timeout=30m

investigating..

/area kubetest
/kind bug
/assign
cc @BenTheElder @krzyzacy

All 16 comments

was the kubeconfig path set? this control flow is so convoluted; in kubetest2 i think we will just let the deployer decide if it wants to clean up previous deployments in Up() itself.
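A rough sketch of that kubetest2 idea (method names like clusterExists/createCluster are placeholders, not the real deployer interface):

```go
// Rough sketch of the kubetest2 idea: the deployer decides inside Up()
// whether a previous cluster needs to be torn down first. Method names
// here are placeholders, not the real kubetest/kubetest2 API.
package kind

type deployer struct{}

func (d *deployer) clusterExists() (bool, error) { return false, nil } // e.g. parse `kind get clusters`
func (d *deployer) Down() error                  { return nil }        // e.g. `kind delete cluster`
func (d *deployer) createCluster() error         { return nil }        // e.g. `kind create cluster`

func (d *deployer) Up() error {
	exists, err := d.clusterExists()
	if err != nil {
		return err
	}
	if exists {
		// clean up the previous deployment instead of relying on the caller
		if err := d.Down(); err != nil {
			return err
		}
	}
	return d.createCluster()
}
```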

> was the kubeconfig path set?

yes.

i've added this call to double check:

log.Printf("kind.go:getKubeConfigPath(): %s\n", string(o))

but i'm seeing this, which looks like a mix between kind and kubetest output.

2019/02/22 05:50:59 kind.go:172: kind.go:getKubeConfigPath(): time="05:50:58" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label \"io.k8s.sigs.kind.cluster\"}}]"
/home/lubo-it/.kube/kind-config-kind-kubetest

we call clusterExists in both Down() and TestSetup(); this should be coming from Down()?

I wonder if we should have the KubectlCommand() method ensure the kubectl command has the kubeconfig set. (either by flag or by env on the command)

> I wonder if we should have the KubectlCommand() method ensure the kubectl command has the kubeconfig set. (either by flag or by env on the command)

this seems to work fine.
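A sketch of what that suggestion could look like (the struct and field names are illustrative, not the exact kubetest code):

```go
// Sketch of the suggestion: KubectlCommand() pins the kubeconfig on the
// command itself so kubectl never falls back to localhost:8080. The struct
// and field names are illustrative, not the exact kubetest code.
package kind

import (
	"os"
	"os/exec"
)

type deployer struct {
	kubeConfigPath string // e.g. ~/.kube/kind-config-kind-kubetest
}

func (d *deployer) KubectlCommand() (*exec.Cmd, error) {
	cmd := exec.Command("kubectl")
	// setting KUBECONFIG in the env (or passing --kubeconfig) pins the target cluster
	cmd.Env = append(os.Environ(), "KUBECONFIG="+d.kubeConfigPath)
	return cmd, nil
}
```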

i have a suspicion that we are somehow mixing up the d.control.Output calls with the logs from kind when --v=6 and --loglevel=debug are set. o_O

setting --loglevel=info fixes the problem.

> setting --loglevel=info fixes the problem.

whoah, uhhh. what?

how is kind#55 related?

> whoah, uhhh. what?

this is the weird part. the problem was introduced with PR https://github.com/kubernetes/test-infra/pull/11379/files

i can reproduce this locally.

> how is kind#55 related?

the user there mentioned that toggling the log level fixed a cluster creation bug.

sounds like a race / memory corruption / something else.
i will now build a custom kind binary and feed it into a custom kubetest build to check something.

oh, I'm pretty certain #55 was just a race in whether the api-server survives startup or not; disk eviction on a noisy host / workstation caused all kinds of fun.

the fact that toggling kind's debug level changes the app's behavior makes me think there is a race in kind?

it may have just been a coincidence; there shouldn't be any races in kind currently, the bringup is all serial.

diff --git a/pkg/exec/local.go b/pkg/exec/local.go
index 088438e..a8c2f4f 100644
--- a/pkg/exec/local.go
+++ b/pkg/exec/local.go
@@ -19,8 +19,7 @@ package exec
 import (
        "io"
        osexec "os/exec"
-
-       log "github.com/sirupsen/logrus"
+       // log "github.com/sirupsen/logrus"
 )

 // LocalCmd wraps os/exec.Cmd, implementing the kind/pkg/exec.Cmd interface
@@ -64,6 +63,6 @@ func (cmd *LocalCmd) SetStderr(w io.Writer) {

 // Run runs
 func (cmd *LocalCmd) Run() error {
-       log.Debugf("Running: %v %v", cmd.Path, cmd.Args)
+       // log.Debugf("Running: %v %v", cmd.Path, cmd.Args)
        return cmd.Cmd.Run()
 }

this patch fixes the bug that i'm seeing with --loglevel=debug.

so the issue is in the kubeconfig path output after all:

~/.kubetest/kind/kind get kubeconfig-path --loglevel=debug --name=kind-kubetest
DEBU[07:23:35] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}}] 
/home/lubo-it/.kube/kind-config-kind-kubetest

--loglevel=debug adds extra output and pollutes the config path that kubetest reads.
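One possible kubetest-side defense, shown only as a sketch (not necessarily what the fix PR does), is to take the last non-empty line of the `kind get kubeconfig-path` output as the path:

```go
// Sketch of a defensive parse (not necessarily what PR 11444 does): treat the
// last non-empty line of `kind get kubeconfig-path` output as the path, so a
// leading debug log line cannot pollute it.
package kind

import "strings"

func parseKubeConfigPath(out []byte) string {
	lines := strings.Split(strings.TrimSpace(string(out)), "\n")
	return strings.TrimSpace(lines[len(lines)-1])
}
```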

oh LOL, that makes sense. imo that's a bug in the kubetest/kind package, but we should really think more about logging in kind.
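On the kind side, one general pattern (a sketch of common logrus usage, not necessarily what kind does or ended up doing) is to keep log output on stderr so machine-readable stdout stays clean even at debug level:

```go
// General logging-hygiene sketch (not necessarily what kind does): send
// logrus output to stderr so data printed to stdout, like a kubeconfig path,
// stays clean even with debug logging enabled.
package main

import (
	"fmt"
	"os"

	log "github.com/sirupsen/logrus"
)

func main() {
	log.SetOutput(os.Stderr) // human logs -> stderr
	log.SetLevel(log.DebugLevel)
	log.Debug("Running: /usr/bin/docker ps ...")
	fmt.Println("/home/lubo-it/.kube/kind-config-kind-kubetest") // data -> stdout
}
```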

https://github.com/kubernetes/test-infra/pull/11444
^ fix is here.

while digging around i found that the tryUntil calls might need a small sleep in their for loops:
https://github.com/kubernetes-sigs/kind/search?q=tryUntil&unscoped_q=tryUntil

especially the one that runs kubectl on the control-plane.
will send PR soon for that too.
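Roughly the shape of that suggestion, as a sketch (kind's actual tryUntil may differ):

```go
// Sketch of the suggestion (kind's actual tryUntil may differ): retry until a
// deadline, but sleep briefly between attempts instead of hammering the
// control-plane in a tight loop.
package kind

import "time"

func tryUntil(until time.Time, try func() bool) bool {
	for time.Now().Before(until) {
		if try() {
			return true
		}
		time.Sleep(500 * time.Millisecond) // small back-off between attempts
	}
	return false
}
```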

yeah, tryUntil also needs us to fix commands possibly hanging indefinitely (another thing I want to change at the exec package level). thanks for hunting this down and patching it 🙏
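A sketch of how such commands could be bounded so a retry loop cannot hang forever (illustrative only, not kind's exec package):

```go
// Sketch of bounding a command so a retry loop cannot hang indefinitely:
// exec.CommandContext kills the process once the context deadline expires.
// Illustrative only, not kind's exec package.
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func runWithTimeout(timeout time.Duration, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return exec.CommandContext(ctx, name, args...).CombinedOutput()
}

func main() {
	out, err := runWithTimeout(30*time.Second, "kubectl", "get", "nodes", "--no-headers")
	fmt.Println(string(out), err)
}
```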
