Test-infra: kubetest/kind: bug in the kubeadm-kind-1.12 job

Created on 25 Feb 2019 · 6 comments · Source: kubernetes/test-infra

i'm seeing a problem in the 1.12 kubeadm/kind job:

https://testgrid.k8s.io/sig-cluster-lifecycle-all#kubeadm-kind-1.12
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kind-1-12/13/build-log.txt

2019/02/25 20:59:56 process.go:153: Running: ./hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]|Alpha|Kubectl|\[(Disruptive|Feature:[^\]]+|Flaky)\] --num-nodes=3 --report-dir=/logs/artifacts --disable-log-dump=true
Conformance test: not doing test setup.
Found no test suites
For usage instructions:
    ginkgo help
!!! Error in ./hack/ginkgo-e2e.sh:143
  Error in ./hack/ginkgo-e2e.sh:143. '"${ginkgo}" "${ginkgo_args[@]:+${ginkgo_args[@]}}" "${e2e_test}" -- "${auth_config[@]:+${auth_config[@]}}" --ginkgo.flakeAttempts="${FLAKE_ATTEMPTS}" --host="${KUBE_MASTER_URL}" --provider="${KUBERNETES_PROVIDER}" --gce-project="${PROJECT:-}" --gce-zone="${ZONE:-}" --gce-region="${REGION:-}" --gce-multizone="${MULTIZONE:-false}" --gke-cluster="${CLUSTER_NAME:-}" --kube-master="${KUBE_MASTER:-}" --cluster-tag="${CLUSTER_ID:-}" --cloud-config-file="${CLOUD_CONFIG:-}" --repo-root="${KUBE_ROOT}" --node-instance-group="${NODE_INSTANCE_GROUP:-}" --prefix="${KUBE_GCE_INSTANCE_PREFIX:-e2e}" --network="${KUBE_GCE_NETWORK:-${KUBE_GKE_NETWORK:-e2e}}" --node-tag="${NODE_TAG:-}" --master-tag="${MASTER_TAG:-}" --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} ${NUM_NODES:+"--num-nodes=${NUM_NODES}"} ${E2E_REPORT_DIR:+"--report-dir=${E2E_REPORT_DIR}"} ${E2E_REPORT_PREFIX:+"--report-prefix=${E2E_REPORT_PREFIX}"} "${@:-}"' exited with status 1
Call stack:
  1: ./hack/ginkgo-e2e.sh:143 main(...)

i need to dig into why this is happening, but posting in advance in case Found no test suites is a known problem.
cc @krzyzacy @BenTheElder

/kind bug
/area kubetest


All 6 comments

looks like i can repro this locally:

./kubetest --deployment=kind --test --test_args="--ginkgo.focus=\[Conformance\] --num-nodes=3 --report-dir=/logs/artifacts --disable-log-dump=true" --kind-binary-version=build
Found no test suites

will now try to figure out what is different between this and the existing 1.12 test jobs that work. O_o

did something change in ginkgo-e2e.sh?

really hoping we can stop using this in the kubetest2 tester(s) ...

between 1.12 and master:

diff --git a/hack/ginkgo-e2e.sh b/hack/ginkgo-e2e.sh
old mode 100755
new mode 100644
index 0cac8afc6b..c4fc31186d
--- a/hack/ginkgo-e2e.sh
+++ b/hack/ginkgo-e2e.sh
@@ -87,7 +87,7 @@ if [[ "${KUBERNETES_PROVIDER}" == "gce" ]]; then
   set_num_migs
   NODE_INSTANCE_GROUP=""
   for ((i=1; i<=${NUM_MIGS}; i++)); do
-    if [[ $i == ${NUM_MIGS} ]]; then
+    if [[ ${i} == ${NUM_MIGS} ]]; then
       # We are assigning the same mig names as create-nodes function from cluster/gce/util.sh.
       NODE_INSTANCE_GROUP="${NODE_INSTANCE_GROUP}${NODE_INSTANCE_PREFIX}-group"
     else
@@ -161,6 +161,8 @@ export PATH=$(dirname "${e2e_test}"):"${PATH}"
   --master-tag="${MASTER_TAG:-}" \
   --cluster-monitoring-mode="${KUBE_ENABLE_CLUSTER_MONITORING:-standalone}" \
   --prometheus-monitoring="${KUBE_ENABLE_PROMETHEUS_MONITORING:-false}" \
+  --dns-domain="${KUBE_DNS_DOMAIN:-cluster.local}" \
+  --ginkgo.slowSpecThreshold="${GINKGO_SLOW_SPEC_THRESHOLD:-300}" \
   ${KUBE_CONTAINER_RUNTIME:+"--container-runtime=${KUBE_CONTAINER_RUNTIME}"} \
   ${MASTER_OS_DISTRIBUTION:+"--master-os-distro=${MASTER_OS_DISTRIBUTION}"} \
   ${NODE_OS_DISTRIBUTION:+"--node-os-distro=${NODE_OS_DISTRIBUTION}"} \

test/e2e/e2e.go diff:

--- e2e.go  2019-02-26 05:27:18.751107012 +0200
+++ e2e.go_master   2019-02-26 05:27:11.823108597 +0200
@@ -24,18 +24,16 @@
    "testing"
    "time"

-   "github.com/golang/glog"
    "github.com/onsi/ginkgo"
    "github.com/onsi/ginkgo/config"
    "github.com/onsi/ginkgo/reporters"
    "github.com/onsi/gomega"
+   "k8s.io/klog"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    runtimeutils "k8s.io/apimachinery/pkg/util/runtime"
-   "k8s.io/apiserver/pkg/util/logs"
    clientset "k8s.io/client-go/kubernetes"
-   "k8s.io/kubernetes/pkg/cloudprovider/providers/azure"
-   gcecloud "k8s.io/kubernetes/pkg/cloudprovider/providers/gce"
+   "k8s.io/component-base/logs"
    "k8s.io/kubernetes/pkg/version"
    commontest "k8s.io/kubernetes/test/e2e/common"
    "k8s.io/kubernetes/test/e2e/framework"
@@ -46,86 +44,20 @@

    // ensure auth plugins are loaded
    _ "k8s.io/client-go/plugin/pkg/client/auth"
+
+   // ensure that cloud providers are loaded
+   _ "k8s.io/kubernetes/test/e2e/framework/providers/aws"
+   _ "k8s.io/kubernetes/test/e2e/framework/providers/azure"
+   _ "k8s.io/kubernetes/test/e2e/framework/providers/gce"
+   _ "k8s.io/kubernetes/test/e2e/framework/providers/kubemark"
+   _ "k8s.io/kubernetes/test/e2e/framework/providers/openstack"
 )

 var (
-   cloudConfig = &framework.TestContext.CloudConfig
+   cloudConfig      = &framework.TestContext.CloudConfig
+   nodeKillerStopCh = make(chan struct{})
 )

-// setupProviderConfig validates and sets up cloudConfig based on framework.TestContext.Provider.
-func setupProviderConfig() error {
-   switch framework.TestContext.Provider {
-   case "":
-       glog.Info("The --provider flag is not set.  Treating as a conformance test.  Some tests may not be run.")
-
-   case "gce", "gke":
-       framework.Logf("Fetching cloud provider for %q\r", framework.TestContext.Provider)
-       zone := framework.TestContext.CloudConfig.Zone
-       region := framework.TestContext.CloudConfig.Region
-
-       var err error
-       if region == "" {
-           region, err = gcecloud.GetGCERegion(zone)
-           if err != nil {
-               return fmt.Errorf("error parsing GCE/GKE region from zone %q: %v", zone, err)
-           }
-       }
-       managedZones := []string{} // Manage all zones in the region
-       if !framework.TestContext.CloudConfig.MultiZone {
-           managedZones = []string{zone}
-       }
-
-       gceCloud, err := gcecloud.CreateGCECloud(&gcecloud.CloudConfig{
-           ApiEndpoint:        framework.TestContext.CloudConfig.ApiEndpoint,
-           ProjectID:          framework.TestContext.CloudConfig.ProjectID,
-           Region:             region,
-           Zone:               zone,
-           ManagedZones:       managedZones,
-           NetworkName:        "", // TODO: Change this to use framework.TestContext.CloudConfig.Network?
-           SubnetworkName:     "",
-           NodeTags:           nil,
-           NodeInstancePrefix: "",
-           TokenSource:        nil,
-           UseMetadataServer:  false,
-           AlphaFeatureGate:   gcecloud.NewAlphaFeatureGate([]string{}),
-       })
-
-       if err != nil {
-           return fmt.Errorf("Error building GCE/GKE provider: %v", err)
-       }
-
-       cloudConfig.Provider = gceCloud
-
-       // Arbitrarily pick one of the zones we have nodes in
-       if cloudConfig.Zone == "" && framework.TestContext.CloudConfig.MultiZone {
-           zones, err := gceCloud.GetAllZonesFromCloudProvider()
-           if err != nil {
-               return err
-           }
-
-           cloudConfig.Zone, _ = zones.PopAny()
-       }
-
-   case "aws":
-       if cloudConfig.Zone == "" {
-           return fmt.Errorf("gce-zone must be specified for AWS")
-       }
-   case "azure":
-       if cloudConfig.ConfigFile == "" {
-           return fmt.Errorf("config-file must be specified for Azure")
-       }
-       config, err := os.Open(cloudConfig.ConfigFile)
-       if err != nil {
-           framework.Logf("Couldn't open cloud provider configuration %s: %#v",
-               cloudConfig.ConfigFile, err)
-       }
-       defer config.Close()
-       cloudConfig.Provider, err = azure.NewCloud(config)
-   }
-
-   return nil
-}
-
 // There are certain operations we only want to run once per overall test invocation
 // (such as deleting old namespaces, or verifying that all system pods are running.
 // Because of the way Ginkgo runs tests in parallel, we must use SynchronizedBeforeSuite
@@ -137,10 +69,6 @@
 var _ = ginkgo.SynchronizedBeforeSuite(func() []byte {
    // Run only on Ginkgo node 1

-   if err := setupProviderConfig(); err != nil {
-       framework.Failf("Failed to setup provider config: %v", err)
-   }
-
    switch framework.TestContext.Provider {
    case "gce", "gke":
        framework.LogClusterImageSources()
@@ -148,7 +76,7 @@

    c, err := framework.LoadClientset()
    if err != nil {
-       glog.Fatal("Error loading client: ", err)
+       klog.Fatal("Error loading client: ", err)
    }

    // Delete any namespaces except those created by the system. This ensures no
@@ -163,7 +91,7 @@
        if err != nil {
            framework.Failf("Error deleting orphaned namespaces: %v", err)
        }
-       glog.Infof("Waiting for deletion of the following namespaces: %v", deleted)
+       klog.Infof("Waiting for deletion of the following namespaces: %v", deleted)
        if err := framework.WaitForNamespacesDeleted(c, deleted, framework.NamespaceCleanupTimeout); err != nil {
            framework.Failf("Failed to delete orphaned namespaces %v: %v", deleted, err)
        }
@@ -210,24 +138,23 @@
    // Reference common test to make the import valid.
    commontest.CurrentSuite = commontest.E2E

+   if framework.TestContext.NodeKiller.Enabled {
+       nodeKiller := framework.NewNodeKiller(framework.TestContext.NodeKiller, c, framework.TestContext.Provider)
+       nodeKillerStopCh = make(chan struct{})
+       go nodeKiller.Run(nodeKillerStopCh)
+   }
    return nil

 }, func(data []byte) {
    // Run on all Ginkgo nodes
-
-   if cloudConfig.Provider == nil {
-       if err := setupProviderConfig(); err != nil {
-           framework.Failf("Failed to setup provider config: %v", err)
-       }
-   }
 })

-// Similar to SynchornizedBeforeSuite, we want to run some operations only once (such as collecting cluster logs).
+// Similar to SynchronizedBeforeSuite, we want to run some operations only once (such as collecting cluster logs).
 // Here, the order of functions is reversed; first, the function which runs everywhere,
 // and then the function that only runs on the first Ginkgo node.
 var _ = ginkgo.SynchronizedAfterSuite(func() {
    // Run on all Ginkgo nodes
-   framework.Logf("Running AfterSuite actions on all node")
+   framework.Logf("Running AfterSuite actions on all nodes")
    framework.RunCleanupActions()
 }, func() {
    // Run only Ginkgo on node 1
@@ -240,6 +167,9 @@
            framework.Logf("Error gathering metrics: %v", err)
        }
    }
+   if framework.TestContext.NodeKiller.Enabled {
+       close(nodeKillerStopCh)
+   }
 })

 func gatherTestSuiteMetrics() error {
@@ -296,12 +226,12 @@
        // TODO: we should probably only be trying to create this directory once
        // rather than once-per-Ginkgo-node.
        if err := os.MkdirAll(framework.TestContext.ReportDir, 0755); err != nil {
-           glog.Errorf("Failed creating report directory: %v", err)
+           klog.Errorf("Failed creating report directory: %v", err)
        } else {
            r = append(r, reporters.NewJUnitReporter(path.Join(framework.TestContext.ReportDir, fmt.Sprintf("junit_%v%02d.xml", framework.TestContext.ReportPrefix, config.GinkgoConfig.ParallelNode))))
        }
    }
-   glog.Infof("Starting e2e run %q on Ginkgo node %d", framework.RunId, config.GinkgoConfig.ParallelNode)
+   klog.Infof("Starting e2e run %q on Ginkgo node %d", framework.RunId, config.GinkgoConfig.ParallelNode)

    ginkgo.RunSpecsWithDefaultAndCustomReporters(t, "Kubernetes e2e suite", r)
 }

i never understood why ginkgo is even used, but someone mentioned at some point that it was a mistake.

i found the problem....
https://github.com/kubernetes/kubernetes/commit/275212bbc964c453fbde596812eea1f992468ee2

:fire: bash :fire:
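For background on why bash gets the blame here: the failing invocation in ginkgo-e2e.sh wraps its argument arrays in the `:+` guard (e.g. `"${ginkgo_args[@]:+${ginkgo_args[@]}}"`), which exists because expanding an empty array under `set -u` is an "unbound variable" error in bash older than 4.4. A minimal sketch of that idiom (the `args` array here is hypothetical, not the script's real variable):

```shell
#!/usr/bin/env bash
set -u

# Empty array: in bash < 4.4, a bare "${args[@]}" would trip `set -u`,
# but the :+ guard expands to zero words instead of erroring.
args=()
safe=( "${args[@]:+${args[@]}}" )
echo "expanded ${#safe[@]} elements"

# Non-empty array: the guard expands to the array's elements, one word each.
args=(--ginkgo.focus "Conformance")
safe=( "${args[@]:+${args[@]}}" )
echo "expanded ${#safe[@]} elements"
```

This prints `expanded 0 elements` and then `expanded 2 elements`; without the guard, the empty case aborts the script on old bash.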

in the release-1.12 branch of k/k, the kind deployer from kubetest builds the e2e.test binary, but this line in ginkgo-e2e.sh then returns an empty string:

e2e_test=$(kube::util::find-binary "e2e.test")

need to backport this for 1.12.
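The failure mode is easy to reproduce in isolation: a find-binary-style lookup prints an empty string when none of its search directories contain the binary (for example, when the bazel output path is not among them), and nothing stops the script until ginkgo later reports "Found no test suites". A minimal sketch of that lookup pattern plus a guard against the silent-empty case (the `find_binary` helper and search paths are hypothetical, not the real `kube::util::find-binary`):

```shell
#!/usr/bin/env bash
# Hypothetical lookup: search candidate output directories and print the
# first executable match; print nothing at all when the binary is absent.
find_binary() {
  local name="$1" dir
  for dir in "${SEARCH_DIRS[@]}"; do
    if [[ -x "${dir}/${name}" ]]; then
      echo "${dir}/${name}"
      return 0
    fi
  done
  return 1  # caller sees an empty string unless it also checks $?
}

# Demo: the binary exists only in the second search directory.
work=$(mktemp -d)
SEARCH_DIRS=("${work}/_output/bin" "${work}/bazel-bin/test/e2e")
mkdir -p "${SEARCH_DIRS[1]}"
touch "${SEARCH_DIRS[1]}/e2e.test" && chmod +x "${SEARCH_DIRS[1]}/e2e.test"

e2e_test=$(find_binary "e2e.test") || true
# Guard against the silent-empty case instead of invoking "" later.
if [[ -z "${e2e_test}" ]]; then
  echo "e2e.test not found in any search dir" >&2
else
  echo "found: ${e2e_test}"
fi
rm -rf "${work}"
```

With a guard like this, the job would fail at the lookup with a clear message instead of handing ginkgo an empty binary path.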

I came here to comment on this; I remember this change. There was a fix to make it locate the bazel binary.

