Kind: ARM64 CI

Created on 19 Dec 2018 · 58 comments · Source: kubernetes-sigs/kind

Per discussion in the #kind Slack channel, we should set up some CI with OpenLab to get kind working on arm64. xref #166

@dims was able to get arm64 working, but we'll need to set up CI to keep it working once that goes in, as the maintainers do not otherwise have access to suitable ARM machines to test on.

/assign
/kind feature
/priority important-longterm


All 58 comments

@lubinsz could you help with this?

Note that we will need to fix #166 first; however, that is very doable. Dims previously made a quick patch that worked, but we haven't PRed anything yet.

@BenTheElder @dixudx
I see.
At the least, this involves a multi-arch image issue.
Let me apply for internal legal approval for this project first ...

@lubinsz see my previous patch in https://github.com/kubernetes-sigs/kind/issues/166#issuecomment-448766016

https://github.com/WorksOnArm/cluster/issues/154 gives us access to Packet hardware.
How would we like it configured?

  • Running k8s?
    This would allow us to designate this cluster to run kind jobs and fit into sig-testing's usual approach to testing.

  • Running Docker only?
    We would need to SSH in, run kind + tests, then clean up.

/cc @devaii

I think running Docker / SSH only is the most well-understood path currently; we can treat these machines similarly to a node or cAdvisor e2e job and put credentials in Prow to access them.

Long term it might be interesting to be able to run prowjobs on these machines directly, but that would require more work to maintain the cluster, and we would need to figure out how to handle distributing other credentials.
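
For concreteness, here's a minimal sketch of what that SSH-driven flow might look like, assuming a hypothetical arm64 host named `arm64-worker` with Docker and the kind binary already installed (the host name and test step are placeholders, not the actual job config):

```sh
#!/usr/bin/env bash
# Hypothetical SSH-driven CI flow: create a kind cluster on a remote arm64 box,
# run tests, then clean up. Host name and credentials are placeholders.
set -euo pipefail

HOST="arm64-worker"   # placeholder; real credentials would come from Prow secrets

ssh "${HOST}" bash -s <<'EOF'
set -euo pipefail
kind create cluster --name ci-arm64
# ... run e2e / smoke tests against the cluster here ...
kind delete cluster --name ci-arm64
EOF
```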

When trying to run kind build, we note that docker-ce and friends are not available directly from the same repos:

E: Version '18.06.*' for 'docker-ce' was not found
The command '/bin/sh -c curl -fsSL "https://download.docker.com/linux/$(. /etc/os-release; echo "$ID")/gpg" | apt-key add -     && apt-key fingerprint 0EBFCD88     && ARCH="${ARCH}" add-apt-repository         "deb [arch=${ARCH}] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") $(lsb_release -cs) stable"     && clean-install "docker-ce=${DOCKER_VERSION}"' returned a non-zero code: 100
ERRO[22:13:41] Docker build Failed! exit status 100         
Error: build failed: exit status 100
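
For reference, the failing step is the Docker apt repository setup; a likely fix (my assumption, not the actual patch) is to derive the repo architecture from the host instead of a hard-coded amd64, roughly:

```sh
# Sketch: use the host architecture for the Docker apt repo instead of hard-coding amd64.
ARCH="$(dpkg --print-architecture)"   # prints "arm64" on this machine
add-apt-repository \
  "deb [arch=${ARCH}] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") $(lsb_release -cs) stable"
apt-get update
apt-get install -y 'docker-ce=18.06.*'   # same version pin as the Dockerfile above
```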

Probably need some debugging:

root@kind:~# kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.13.3) 🖼 
ERRO[22:14:59] machine-id-setup error: exit status 1        
 ✗ [control-plane] Creating node container 📦 
Error: failed to create cluster: machine-id-setup error: exit status 1

Version check:

root@kind:~# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.1
 Git commit:        e68fc7a
 Built:             Fri Jan 25 14:35:17 2019
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       e68fc7a
  Built:            Thu Jan 24 10:49:48 2019
  OS/Arch:          linux/arm64
  Experimental:     false

It looks like the issue is due to the ARCH variable in the base image being hard-coded to amd64.

See: https://github.com/kubernetes-sigs/kind/blob/master/images/base/Dockerfile#L29
It looks like the ARCH variable is used later for the CNI plugin tarball, as well.

I am working on a patch.

Yeah, there are a bunch of places marked TODO for handling this because I wasn't sure where / how to plumb it through. I think using runtime.GOARCH should be fine. Dims's previous patch is here: https://github.com/kubernetes-sigs/kind/issues/188#issuecomment-453514226
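
As a rough shell-side illustration of that plumbing (runtime.GOARCH would be the Go equivalent; this is a sketch, not the actual patch), the build could map the machine type to the Docker/Go arch name once and reuse it:

```sh
# Sketch: derive the target arch name once and reuse it for the Docker repo
# and the CNI plugins tarball instead of assuming amd64.
case "$(uname -m)" in
  x86_64)        ARCH=amd64 ;;
  aarch64|arm64) ARCH=arm64 ;;
  ppc64le)       ARCH=ppc64le ;;
  *) echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac
echo "building base image for ${ARCH}"
```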

@BenTheElder unfortunately the paste with the patch has expired

we still need CI, https://github.com/kubernetes-sigs/kind/pull/358 works well!

Thanks to @ZhengZhenyu and other awesome folks at OpenLab (https://github.com/theopenlab), we now have a functional kind-on-ARM CI!!!

Please see:
http://status.openlabtesting.org/builds?job_name=kind-integration-test-arm64

Now that the jobs have been running successfully for a few days, we would like to know how the kind community would like to handle the testgrid reporting. I believe there are two options:

  • push to an OpenLab-owned GCS bucket
  • push to a kind-owned GCS bucket

I believe the second option is possible but would require setting up a user/auth account for OpenLab; the first would be for OpenLab to resolve, either by using the existing bucket we use for cloud-provider-openstack or by setting up a new one.

I could be wrong but we are ready to get the reporting to the proper place so the community can work as expected on any issues surfaced.

cc @BenTheElder - please see the question from Melvin ^^

Either works! See also https://github.com/kubernetes/test-infra/tree/master/testgrid/conformance; we can set up a GCS bucket if we don't want to use any existing ones.

[sorry for the huge delay, this slipped through my inbox :(]

No problem @BenTheElder, totally understand. We'll create a new one just to keep things separated, since that is possible.

Re-reading through this...
I created gs://k8s-conformance-kind-arm64-openlab in case we need it. @mrhillsman @dims shoot me an email if we do and I'll coordinate the service account credentials there :sweat_smile:
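
For what it's worth, the upload side is usually just a service-account activation plus a gsutil copy; a rough sketch, assuming a key file handed to the job via an environment variable (`GOOGLE_CREDENTIALS` here is illustrative) and the prow-style logs/<job>/<build> layout, which should be double-checked against the testgrid conformance docs linked above:

```sh
# Sketch: authenticate with the shared service account and push results to the bucket.
gcloud auth activate-service-account --key-file="${GOOGLE_CREDENTIALS}"  # key file path is job-specific
BUILD_ID="$(date +%s)"
GCS_DIR="gs://k8s-conformance-kind-arm64-openlab/logs/kind-integration-test-arm64/${BUILD_ID}"
gsutil cp _artifacts/junit*.xml "${GCS_DIR}/artifacts/"
```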

ack @BenTheElder
/cc @dims

Hi @mrhillsman, @dims and @BenTheElder, I added an issue on the OpenLab side to track this job: https://github.com/theopenlab/openlab/issues/257

Hi @BenTheElder, we have now finished adding the GCP account, but our job fails to set up a k8s cluster with master kind and master K8S. It was previously fine, but it seems to have been failing to build for a few weeks. I tried to debug it and solved some problems, but it still won't come up; could you help take a look? I uploaded the full logs to our OpenLab log server:
https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/728b70b/
and here is the script I used for the job:
https://github.com/theopenlab/openlab-zuul-jobs/blob/master/playbooks/kind-integration-test-arm64/run.yaml

Thanks a lot

seems a problem with the internal images and containerd 🤔

Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.335923     440 remote_image.go:113] PullImage "k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284" from image service failed: rpc error: code = Unknown desc = failed to resolve image "k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284": no available registry endpoint: k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284 not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.336147     440 kuberuntime_image.go:51] Pull image "k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284" failed: rpc error: code = Unknown desc = failed to resolve image "k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284": no available registry endpoint: k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284 not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.336603     440 kuberuntime_manager.go:775] container start failed: ErrImagePull: rpc error: code = Unknown desc = failed to resolve image "k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284": no available registry endpoint: k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284 not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.336788     440 pod_workers.go:190] Error syncing pod f09c4fe814efa82a9c97906695a40f30 ("kube-scheduler-kind-kubetest-control-plane_kube-system(f09c4fe814efa82a9c97906695a40f30)"), skipping: failed to "StartContainer" for "kube-scheduler" with ErrImagePull: "rpc error: code = Unknown desc = failed to resolve image \"k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284\": no available registry endpoint: k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284 not found"
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.408287     440 kubelet.go:2248] node "kind-kubetest-control-plane" not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.429569     440 reflector.go:125] pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://172.17.0.5:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkind-kubetest-control-plane&limit=500&resourceVersion=0: dial tcp 172.17.0.5:6443: connect: connection refused
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.508929     440 kubelet.go:2248] node "kind-kubetest-control-plane" not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.609433     440 kubelet.go:2248] node "kind-kubetest-control-plane" not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.629499     440 reflector.go:125] pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://172.17.0.5:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.0.5:6443: connect: connection refused
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.711873     440 kubelet.go:2248] node "kind-kubetest-control-plane" not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.812356     440 kubelet.go:2248] node "kind-kubetest-control-plane" not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.828919     440 reflector.go:125] pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://172.17.0.5:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkind-kubetest-control-plane&limit=500&resourceVersion=0: dial tcp 172.17.0.5:6443: connect: connection refused
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.887205     440 remote_runtime.go:200] CreateContainer in sandbox "a4e678b343c741029425c2141082e3d71ec40606af2c8f8c2d191c3aa1bdaaaa" from runtime service failed: rpc error: code = Unknown desc = failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:a6e2bdd03a1214512221080f1f8f4aaf183a8b98d4a391d222aa50f34fbc20e3: not found
Jun 06 06:47:03 kind-kubetest-control-plane kubelet[440]: E0606 06:47:03.887459     440 kuberuntime_manager.go:775] container start failed: CreateContainerError: failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:a6e2bdd03a1214512221080f1f8f4aaf183a8b98d4a391d222aa50f34fbc20e3: not found

https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/728b70b/logs/kubernetes/kind-kubetest-control-plane/containerd.log

I think that the node build is not creating some of the images; in my environment:

kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.14.2
k8s.gcr.io/kube-controller-manager:v1.14.2
k8s.gcr.io/kube-scheduler:v1.14.2
k8s.gcr.io/kube-proxy:v1.14.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

however, I can't see the kube-* images in the logs; maybe I'm missing something? (A quick way to double-check what ended up inside the node is sketched after the log below.)

time="06:32:55" level=debug msg="Running: /usr/bin/docker [docker exec --privileged kind-build-811a5dd6-2d89-48b2-80d3-012d80828821 kubeadm config images list --kubernetes-version v1.16.0-alpha.0.868+ef7808fec57284]"
Pulling: k8s.gcr.io/pause:3.1
time="06:32:55" level=info msg="Pulling image: k8s.gcr.io/pause:3.1 ..."
time="06:32:55" level=debug msg="Running: /usr/bin/docker [docker pull k8s.gcr.io/pause:3.1]"
time="06:33:27" level=debug msg="Running: /usr/bin/docker [docker save -o /tmp/kind-node-image717940973/bits/images/4.tar k8s.gcr.io/pause:3.1]"
Pulling: k8s.gcr.io/etcd:3.3.10
time="06:33:27" level=info msg="Pulling image: k8s.gcr.io/etcd:3.3.10 ..."
time="06:33:27" level=debug msg="Running: /usr/bin/docker [docker pull k8s.gcr.io/etcd:3.3.10]"
time="06:33:49" level=debug msg="Running: /usr/bin/docker [docker save -o /tmp/kind-node-image717940973/bits/images/5.tar k8s.gcr.io/etcd:3.3.10]"
Pulling: k8s.gcr.io/coredns:1.3.1
time="06:34:24" level=info msg="Pulling image: k8s.gcr.io/coredns:1.3.1 ..."
time="06:34:24" level=debug msg="Running: /usr/bin/docker [docker pull k8s.gcr.io/coredns:1.3.1]"
time="06:34:28" level=debug msg="Running: /usr/bin/docker [docker save -o /tmp/kind-node-image717940973/bits/images/6.tar k8s.gcr.io/coredns:1.3.1]"
Pulling: kindest/kindnetd:0.1.0
time="06:34:47" level=info msg="Pulling image: kindest/kindnetd:0.1.0 ..."
time="06:34:47" level=debug msg="Running: /usr/bin/docker [docker pull kindest/kindnetd:0.1.0]"
time="06:34:52" level=debug msg="Running: /usr/bin/docker [docker save -o /tmp/kind-node-image717940973/bits/images/7.tar kindest/kindnetd:0.1.0]"
Pulling: k8s.gcr.io/ip-masq-agent:v2.4.1
time="06:34:53" level=info msg="Pulling image: k8s.gcr.io/ip-masq-agent:v2.4.1 ..."
time="06:34:53" level=debug msg="Running: /usr/bin/docker [docker pull k8s.gcr.io/ip-masq-agent:v2.4.1]"
time="06:35:01" level=debug msg="Running: /usr/bin/docker [docker save -o /tmp/kind-node-image717940973/bits/images/8.tar k8s.gcr.io/ip-masq-agent:v2.4.1]"
time="06:35:14" level=debug msg="Running: /usr/bin/docker [docker exec --privileged kind-build-811a5dd6-2d89-48b2-80d3-012d80828821 mkdir -p /kind/images]"
time="06:35:14" level=debug msg="Running: /usr/bin/docker [docker exec --privileged kind-build-811a5dd6-2d89-48b2-80d3-012d80828821 mv /build/bits/images/4.tar /build/bits/images/5.tar /build/bits/images/6.tar /build/bits/images/7.tar /build/bits/images/8.tar /kind/images]"
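
One quick way to double-check what actually ended up inside the node is to list containerd's images from the running node container; a sketch (the container name matches the failing control-plane node in the logs above, and k8s.io is the namespace CRI uses):

```sh
# Sketch: list the images containerd has loaded inside the kind node container.
docker exec kind-kubetest-control-plane ctr --namespace k8s.io images ls | grep kube-
```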

@ZhengZhenyu it seems the last job was successful https://logs.openlabtesting.org/builds?project=kubernetes-sigs/kind, am I looking in the right place?

@aojea Thanks a lot for the reply. I guess there could be some misconfiguration in the job workflow so that it reports success while no test is actually running; I will check it later. But yes, the K8S cluster has not come up for a few weeks. I've tried a lot of combinations of kind and k8s and they didn't work. We used to have a successfully running test workflow for about a month, but we recently changed our OpenLab deployment, so I cannot provide the previous run logs.

@aojea As you can see from the test scripts, I copied all files from the _artifacts folder; I guess the kube-* logs should be in there if they were generated?

And as seen in:
https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/728b70b/job-output.txt.gz

2019-06-06 06:36:33.026854 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:33.024811073Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:9730ba13d51885429e8a0544daef7ebb527fbaa8387b8b6f8458029f5ebac9de,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:33.028068 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:33.026746014Z" level=info msg="ImageUpdate event &ImageUpdate{Name:docker.io/kindest/kindnetd:0.1.0,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:35.791139 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:35.789568652Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/coredns:1.3.1,Labels:map[string]string{},}"
2019-06-06 06:36:38.405538 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:38.403412019Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:7e8edeee9a1e73cdd4a1209eaa12aee15933456c7b6c0eb7d6758c8e1a078d0a,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:38.408607 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:38.407053352Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/ip-masq-agent:v2.4.1,Labels:map[string]string{},}"
2019-06-06 06:36:39.297464 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.295437592Z" level=error msg="(*service).Write failed" error="rpc error: code = Unavailable desc = ref k8s.io/1/tar-repositories locked: unavailable" ref=tar-repositories total=137
2019-06-06 06:36:39.436158 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.434494073Z" level=info msg="ImageUpdate event &ImageUpdate{Name:k8s.gcr.io/coredns:1.3.1,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.438426 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.436674882Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:62e7d8e75a3fe2e9097d3c9fde8a5d22593b60db56e2e7584a5000c00f1815a7,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.440870 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.439220695Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284,Labels:map[string]string{},}"
2019-06-06 06:36:39.910683 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.909068338Z" level=info msg="ImageUpdate event &ImageUpdate{Name:k8s.gcr.io/ip-masq-agent:v2.4.1,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.912988 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.911339351Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/kube-controller-manager:v1.16.0-alpha.0.868_ef7808fec57284,Labels:map[string]string{},}"
2019-06-06 06:36:39.989654 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.987856531Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:e13edde249eb7dea3d2718a3d5b1209580a11351c9e3bbeacda4dcb8210b52a9,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.991741 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.989892306Z" level=info msg="ImageUpdate event &ImageUpdate{Name:k8s.gcr.io/kube-scheduler:v1.16.0-alpha.0.868_ef7808fec57284,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.993624 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.991824576Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:a6e2bdd03a1214512221080f1f8f4aaf183a8b98d4a391d222aa50f34fbc20e3,Labels:map[string]string{io.cri-containerd.image: managed,},}"
2019-06-06 06:36:39.995324 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:39.993830849Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/kube-proxy:v1.16.0-alpha.0.868_ef7808fec57284,Labels:map[string]string{},}"
2019-06-06 06:36:40.033304 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:40.032145947Z" level=info msg="Stop CRI service"
2019-06-06 06:36:40.033913 | ubuntu-xenial-arm64 | time="2019-06-06T06:36:40.032630064Z" level=info msg="Stop CRI service"

It seems some of the kube-* images are missing, and kube-proxy and kube-scheduler don't show as managed in the log.

@ZhengZhenyu how can I check if this works? https://github.com/theopenlab/openlab-zuul-jobs/pull/550
I think the best thing moving forward is to mimic the prow jobs.

FWIW, the snippet in https://github.com/kubernetes-sigs/kind/issues/188#issuecomment-499420602 looks like build output, which will have some scary-looking errors that can currently be ignored; if kind build node-image exits 0, it should be fine.

@BenTheElder yeah, it seems the build operation is OK, but if we only test the build, how are we supposed to add results to testgrid?

https://github.com/kubernetes/test-infra/pull/13273/ has been merged so the CI itself is done

@aojea Sorry for the delay, checking

@aojea the error seems to be:

INFO: Call stack for the definition of repository 'containerregistry' which is a http_archive (rule definition at /root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/bazel_tools/tools/build_defs/repo/http.bzl:237:16):
2019-08-05 18:15:47.172462 | ubuntu-xenial-arm64 | - /root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/io_bazel_rules_docker/repositories/repositories.bzl:78:9
2019-08-05 18:15:47.172814 | ubuntu-xenial-arm64 | - /home/zuul/src/k8s.io/kubernetes/WORKSPACE:48:1
2019-08-05 18:15:47.195743 | ubuntu-xenial-arm64 | ERROR: An error occurred during the fetch of repository 'containerregistry':
2019-08-05 18:15:47.197022 | ubuntu-xenial-arm64 | java.io.IOException: Error downloading [https://github.com/google/containerregistry/archive/v0.0.34.tar.gz] to /root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/containerregistry/v0.0.34.tar.gz: connect timed out
2019-08-05 18:15:47.251361 | ubuntu-xenial-arm64 | ERROR: /home/zuul/src/k8s.io/kubernetes/build/BUILD:63:2: //build:kube-scheduler-internal depends on @containerregistry//:digester in repository @containerregistry which failed to fetch. no such package '@containerregistry//': java.io.IOException: Error downloading [https://github.com/google/containerregistry/archive/v0.0.34.tar.gz] to /root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/containerregistry/v0.0.34.tar.gz: connect timed out
2019-08-05 18:15:47.330461 | ubuntu-xenial-arm64 | ERROR: Analysis of target '//build:docker-artifacts' failed; build aborted: no such package '@containerregistry//': java.io.IOException: Error downloading [https://github.com/google/containerregistry/archive/v0.0.34.tar.gz] to /root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/containerregistry/v0.0.34.tar.gz: connect timed out

Do you think this is related to the k8s branch, or is it related to bazel itself?

:thinking: I can download this file https://github.com/google/containerregistry/archive/v0.0.34.tar.gz, however the message says the connection timed out

@ZhengZhenyu the logs are showing a lot of errors related to bazel, and I know that upstream has a hard time every time bazel releases a new version. https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/d1a63fc/job-output.txt.gz

2019-08-06 06:19:38.429994 | ubuntu-xenial-arm64 | sequence element must be a string (got 'bool'). See https://github.com/bazelbuild/bazel/issues/7802 for information about --incompatible_string_join_requires_strings.

2019-08-06 06:19:39.889926 | ubuntu-xenial-arm64 | Incompatible flag --incompatible_require_ctx_in_configure_features has been flipped, and the mandatory parameter 'ctx' of cc_common.configure_features is missing. Please add 'ctx' as a named parameter. See https://github.com/bazelbuild/bazel/issues/7793 for details.

@ZhengZhenyu I can't check which bazel version is used for the jobs; is it pinned to a specific version?
@BenTheElder you are the bazel expert, what's the recommended version of bazel?
Is there a relation between bazel releases and kubernetes releases?

@aojea yes, the bazel version is now pinned to 0.28.1; it used to be pinned to 0.23, and we upgraded it a few weeks ago because the envoy project required it. Could that be the problem?

Yeah, definitely. Can we try another version only for that job? I guess it has to be <= 0.26.

@aojea I will have to discuss with the team whether we can have a dedicated worker for kind.

@ZhengZhenyu we can install our own version of bazel for the kind job https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu , no need for a dedicated worker.

The --user flag installs Bazel to the $HOME/bin directory on your system and sets the .bazelrc path to $HOME/.bazelrc. Use the --help command to see additional installation options.

EDIT

Er, it seems we have to compile it from source; anyway, the point is we can use our own version for this job :sweat_smile:. What do you think?
https://docs.bazel.build/versions/master/install-compile-source.html#compiling-from-source
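
Building bazel from source on arm64 roughly follows that doc's bootstrap flow; a sketch (the version and JDK flag are assumptions and may need adjusting for the host):

```sh
# Sketch: bootstrap bazel from the release dist archive on an arm64 host.
BAZEL_VERSION="0.26.1"   # illustrative; use whatever version the k8s build expects
curl -fsSLO "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-dist.zip"
mkdir bazel-src && unzip -q "bazel-${BAZEL_VERSION}-dist.zip" -d bazel-src && cd bazel-src
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
sudo cp output/bazel /usr/local/bin/bazel   # overwrite any preinstalled newer version
```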

@aojea yes, there is no arm64 release, so we have to build from source, which takes hours; we pre-built it and put it in the worker image. I've checked with my team and the old version may still be there, so I could simply copy it onto the PATH to overwrite the newer version as the first step in our kind job.

@aojea Oops, checking on the server, the old version got deleted, so I have to rebuild it. This might take a few hours, so I guess it cannot be done during my office hours today.

@ZhengZhenyu Please check if this version works for you; it is the version I'm using in my local tests: https://www.dropbox.com/s/0gygwih4256974k/bazel-0.26.1-aarch64?dl=0

Build label: 0.26.1- (@non-git)
Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar

```sh
$ file /users/aojea/bazel/output/bazel
/users/aojea/bazel/output/bazel: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-, for GNU/Linux 3.7.0, BuildID[sha1]=a424820c1608c335533704ac6cecfef062005a34, not stripped
```

```sh
$ md5sum /users/aojea/bazel/output/bazel
73050b4be346f4086ba6f05b5a0c62a1  /users/aojea/bazel/output/bazel
```

@aojea Hmm, are you sure? 0.26 does not seem OK:

+ kind build node-image --base-image kindest/base:latest --type=bazel --kube-root=/home/zuul/src/k8s.io/kubernetes
2019-08-07 03:04:48.421646 | ubuntu-xenial-arm64 | Starting local Bazel server and connecting to it...
2019-08-07 03:04:58.516265 | ubuntu-xenial-arm64 | Loading:
2019-08-07 03:04:58.527968 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:04:59.535464 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:00.545164 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:02.538017 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:03.539170 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:04.539531 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:05.579609 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:06.849602 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:06.850068 | ubuntu-xenial-arm64 | currently loading: build ... (4 packages)
2019-08-07 03:05:07.866957 | ubuntu-xenial-arm64 | Analyzing: 4 targets (4 packages loaded, 0 targets configured)
2019-08-07 03:05:09.435783 | ubuntu-xenial-arm64 | Analyzing: 4 targets (17 packages loaded, 31 targets configured)
2019-08-07 03:05:11.236828 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:13.445792 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:15.845550 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:18.655987 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:28.225680 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:32.819114 | ubuntu-xenial-arm64 | Analyzing: 4 targets (176 packages loaded, 2572 targets configured)
2019-08-07 03:05:34.893333 | ubuntu-xenial-arm64 | INFO: SHA256 (https://codeload.github.com/golang/tools/zip/bf090417da8b6150dcfe96795325f5aa78fff718) = 11629171a39a1cb4d426760005be6f7cb9b4182e4cb2756b7f1c5c2b6ae869fe
2019-08-07 03:05:34.980855 | ubuntu-xenial-arm64 | DEBUG: Rule 'debian-iptables-arm64' indicated that a canonical reproducible form can be obtained by modifying arguments digest = "sha256:1a63fdd216fe7b84561d40ab1ebaa0daae1fc73e4232a6caffbd8353d9a14cea"
2019-08-07 03:05:35.093497 | ubuntu-xenial-arm64 | DEBUG: Rule 'debian-base-arm64' indicated that a canonical reproducible form can be obtained by modifying arguments digest = "sha256:17be039c7035bd0897d954c51914ad41cd7e2b0b7c170b3d89ed021833df2fb1"
2019-08-07 03:05:38.109812 | ubuntu-xenial-arm64 | Analyzing: 4 targets (825 packages loaded, 8718 targets configured)
2019-08-07 03:05:40.007761 | ubuntu-xenial-arm64 | DEBUG: Rule 'org_golang_x_tools' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "11629171a39a1cb4d426760005be6f7cb9b4182e4cb2756b7f1c5c2b6ae869fe"
2019-08-07 03:05:44.975087 | ubuntu-xenial-arm64 | Analyzing: 4 targets (1619 packages loaded, 13612 targets configured)
2019-08-07 03:05:50.481029 | ubuntu-xenial-arm64 | ERROR: /home/zuul/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/sets/BUILD:25:1: in _go_genrule rule //staging/src/k8s.io/apimachinery/pkg/util/sets:set-gen:
2019-08-07 03:05:50.481413 | ubuntu-xenial-arm64 | Traceback (most recent call last):
2019-08-07 03:05:50.482013 | ubuntu-xenial-arm64 | File "/home/zuul/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/sets/BUILD", line 25
2019-08-07 03:05:50.482273 | ubuntu-xenial-arm64 | _go_genrule(name = 'set-gen')
2019-08-07 03:05:50.483085 | ubuntu-xenial-arm64 | File "/root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/io_k8s_repo_infra/defs/go.bzl", line 37, in _go_genrule_impl
2019-08-07 03:05:50.483316 | ubuntu-xenial-arm64 | all_srcs += dep.files
2019-08-07 03:05:50.484437 | ubuntu-xenial-arm64 | + operator on a depset is forbidden. See https://docs.bazel.build/versions/master/skylark/depsets.html for recommendations. Use --incompatible_depset_union=false to temporarily disable this check.
2019-08-07 03:05:50.782848 | ubuntu-xenial-arm64 | ERROR: Analysis of target '//build:docker-artifacts' failed; build aborted: Analysis of target '//staging/src/k8s.io/apimachinery/pkg/util/sets:set-gen' failed; build aborted
2019-08-07 03:05:50.828674 | ubuntu-xenial-arm64 | INFO: Elapsed time: 62.451s
2019-08-07 03:05:50.829031 | ubuntu-xenial-arm64 | INFO: 0 processes.
2019-08-07 03:05:50.836660 | ubuntu-xenial-arm64 | FAILED: Build did NOT complete successfully (1979 packages loaded, 17048 targets configured)
2019-08-07 03:05:50.845262 | ubuntu-xenial-arm64 | FAILED: Build did NOT complete successfully (1979 packages loaded, 17048 targets configured)
2019-08-07 03:05:50.857241 | ubuntu-xenial-arm64 | time="03:05:50" level=error msg="Failed to build Kubernetes: exit status 1"
2019-08-07 03:05:50.857738 | ubuntu-xenial-arm64 | Error: error building node image: failed to build kubernetes: exit status 1

I'm building again for 0.24

@aojea Hi, sorry for the delay. I've tried several versions and finally rolled back to 0.23.2 and tested manually (no log will be updated); you should be able to see the results after the next periodic run.

@ZhengZhenyu you did it, it is now building the cluster.
However, it seems the testgrid config has changed and I can't find the dashboard to check the errors; I will try to check tomorrow.

@ZhengZhenyu the e2e tests are running, but the job is failing to upload the results because the script seems to need python >= 3.6 while the node has python 3.5:

2019-08-12 20:36:02.654884 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.5/subprocess.py", line 693, in run
2019-08-12 20:36:02.655245 | ubuntu-xenial-arm64 |     with Popen(*popenargs, **kwargs) as process:
2019-08-12 20:36:02.655688 | ubuntu-xenial-arm64 | TypeError: __init__() got an unexpected keyword argument 'encoding'

The encoding argument is not present in python 3.5

Changed in version 3.6: Added encoding and errors parameters

Is it possible to use python 3.6 or newer?
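
If upgrading the image is painful, one possible workaround (just a sketch) is for the job to prefer a newer interpreter when one is available:

```sh
# Sketch: pick the newest available python3 for the upload step.
PYTHON="$(command -v python3.7 || command -v python3.6 || command -v python3)"
"${PYTHON}" --version
# then invoke the test-infra upload script with "${PYTHON}" instead of the default python3
```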

@aojea sure, I will try

We hit another problem; it seems the account is no longer valid.

2019-08-13 20:35:25.925443 | ubuntu-xenial-arm64 | WARNING: [kind-arm64-openlab-logs@k8s-federated-conformance.iam.gserviceaccount.com] appears to be a service account. Service account tokens cannot be revoked, but they will expire automatically. To prevent use of the service account token earlier than the expiration, revoke the parent service account or service account key.
2019-08-13 20:35:25.930826 | ubuntu-xenial-arm64 | Revoked credentials:
2019-08-13 20:35:25.931313 | ubuntu-xenial-arm64 |  - kind-arm64-openlab-logs@k8s-federated-conformance.iam.gserviceaccount.com
2019-08-13 20:35:26.083693 | ubuntu-xenial-arm64 | Traceback (most recent call last):
2019-08-13 20:35:26.084662 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 328, in <module>
2019-08-13 20:35:26.085125 | ubuntu-xenial-arm64 |     main(sys.argv[1:])
2019-08-13 20:35:26.085801 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 318, in main
2019-08-13 20:35:26.086858 | ubuntu-xenial-arm64 |     upload_string(gcs_dir+'/started.json', started_json, args.dry_run)
2019-08-13 20:35:26.087320 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 175, in upload_string
2019-08-13 20:35:26.087583 | ubuntu-xenial-arm64 |     proc.communicate(input=text)
2019-08-13 20:35:26.088014 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.6/subprocess.py", line 848, in communicate
2019-08-13 20:35:26.088260 | ubuntu-xenial-arm64 |     self._stdin_write(input)
2019-08-13 20:35:26.088730 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.6/subprocess.py", line 801, in _stdin_write
2019-08-13 20:35:26.088972 | ubuntu-xenial-arm64 |     self.stdin.write(input)
2019-08-13 20:35:26.089336 | ubuntu-xenial-arm64 | TypeError: a bytes-like object is required, not 'str'
2019-08-13 20:35:26.089645 | ubuntu-xenial-arm64 | Run: ['gcloud', 'auth', 'revoke']

@dims @ZhengZhenyu do you have an idea what the problem with the service account could be? ^^

These are the logs https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/230250b/

@aojea Hi, sorry for the delay. We also had a similar problem in the cloud-provider-openstack job, and my colleague checked yesterday; it turns out to be an incorrect use of subprocess, and the stdin handling should probably be removed.

@aojea Hi, it seems the job succeeded once on 8/20 and I can see the results both on testgrid and OpenLab:
https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/4631b06/
https://k8s-testgrid.appspot.com/conformance-kind#kind,%20v1.14%20(dev,%20ARM64)
And the tests actually seem to be running for the first time.

But then the job starts to fail again.

@aojea Hmm, it seems something is wrong setting up the env again: the tests did not run, and thus nothing could be uploaded.

@ZhengZhenyu the e2e.sh script changed around that time, but I don't know if one of those changes broke the OpenLab CI. It seems the containers for the kubernetes components are not able to spawn, i.e. the kubelet fails and there are no logs for the kube-apiserver, ...

https://github.com/kubernetes-sigs/kind/commit/97b044c5d1fe869af8d9e9052d1c218f590fd3c4#diff-d9fa0450190d60ba133fb92282a94725

I've sent a PR to try to align the CI job with the new changes to e2e.sh, and we can iterate from there.

https://github.com/theopenlab/openlab-zuul-jobs/pull/625

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
