I recently added a scheduler-benchmark job which basically runs make test-integration ... inside a dockerized environment. The job runs this script using the kubernetes_verify scenario. That script is quite similar to the test-dockerized.sh script used to run our normal integration tests, while this one runs the benchmark integration tests.
When I run the make command locally, I get the desired output (with some lines of benchmark results in between):
$ make test-integration WHAT="./test/integration/scheduler_perf" KUBE_TEST_ARGS="-run='XXX' -bench=. -benchmem"
+++ [0123 20:46:51] Checking etcd is on PATH
/usr/local/google/home/shyamjvs/Downloads/etcd-v3.1.11-linux-amd64/etcd
+++ [0123 20:46:51] Starting etcd instance
etcd --advertise-client-urls http://127.0.0.1:2379 --data-dir /tmp/tmp.SzP8ZqrfTO --listen-client-urls http://127.0.0.1:2379 --debug > "/dev/null" 2>/dev/null
Waiting for etcd to come up.
+++ [0123 20:46:52] On try 2, etcd: : http://127.0.0.1:2379
{"action":"set","node":{"key":"/_test","value":"","modifiedIndex":4,"createdIndex":4}}
+++ [0123 20:46:52] Running integration test cases
Running tests for APIVersion: v1
+++ [0123 20:46:54] Running tests without code coverage
W0123 20:47:11.417751 90561 admission.go:61] AlwaysAdmit admission controller is deprecated. Please remove this controller from your configuration files and scripts.
I0123 20:47:11.418413 90561 services.go:35] Network range for service cluster IPs is unspecified. Defaulting to 10.0.0.0/24.
I0123 20:47:11.418442 90561 master.go:269] Node port range unspecified. Defaulting to 30000-32767.
...
...
BenchmarkScheduling/100Nodes/0Pods-12 500 2393175 ns/op 431003 B/op 4825 allocs/op
...
...
BenchmarkScheduling/100Nodes/1000Pods-12 500 2164436 ns/op 459811 B/op 5366 allocs/op
...
...
All those lines are also present in the CI job's build-log.txt (e.g. https://storage.googleapis.com/kubernetes-jenkins/logs/ci-benchmark-scheduler-master/71/build-log.txt), except the lines starting with Benchmark, which are the go benchmark results.
cc @krzyzacy @kubernetes/test-infra-maintainers - Does someone know what's happening with those lines?
cc @krzyzacy
also FWIW test-dockerized.sh is a legacy hack :(
I suspect something goes wonky during our fancy d-in-d-in-d transition? :joy:
cc @ixdy if Jeff has seen something like this before
From your log output, I suspect this is mucking with godep and other things at the beginning for no reason. This should almost definitely just be a normal test job running on the kubekins-e2e image without dind.
I suspect something goes wonky during our fancy d-in-d-in-d transition?
The question is what is so special about these benchmark lines that it happens only to them, when there are other lines (e.g. Waiting for etcd to come up.) that are also printed to stdout but do appear in the logs.
this should almost definitely just be a normal test job running on the kubekins-e2e image without dind
I'm not sure kubekins-e2e would be the right image, as this job doesn't run any e2e tests. It just runs some integration tests, like e.g. pull-kubernetes-verify (which also uses the kubernetes_verify scenario).
Even otherwise, I don't think kubernetes_e2e is the right scenario to use, as it calls the kubetest binary, which is meant for the e2e setup (i.e. bring up a cluster, run e2e tests, etc.). Functionally, this job is quite different. Here we're just running integration tests (i.e. everything runs locally within the container as some binary; there's nothing to interact with externally).
I'm not sure kubekins-e2e would be the right image, as this job doesn't run any e2e tests. It just runs some integration tests, like e.g. pull-kubernetes-verify (which also uses the kubernetes_verify scenario).
The kubekins-e2e image is the catch-all image for tests that need versioned go, bazel, kubetest, gcloud etc.
pull-kubernetes-verify is the worst presubmit we have currently (very slow, buggy, complex to maintain) and only runs the way it does because this job is blocking and still needs to work identically post-Jenkins. We have a TODO somewhere to delete the current kubernetes_verify.py after we figure out how to best solve this :(
Running this way does docker-on-docker in docker-in-docker just to run in a certain image and be able to do docker builds. There's no doubt that this is buggy. I suspect we see some issues with the layers of log buffering going on here.
Even otherwise, I don't think kubernetes_e2e is the right scenario to use, as it calls the kubetest binary, which is meant for the e2e setup (i.e. bring up a cluster, run e2e tests, etc.). Functionally, this job is quite different. Here we're just running integration tests (i.e. everything runs locally within the container as some binary; there's nothing to interact with externally).
Integration tests probably just need the execute scenario; images and scenarios aren't actually very related. kubekins-e2e is a bad name; it is just an image that has all the dependencies for e2es and building. The image doesn't run e2es itself; that's handled by bootstrap / the scenario.
I see. Then my question reduces to the following: how do I pass a --script parameter like this to kubernetes_e2e?
Use execute.py instead and then use an image like gcloud-in-go or kubekins-e2e.
That is, unless somehow this is actually using docker. I suspect we just need to add versioned etcd @krzyzacy
We're discussing how to solve this today, we've been putting off solving the verify/unit jobs but I don't want those work arounds to become permanent.
I can try to re-configure the job and see if it will work better
Ok, let's try that.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-benchmark-scheduler-master/72
hummmmm @shyamjvs can you tell where the failure is? I saw
ok k8s.io/kubernetes/test/integration/scheduler_perf 187.646s though
I believe it's this:
./hack/jenkins/benchmark-dockerized.sh: line 54: /workspace/artifacts/BenchmarkResults.txt: No such file or directory
which is coming from this line of code in the script: https://github.com/kubernetes/kubernetes/blob/master/hack/jenkins/benchmark-dockerized.sh#L54
This didn't happen before, and it's quite surprising that it just didn't create that file to write the output to.
I've sent out the above PR to fix the non-existent file issue. Hope it helps.
W0124 17:25:37.647] + touch /workspace/artifacts/BenchmarkResults.txt
W0124 17:25:37.648] touch: cannot touch '/workspace/artifacts/BenchmarkResults.txt': No such file or directory
hummm maybe `/workspace/artifacts` did not exist yet?
Yup, that seems to be the issue. Will send out a fix asap.
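The shape of the fix is simple; here is a minimal sketch (the actual change lives in hack/jenkins/benchmark-dockerized.sh, and the variable names below are stand-ins):

```shell
# Sketch of the fix: create the artifacts directory before touching
# files inside it. In the real job the path is /workspace/artifacts;
# a temp dir stands in for it here.
WORKSPACE="$(mktemp -d)"                    # stand-in for /workspace
ARTIFACTS="${WORKSPACE}/artifacts"
mkdir -p "${ARTIFACTS}"                     # the step that was missing
touch "${ARTIFACTS}/BenchmarkResults.txt"   # previously failed with "No such file or directory"
```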
Ok, the above PR fixed those failures. However, the original issue (i.e. the missing lines of benchmark output on stdout) still remains.
And in fact those are the lines most useful from the job.
@krzyzacy @BenTheElder - Any leads? Maybe something is going wrong with the stdout stream in the container?
@shyamjvs can you point us to where those lines get printed during the integration test?
They are printed by Go when it runs the test, i.e. as part of go test, which is called inside the make test-integration ... target in the benchmark script.
Maybe there's a way to reproduce this locally? Any leads on how to run the job in the kubekins-e2e container locally?
the container looks like
containers:
- image: gcr.io/k8s-testimages/kubekins-e2e:v20180123-1260ba9fc-master
  args:
  - --repo=k8s.io/kubernetes=master
  - --timeout=40
  - --root=/go/src
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /etc/service-account/service-account.json
  - name: GOPATH
    value: /go
  volumeMounts:
  - name: service
    mountPath: /etc/service-account
you'll need to mount a service account
@shyamjvs what is printing out the BenchmarkScheduling/ lines? are they being printed to stderr instead of stdout?
No, they're being printed to stdout (checked it by redirecting stderr > /dev/null).
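That check can be illustrated with a stand-in command (`emit` is hypothetical; in the real case it was the make test-integration invocation):

```shell
# A line on stdout survives `2>/dev/null`; a line on stderr does not.
emit() {  # hypothetical stand-in for the benchmark run
  echo 'BenchmarkScheduling/100Nodes/0Pods-12  500  2393175 ns/op'  # stdout
  echo 'some diagnostic noise' >&2                                  # stderr
}
emit 2>/dev/null  # the benchmark line still appears, so it is on stdout
```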
do you know what code is printing them out? maybe there's some verbosity setting somewhere that's being set to something different?
I spent some time yesterday trying to reproduce this issue locally (within a container) and finally managed to do it with:
docker run -it \
  -e JOB_NAME=ci-benchmark-scheduler-master \
  -e GOOGLE_APPLICATION_CREDENTIALS=/etc/service-account/service-account.json \
  -v /etc/service-account:/etc/service-account \
  gcr.io/k8s-testimages/kubekins-e2e:v20180123-1260ba9fc-master \
  --repo=k8s.io/kubernetes=master \
  --timeout=40 \
  --root=/go/src
Before running the above, you need to ensure that you have the kubekins GCP service-account stored locally on your machine at /etc/service-account/service-account.json.
It turns out that those benchmark lines are not printed as part of the docker run command's output, but if I exec into the running container and run the make test-integration ... command in a bash shell, those lines do get printed.
While we were digging into it, @porridge identified the reason this was happening: the stdout of the go test command, which was being invoked under the hood by the make command, was being filtered through a regexp. This filtering was enabled because our script sets the KUBE_JUNIT_REPORT_DIR variable.
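To illustrate the failure mode, here is a sketch (not the actual regexp or plumbing in the kube scripts): with junit reporting enabled, go test output passes through a filter whose pattern matches test-status lines but not benchmark-result lines, so the latter are silently dropped.

```shell
# Hypothetical filter resembling the junit post-processing: it keeps
# go test status/summary lines and drops everything else.
filter_for_junit() {
  grep -E '^(=== RUN|--- (PASS|FAIL)|ok |FAIL|PASS$)'
}

printf '%s\n' \
  '=== RUN   TestFoo' \
  '--- PASS: TestFoo (0.01s)' \
  'BenchmarkScheduling/100Nodes/0Pods-12  500  2393175 ns/op' \
  'ok   k8s.io/kubernetes/test/integration/scheduler_perf 187.646s' \
  | filter_for_junit
# The Benchmark line is dropped, matching what the job's logs showed.
```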
I'll send out a fix soon to disable it.
I also managed to verify that the fix works by running it against my PR using:
docker run -it \
  -e JOB_NAME=ci-benchmark-scheduler-master \
  -e GOOGLE_APPLICATION_CREDENTIALS=/etc/service-account/service-account.json \
  -v /etc/service-account:/etc/service-account \
  gcr.io/k8s-testimages/kubekins-e2e:v20180123-1260ba9fc-master \
  --repo=k8s.io/kubernetes=master:65c0d55bb,59068:b12a6e367 \
  --timeout=40 \
  --root=/go/src
and the output had:
I0131 13:22:56.550] goos: linux
I0131 13:22:56.550] goarch: amd64
I0131 13:22:56.550] pkg: k8s.io/kubernetes/test/integration/scheduler_perf
I0131 13:22:56.550] BenchmarkScheduling/100Nodes/0Pods-12 500 2038534 ns/op 431378 B/op 4826 allocs/op
I0131 13:22:56.551] BenchmarkScheduling/100Nodes/1000Pods-12 500 2279950 ns/op 466114 B/op 5451 allocs/op
I0131 13:22:56.551] BenchmarkScheduling/1000Nodes/0Pods-12 100 26056228 ns/op 2390752 B/op 27403 allocs/op
I0131 13:22:56.551] BenchmarkScheduling/1000Nodes/1000Pods-12 100 27349647 ns/op 2676577 B/op 32578 allocs/op
I0131 13:22:56.551] BenchmarkSchedulingAntiAffinity/500Nodes/250Pods-12 250 57442820 ns/op 3572412 B/op 28158 allocs/op
I0131 13:22:56.551] BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-12 250 173754062 ns/op 35009722 B/op 205868 allocs/op
I0131 13:22:56.551] PASS
I0131 13:22:56.551] ok k8s.io/kubernetes/test/integration/scheduler_perf 140.174s
@krzyzacy @BenTheElder While the above would fix the issue with those missing lines, there's one more piece that we need here.
The pretty-printed results for the benchmark are written to WORKSPACE/artifacts/, but that dir is not being uploaded to GCS (e.g. http://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-benchmark-scheduler-master/249/).
Could you fix the execute scenario (or maybe bootstrap.py) to do that?
it might work if you rename the artifacts directory to _artifacts. I think only the unit/verify scenarios use artifacts, and only inside the test container (it's mapped to _artifacts outside).
I included that change in my above PR. Will see if that works, thanks.
@ixdy good catch :joy: one day we will maybe want to unify everything :man_shrugging:
Just checked, and my fix indeed seems to work: http://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-benchmark-scheduler-master/297/
Thanks guys!