Cannot run "oc cluster up --metrics" successfully. It always fails.
$ ./oc version
oc v1.4.0-alpha.1+f189ede
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO
$ sudo ./oc cluster up --metrics
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.0-alpha.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ...
WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ...
Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using 192.168.1.2 as the server IP
-- Starting OpenShift container ...
Creating initial OpenShift configuration
Starting OpenShift using container 'origin'
Waiting for API server to start listening
OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... OK
-- Installing router ... OK
-- Installing metrics ... FAIL
Error: cannot create metrics deployer pod
Details:
Last 10 lines of "origin" container log:
I1117 00:38:44.353250 12675 trace.go:61] Trace "Update
/api/v1/namespaces/openshift-infra/serviceaccounts/deployment-controller" (started 2016-11-17
00:38:43.74571921 +0000 UTC):
[22.617µs] [22.617µs] About to convert to expected version
[93.692µs] [71.075µs] Conversion done
[99.156µs] [5.464µs] About to store object in database
[607.415218ms] [607.316062ms] Object stored in database
[607.425586ms] [10.368µs] Self-link added
[607.484338ms] [58.752µs] END
I1117 00:38:44.353911 12675 trace.go:61] Trace "Delete
/api/v1/namespaces/openshift-infra/secrets/namespace-controller-token-fwytu" (started 2016-11-17
00:38:42.796986729 +0000 UTC):
[30.765µs] [30.765µs] About to delete object from database
[1.556883322s] [1.556852557s] END
Caused By:
Error: No API token found for service account "metrics-deployer", retry after the token is
automatically created and added to the service account
Installing origin metrics directly (rather than via cluster up) does give a successful install of OpenShift with metrics.
I originally did not have my user in the docker group, which is why I prefix the command with "sudo". I did also try adding my user to the docker group, but the same problem occurs, so I don't think that's related. Here's what I did:
$ sudo groupadd docker && sudo gpasswd -a ${USER} docker && sudo systemctl restart docker && newgrp docker
Also, I built "oc" from the current master branch, and I get the same problem.
@pweil- I don't believe this has anything to do with Origin Metrics directly. Setting up the service account is done as part of the cluster up command.
If @jmazzitelli installs origin metrics directly then it works for him.
And if I follow the steps outlined in what he is doing, then it works properly for me.
To be clear about the reproduction steps: I hit this failure doing nothing special other than downloading the oc binary, untarring it, and running it via sudo:
$ wget https://github.com/openshift/origin/releases/download/v1.4.0-alpha.1/openshift-origin-client-tools-v1.4.0-alpha.1.f189ede-linux-64bit.tar.gz
$ tar xvfz openshift-origin-client-tools-v1.4.0-alpha.1.f189ede-linux-64bit.tar.gz
$ cd openshift-origin-client-tools-v1.4.0-alpha.1+f189ede-linux-64bit/
$ sudo ./oc cluster up --metrics
For the record, I am on Fedora 23, with "uname -a" as follows:
$ uname -a
Linux mazztower 4.6.7-200.fc23.x86_64 #1 SMP Wed Aug 17 14:24:53 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
If we are synchronously creating a service account and then immediately creating a pod that uses it, that code needs to retry creating the pod when creation is forbidden because the service account's token hasn't been auto-generated yet.
@liggitt thanks, would changing it to a Job do the trick?
@soltysh could confirm, but I would expect so
I'm quickly going to see if this works. Not sure if you'd want a PR with this: the change makes the code keep retrying whenever the error it gets is this "retry after the token is ready" message. I suspect this will fix it (assuming this truly is a case where retrying helps; I'll see in a minute).
-	deployerPod := metricsDeployerPod(hostName, imagePrefix, imageVersion)
-	if _, err = kubeClient.Pods(infraNamespace).Create(deployerPod); err != nil {
-		return errors.NewError("cannot create metrics deployer pod").WithCause(err).WithDetails(h.OriginLog())
-	}
+	for keepTrying := true; keepTrying; {
+		deployerPod := metricsDeployerPod(hostName, imagePrefix, imageVersion)
+		if _, err = kubeClient.Pods(infraNamespace).Create(deployerPod); err != nil {
+			if !strings.Contains(err.Error(), "retry after the token") {
+				return errors.NewError("cannot create metrics deployer pod").WithCause(err).WithDetails(h.OriginLog())
+			}
+		} else {
+			keepTrying = false
+		}
+	}
That fix works. My "cluster up" command finished successfully and I do see this in the output:
-- Installing metrics ... OK
@jmazzitelli thanks for confirming that's the problem. I'd rather simply instantiate a job so that the job controller can do the retry for us.
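A minimal sketch of what a Job-based deployer could look like, so the job controller handles the retry instead of hand-rolled loop code. The names and image tag here are illustrative, not the actual contents of the eventual fix:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: metrics-deployer
  namespace: openshift-infra
spec:
  template:
    spec:
      serviceAccountName: metrics-deployer
      # OnFailure lets the job controller restart the pod until the
      # service-account token exists and the deployer succeeds.
      restartPolicy: OnFailure
      containers:
      - name: deployer
        image: openshift/origin-metrics-deployer:v1.4.0-alpha.1  # illustrative tag
```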
@csrwng sounds good to me. thanks for looking into this.
Submitted pull request #12174 for review
Closed via #12174