Origin: oc cluster up fails the second time

Created on 11 Sep 2018 · 14 comments · Source: openshift/origin


Version

oc v3.10.0+dd10d17
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Steps To Reproduce
  1. The first time I ran oc cluster up, it succeeded with output such as:

     Login to server ...
     Creating initial project "myproject" ...
     Server Information ...
     OpenShift server started.

     The server is accessible via web console at:
     https://127.0.0.1:8443

  2. I then ran oc cluster down, and a subsequent oc cluster up fails with the following:

Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.10 ...
I0911 14:49:01.583893 13673 flags.go:30] Running "create-kubelet-flags"
I0911 14:49:02.557376 13673 run_kubelet.go:48] Running "start-kubelet"
I0911 14:49:03.119012 13673 run_self_hosted.go:172] Waiting for the kube-apiserver to be ready ...
E0911 14:54:03.121936 13673 run_self_hosted.go:542] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition
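The "connection refused" in the dial error means nothing is listening on 127.0.0.1:8443 at all, i.e. the apiserver container never bound the port (as opposed to binding it and failing its health check). A minimal sketch, assuming bash, that reproduces the same TCP dial the health check performs:

```shell
# Sketch, not part of oc: reproduce the health check's TCP dial to see
# whether anything is listening on the apiserver port at all.
port_open() {
  # /dev/tcp/<host>/<port> is a bash pseudo-device; the redirection
  # fails when nothing is listening on that port.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 8443; then
  echo "port 8443 reachable: the apiserver bound the port but is unhealthy"
else
  echo "connection refused: the apiserver container never bound port 8443"
fi
```

If the port never becomes reachable, docker ps -a and the logs of the origin control-plane container are the next place to look.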

Output of oc adm diagnostics:

[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'

ERROR: [CED1008 from controller openshift/origin/pkg/oc/admin/diagnostics/cluster.go]
Unknown error testing cluster-admin access for context 'myproject/127-0-0-1:8443/developer':
Post https://127.0.0.1:8443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 127.0.0.1:8443: getsockopt: connection refused

ERROR: [CED1008 from controller openshift/origin/pkg/oc/admin/diagnostics/cluster.go]
Unknown error testing cluster-admin access for context 'default/127-0-0-1:8443/system:admin':
Post https://127.0.0.1:8443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 127.0.0.1:8443: getsockopt: connection refused

[Note] Could not configure a client with cluster-admin permissions for the current server, so cluster diagnostics will be skipped

[Note] Running diagnostic: ConfigContexts[/127-0-0-1:8443/developer]
Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/config_contexts.go:299]
For client config context '/127-0-0-1:8443/developer':
The server URL is 'https://127.0.0.1:8443'
The user authentication is 'developer/127-0-0-1:8443'
The current project is 'default'
(*url.Error) Get https://127.0.0.1:8443/apis/project.openshift.io/v1/projects: dial tcp 127.0.0.1:8443: getsockopt: connection refused
Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

[Note] Running diagnostic: ConfigContexts[default/127-0-0-1:8443/system:admin]
Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/config_contexts.go:299]
For client config context 'default/127-0-0-1:8443/system:admin':
The server URL is 'https://127.0.0.1:8443'
The user authentication is 'system:admin/127-0-0-1:8443'
The current project is 'default'
(*url.Error) Get https://127.0.0.1:8443/apis/project.openshift.io/v1/projects: dial tcp 127.0.0.1:8443: getsockopt: connection refused
Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

[Note] Running diagnostic: DiagnosticPod
Description: Create a pod to run diagnostics from the application standpoint

ERROR: [DCli2001 from diagnostic DiagnosticPod@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/pod/run_diagnostics_pod.go:97]
Creating diagnostic pod with image openshift/origin-deployer:v3.10.0 failed. Error: (*url.Error) Post https://127.0.0.1:8443/api/v1/namespaces/myproject/pods: dial tcp 127.0.0.1:8443: getsockopt: connection refused

[Note] Summary of diagnostics execution (version v3.10.0+dd10d17):
[Note] Errors seen: 5

Labels: lifecycle/rotten, sig/master


All 14 comments

I have the same issue; it always times out. I don't know how to make it work.

Is this a bug, or is the documentation just outdated?

Why should this be a documentation bug?

To me it sounds totally reasonable to restart oc.

Restarting oc does not fix the problem.

@openshift/sig-master

Hi,

I had the same issue. The workaround that worked for me:

docker kill $(docker ps -qa)
docker rm $(docker ps -qa)
rm -rf .kube/
mount | grep openshift
(unmount everything, for example: umount openshift.local.clusterup/openshift.local.volumes/pods/ae667808-e4ff-11e8-bac6-080027711042/volumes/kubernetes.io~secret/serving-cert/)
rm -rf openshift.local.clusterup

systemctl restart docker

Then it should work.

BR

@nikelmark That is hilarious. The point is that a second start should not require a fresh environment... We ought to be able to stop and start clusters even if they were created using oc cluster up.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@nikelmark You can unmount everything quickly with: mount | grep openshift | awk '/.* on (.*) type/{print $3}' | xargs umount
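The awk program above keys off the `device on mountpoint type fstype` shape of mount's output: field $3 is the mount point. A small self-contained illustration (the path below is made up for the example):

```shell
# A sample line in the format `mount` prints (the path is hypothetical):
line="tmpfs on /var/lib/origin/openshift.local.clusterup/pod-volume type tmpfs (rw,relatime)"

# awk splits on whitespace: $1 = device, $2 = "on", $3 = mount point,
# $4 = "type"; the regex only selects lines of that shape.
mountpoint=$(printf '%s\n' "$line" | awk '/.* on (.*) type/{print $3}')
echo "$mountpoint"
```

Note that picking $3 breaks for mount points containing spaces, which is not a concern for the openshift.local.clusterup paths being cleaned up here.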

@openshift-bot /remove-lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
