Origin: oc cluster up fails the second time

Created on 11 Sep 2018 · 14 comments · Source: openshift/origin


Version

oc v3.10.0+dd10d17
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Steps To Reproduce
  1. The first time I ran oc cluster up, it succeeded with output such as:

     Login to server ...
     Creating initial project "myproject" ...
     Server Information ...
     OpenShift server started.

     The server is accessible via web console at:
     https://127.0.0.1:8443

  2. I then ran oc cluster down, and a subsequent oc cluster up fails with the following:

Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.10 ...
I0911 14:49:01.583893 13673 flags.go:30] Running "create-kubelet-flags"
I0911 14:49:02.557376 13673 run_kubelet.go:48] Running "start-kubelet"
I0911 14:49:03.119012 13673 run_self_hosted.go:172] Waiting for the kube-apiserver to be ready ...
E0911 14:54:03.121936 13673 run_self_hosted.go:542] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition
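The "connection refused" in the dial error means nothing is listening on 127.0.0.1:8443 at all, i.e. the apiserver container never bound the port (as opposed to binding it and failing its health check). A minimal sketch, assuming bash, that reproduces the same TCP dial the health check performs:

```shell
# Sketch, not part of oc: reproduce the health check's TCP dial to see
# whether anything is listening on the apiserver port at all.
port_open() {
  # /dev/tcp/<host>/<port> is a bash pseudo-device; the redirection
  # fails when nothing is listening on that port.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 8443; then
  echo "port 8443 reachable: the apiserver bound the port but is unhealthy"
else
  echo "connection refused: the apiserver container never bound port 8443"
fi
```

If the port never becomes reachable, docker ps -a and the logs of the origin control-plane container are the next place to look.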

Output of oc adm diagnostics:

[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'

ERROR: [CED1008 from controller openshift/origin/pkg/oc/admin/diagnostics/cluster.go]
Unknown error testing cluster-admin access for context 'myproject/127-0-0-1:8443/developer':
Post https://127.0.0.1:8443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 127.0.0.1:8443: getsockopt: connection refused

ERROR: [CED1008 from controller openshift/origin/pkg/oc/admin/diagnostics/cluster.go]
Unknown error testing cluster-admin access for context 'default/127-0-0-1:8443/system:admin':
Post https://127.0.0.1:8443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 127.0.0.1:8443: getsockopt: connection refused

[Note] Could not configure a client with cluster-admin permissions for the current server, so cluster diagnostics will be skipped

[Note] Running diagnostic: ConfigContexts[/127-0-0-1:8443/developer]
Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/config_contexts.go:299]
For client config context '/127-0-0-1:8443/developer':
The server URL is 'https://127.0.0.1:8443'
The user authentication is 'developer/127-0-0-1:8443'
The current project is 'default'
(*url.Error) Get https://127.0.0.1:8443/apis/project.openshift.io/v1/projects: dial tcp 127.0.0.1:8443: getsockopt: connection refused
Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

[Note] Running diagnostic: ConfigContexts[default/127-0-0-1:8443/system:admin]
Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/config_contexts.go:299]
For client config context 'default/127-0-0-1:8443/system:admin':
The server URL is 'https://127.0.0.1:8443'
The user authentication is 'system:admin/127-0-0-1:8443'
The current project is 'default'
(*url.Error) Get https://127.0.0.1:8443/apis/project.openshift.io/v1/projects: dial tcp 127.0.0.1:8443: getsockopt: connection refused
Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

[Note] Running diagnostic: DiagnosticPod
Description: Create a pod to run diagnostics from the application standpoint

ERROR: [DCli2001 from diagnostic DiagnosticPod@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/client/pod/run_diagnostics_pod.go:97]
Creating diagnostic pod with image openshift/origin-deployer:v3.10.0 failed. Error: (*url.Error) Post https://127.0.0.1:8443/api/v1/namespaces/myproject/pods: dial tcp 127.0.0.1:8443: getsockopt: connection refused

[Note] Summary of diagnostics execution (version v3.10.0+dd10d17):
[Note] Errors seen: 5

Labels: lifecycle/rotten, sig/master


All 14 comments

I have the same issue; it always times out. I don't know how to make it work.

Is this a bug, or is the documentation just outdated?

Why should this be a documentation bug?

To me it sounds totally reasonable to restart oc.

Restarting oc does not fix the problem.

@openshift/sig-master

Hi,

I had the same issue. The workaround that worked for me:

docker kill $(docker ps -qa)
docker rm $(docker ps -qa)
rm -rf .kube/
mount | grep openshift
(unmount everything, for example: umount openshift.local.clusterup/openshift.local.volumes/pods/ae667808-e4ff-11e8-bac6-080027711042/volumes/kubernetes.io~secret/serving-cert/)
rm -rf openshift.local.clusterup

systemctl restart docker

Then it should work.

BR

@nikelmark That is hilarious. The point is that a second start should not require a fresh environment... We ought to be able to stop and start clusters even if they were created using oc cluster up.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@nikelmark You can unmount everything quickly with: mount | grep openshift | awk '/.* on (.*) type/{print $3}' | xargs umount
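The awk program above keys off the `device on mountpoint type fstype` shape of mount's output: field $3 is the mount point. A small self-contained illustration (the path below is made up for the example):

```shell
# A sample line in the format `mount` prints (the path is hypothetical):
line="tmpfs on /var/lib/origin/openshift.local.clusterup/pod-volume type tmpfs (rw,relatime)"

# awk splits on whitespace: $1 = device, $2 = "on", $3 = mount point,
# $4 = "type"; the regex only selects lines of that shape.
mountpoint=$(printf '%s\n' "$line" | awk '/.* on (.*) type/{print $3}')
echo "$mountpoint"
```

Note that picking $3 breaks for mount points containing spaces, which is not a concern for the openshift.local.clusterup paths being cleaned up here.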

@openshift-bot /remove-lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
