Origin: oc cluster up fails on Ubuntu 17.04

Created on 28 Apr 2017 · 10 Comments · Source: openshift/origin

I am running Ubuntu 17.04 with a local Docker daemon (17.03.1) installed.
Starting OpenShift Origin with `oc cluster up` fails.

Version

oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Steps To Reproduce

set up Ubuntu 17.04
install openshift client tools
oc cluster up

Current Result

````
~$ oc cluster up
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ...
WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using 172.17.0.1 as the server IP
-- Starting OpenShift container ...
Creating initial OpenShift configuration
Starting OpenShift using container 'origin'
Waiting for API server to start listening
OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... FAIL
Error: cannot install registry
Details:
Last 10 lines of "origin" container log:
E0428 09:06:37.805343 11376 container_manager_linux.go:808] Error parsing docker version "17.03.1-ce": illegal zero-prefixed version component "03" in "17.03.1-ce"
I0428 09:06:37.805375 11376 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
I0428 09:06:37.805394 11376 status_manager.go:129] Starting to sync pod status with apiserver
I0428 09:06:37.805424 11376 kubelet.go:1715] Starting kubelet main sync loop.
I0428 09:06:37.805433 11376 kubelet.go:1726] skipping pod synchronization - [container runtime is down]
I0428 09:06:37.827496 11376 volume_manager.go:244] Starting Kubelet Volume Manager
I0428 09:06:37.851052 11376 node.go:359] Starting DNS on 0.0.0.0:53
I0428 09:06:37.851305 11376 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
I0428 09:06:37.851314 11376 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
I0428 09:06:37.851419 11376 logs.go:41] skydns: listen udp 0.0.0.0:53: bind: address already in use

Caused By:
Error: exit directly
````

Expected Result

successful startup of the cluster

Additional Information

It seems that the problem is related to binding skydns to the wildcard address 0.0.0.0. On Ubuntu 17.04 a DNS stub resolver (systemd-resolved) already listens on 127.0.0.53 by default:

````
~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53
````

I have been able to fix the problem (locally) by editing the node config file at `/var/lib/origin/openshift.local.config/node-172.17.0.1/node-config.yaml`, setting the `dnsBindAddress` property to `172.17.0.1:53` instead of `0.0.0.0:53`, and then using

oc cluster up --use-existing-config=true

to start the cluster using the modified config file. This then results in the following output:

````
~$ oc cluster up --use-existing-config=true
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ...
WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ...
Deleted existing OpenShift container
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using 172.17.0.1 as the server IP
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
Waiting for API server to start listening
OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... OK
-- Installing router ... OK
-- Importing image streams ... OK
-- Importing templates ... OK
-- Login to server ... OK
-- Creating initial project "myproject" ... OK
-- Removing temporary directory ... OK
-- Checking container networking ... OK
-- Server Information ...
OpenShift server started.
The server is accessible via web console at:
https://172.17.0.1:8443

You are logged in as:
User: developer
Password: developer

To login as administrator:
oc login -u system:admin
````
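The manual edit of `node-config.yaml` described above could also be scripted. The following is a minimal Python sketch, not part of Origin: the function name `set_dns_bind_address` and the sample config fragment are made up for illustration, and it assumes `dnsBindAddress` appears exactly once as a top-level key on its own line.

```python
import re


def set_dns_bind_address(config_text: str, address: str) -> str:
    """Return config_text with the value of the top-level dnsBindAddress key replaced.

    Hypothetical helper: assumes the key appears exactly once, unindented.
    """
    pattern = re.compile(r"^(dnsBindAddress:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids any backreference ambiguity in `address`.
    new_text, count = pattern.subn(lambda m: m.group(1) + address, config_text)
    if count != 1:
        raise ValueError(f"expected exactly one dnsBindAddress line, found {count}")
    return new_text


if __name__ == "__main__":
    # Illustrative fragment only, not a complete node-config.yaml.
    sample = "apiVersion: v1\ndnsBindAddress: 0.0.0.0:53\nkind: NodeConfig\n"
    print(set_dns_bind_address(sample, "172.17.0.1:53"))
```

To apply it to the real file, read the text from the path given above, pass it through the function, write it back, and then run `oc cluster up --use-existing-config=true` as described.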

component/cli kind/bug priority/P2

Source

sophokles73

All 10 comments

E0428 09:06:37.805343 11376 container_manager_linux.go:808] Error parsing docker version "17.03.1-ce": illegal zero-prefixed version component "03" in "17.03.1-ce"

Please use the latest oc binary which has a fix for this parsing.

Alternatively, you can downgrade docker.

pweil- on 28 Apr 2017

You are mistaken, this issue is not about failure to parse the docker version. Please read the additional information I have included.

This is about a problem with binding to 0.0.0.0:53 when there is a DNS server already running on 127.0.0.53:53 (on latest ubuntu 17.04).

sophokles73 on 28 Apr 2017

👍2

@csrwng I thought we tried :8053 if :53 had something sitting on it?

stevekuznetsov on 28 Apr 2017

@stevekuznetsov @sophokles73 right now we require both :8053 and :53 to be available to run origin.

csrwng on 28 Apr 2017
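The bind conflict discussed in this thread can be reproduced without Origin. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, it uses UDP on `127.0.0.1` with an ephemeral port as a stand-in for systemd-resolved on `127.0.0.53:53` (so it needs no root privileges), and it describes Linux behaviour for sockets that do not set `SO_REUSEADDR`.

```python
import errno
import socket


def can_bind(host: str, port: int, kind: int = socket.SOCK_DGRAM) -> bool:
    """Return True if a socket of the given kind can be bound to host:port.

    Returns False when the address is in use or binding is not permitted.
    """
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno in (errno.EADDRINUSE, errno.EACCES):
                return False
            raise
        return True


def wildcard_conflicts_with_loopback_listener() -> bool:
    """Check whether a loopback-only listener blocks a wildcard bind on the same port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        # Stand-in for the stub resolver: bound to a loopback address only.
        listener.bind(("127.0.0.1", 0))
        port = listener.getsockname()[1]
        # 0.0.0.0 covers every local address, including the one already taken.
        return not can_bind("0.0.0.0", port)


if __name__ == "__main__":
    print("wildcard bind blocked:", wildcard_conflicts_with_loopback_listener())
```

A preflight check along these lines could call `can_bind("0.0.0.0", 53)` and `can_bind("0.0.0.0", 8053)` before starting the cluster. Binding port 53 normally requires elevated privileges, which is why the helper also treats `EACCES` as "not available".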

I was looking for where oc asks for port 53 and, in the middle of a ton of files, I found this comment:

"// However, if the user has provided an override DNSAddr, we need to honor the value if
// the port is not 53 and we do that by disabling node DNS."

Is there a way to set DNSAddr so we can change the port to another value?

UPDATE: The file is pkg/cmd/server/start/start_allinone.go

HelioCampos on 6 Jun 2017

Please forget my comment. I read the rest of the file and saw that if a port other than 53 is chosen, node DNS is disabled, because glibc and ClusterDNS can't look up DNS on any other port.

// if the user set the DNS port to anything but 53, disable node DNS since ClusterDNS (and glibc)
// can't look up DNS on anything other than 53, so we'll continue to use the proxy.

HelioCampos on 6 Jun 2017
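The rule stated in the two quoted source comments can be paraphrased as a small decision function. This is only a Python sketch of what the comments say, not the Go code in `start_allinone.go`; the function name `node_dns_enabled` and the `host:port` string format are assumptions made for illustration.

```python
def node_dns_enabled(dns_addr: str) -> bool:
    """Return True only when dns_addr ("host:port") uses port 53.

    Paraphrases the quoted comments: any other port means node DNS is
    disabled, since ClusterDNS and glibc only look up DNS on port 53.
    """
    _host, sep, port = dns_addr.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {dns_addr!r}")
    return int(port) == 53


if __name__ == "__main__":
    for addr in ("0.0.0.0:53", "172.17.0.1:53", "0.0.0.0:8053"):
        print(addr, "->", "node DNS on" if node_dns_enabled(addr) else "node DNS off")
```

This is why moving the DNS listener to another port does not help here, while changing only the bind address (keeping port 53) does.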

I ran into this same problem and disabled the systemd-resolved service:
https://askubuntu.com/questions/907246/how-to-disable-systemd-resolved-in-ubuntu

After that the cluster successfully started :smile:

lion7 on 22 Jun 2017

sounds like this is understood and working as designed, closing.

bparees on 27 Jun 2017

Is disabling the main DNS server on the system really the proper solution for this? Developers in particular might prefer running their own DNS server locally instead of using, for example, their ISP's recursive DNS. Having to stop it to run a local OpenShift cluster is not very nice.