Openshift-ansible: node cannot start, openvswitch doesn't create default cluster network

Created on 27 Jun 2017 · 2 Comments · Source: openshift/openshift-ansible

Description

Node won't start, journalctl says:

19978 start_node.go:139] master has not created a default cluster network, network plugin "redhat/openshift-ovs-subnet" can not start

Version

 ansible --version
ansible 2.3.1.0
  config file = /home/pafer/Projects/openshift-perso/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, May 10 2017, 20:04:28) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]

git describe 
openshift-ansible-3.6.123-1-35-g2d4c399

Steps To Reproduce

Configure 3 hosts
launch install

Expected Results

Node should start

Observed Results

Failure summary:

  1. Host:     XXX.XXX.XXX.137
     Play:     Configure nodes
     Task:     openshift_node : restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.


  2. Host:     XXX.XXX.XXX.247
     Play:     Configure nodes
     Task:     openshift_node : restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.


  3. Host:     XXX.XXX.XXX.35
     Play:     Configure nodes
     Task:     openshift_node : restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

Additional Information

The nodes and masters run CentOS 7 on Scaleway. I checked that SELinux is disabled. Flannel is not an option for me because the internal IPs may change.

Source

metal3d

Most helpful comment

For anyone who has the same problem as me: it seems that the installation sometimes fails partway through and leaves the docker0 interface with an address inside the portal CIDR (172.30.0.0).

Reinstalling didn't change the docker0 interface IP, even after stopping and removing the docker package. I saw that the docker0 interface was still up.
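To see which address docker0 still holds, one option is to parse the output of `ip -o -4 addr show docker0`. The snippet below is only a minimal sketch and is not part of openshift-ansible; the helper name `parse_inet` and the sample line are hypothetical.

```python
import ipaddress
import re


def parse_inet(ip_output):
    """Return the first IPv4 interface found in `ip -o -4 addr show` output, or None."""
    match = re.search(r"\binet\s+(\d+\.\d+\.\d+\.\d+/\d+)", ip_output)
    if match is None:
        return None
    return ipaddress.ip_interface(match.group(1))


# Hypothetical sample line, shaped like `ip -o -4 addr show docker0` output.
sample = "4: docker0    inet 172.30.0.1/16 scope global docker0"
iface = parse_inet(sample)
print(iface.network)
```

On a real host the input would come from running the `ip` command on the node, for example through `subprocess.run`.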

The only way to get it working was to bring down the docker0 interface, check the entire system and remove any leftover docker configuration, reinstall docker, and relaunch the install.

So that error is mainly an IP collision between the docker and OVS configuration. It is very important that the docker0 interface does not use the 172.30.0.0/16 CIDR, and changing the portal IP CIDR in the inventory does not fix it (AFAIK it never takes effect).
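The collision described here can be checked with Python's standard `ipaddress` module. This is a sketch under assumptions: the function name `collides` is hypothetical, and the docker0 addresses passed in are example values, not output taken from the affected hosts.

```python
import ipaddress

# 172.30.0.0/16 is the portal CIDR quoted in this thread.
PORTAL_CIDR = "172.30.0.0/16"


def collides(docker0_cidr, portal_cidr=PORTAL_CIDR):
    """Return True if the docker0 network shares any address with the portal CIDR."""
    docker_net = ipaddress.ip_interface(docker0_cidr).network
    portal_net = ipaddress.ip_network(portal_cidr)
    return docker_net.overlaps(portal_net)


print(collides("172.30.0.1/16"))  # example docker0 address inside the portal range
print(collides("172.17.0.1/16"))  # example docker0 address outside the portal range
```

The first call reports a collision and the second does not, which matches the symptom: the node only starts once docker0 is off the portal range.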

So, I am closing this issue.

metal3d on 27 Jun 2017

👍4

All 2 comments

This happened to me as well, because I was using 172.30.0.0/16 as the subnet. I later changed it to 10.0.0.0/16 and it worked!