Installing OpenShift Origin on a node without NetworkManager-controlled interfaces fails.
`/etc/origin/node/resolv.conf` is generated by `/etc/NetworkManager/dispatcher.d/99-origin-dns.sh`, which is not invoked at all when there are no NetworkManager-controlled NICs. As a result, `systemctl start origin-node` fails with:
```
-- Subject: Unit origin-node.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit origin-node.service has begun starting up.
Sep 11 00:24:04 ocp-node-4 dnsmasq[15980]: setting upstream servers from DBus
Sep 11 00:24:04 ocp-node-4 dnsmasq[15980]: using nameserver 127.0.0.1#53 for domain cluster.local
Sep 11 00:24:04 ocp-node-4 dnsmasq[15980]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Sep 11 00:24:04 ocp-node-4 origin-node[7442]: I0911 00:24:04.919558 7442 start_node.go:251] Reading node configuration from /etc/origin/node/node-config.yaml
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.038696 7442 node.go:123] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "ocp-node-4.yeslab.local" (IP ""), iptables sync period "30s"
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.038981 7442 common.go:85] Skipping loopback/non-IPv4 addr: "127.0.0.1" for node ocp-node-4.yeslab.local
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039007 7442 node.go:136] Failed to determine node address from hostname ocp-node-4.yeslab.local; using default interface (Failed to obtain IP address from node name: ocp-node-4.yeslab.local)
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039217 7442 interface.go:248] Default route transits interface "ens4"
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039454 7442 interface.go:93] Interface ens4 is up
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039544 7442 interface.go:138] Interface "ens4" has 2 addresses :[192.168.8.19/24 fe80::c0ff:fea8:813/64].
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039580 7442 interface.go:105] Checking addr 192.168.8.19/24.
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039595 7442 interface.go:114] IP found 192.168.8.19
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039618 7442 interface.go:144] valid IPv4 address for interface "ens4" found as 192.168.8.19.
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039630 7442 interface.go:254] Choosing IP 192.168.8.19
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039657 7442 node.go:143] Resolved IP address to "192.168.8.19"
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.039760 7442 ipcmd.go:44] Executing: /usr/sbin/ip link set lbr0 down
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.049676 7442 ipcmd.go:48] Error executing /usr/sbin/ip: Cannot find device "lbr0"
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.050085 7442 docker.go:364] Connecting to docker on unix:///var/run/docker.sock
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.050123 7442 docker.go:384] Start docker client with request timeout=2m0s
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: W0911 00:24:05.052102 7442 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.059097 7442 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.069173 7442 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: I0911 00:24:05.077280 7442 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Sep 11 00:24:05 ocp-node-4 origin-node[7442]: F0911 00:24:05.079517 7442 start_node.go:140] could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory
Sep 11 00:24:05 ocp-node-4 systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Sep 11 00:24:05 ocp-node-4 dnsmasq[15980]: setting upstream servers from DBus
Sep 11 00:24:05 ocp-node-4 systemd[1]: Failed to start OpenShift Node.
-- Subject: Unit origin-node.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit origin-node.service has failed.
--
-- The result is failed.
Sep 11 00:24:05 ocp-node-4 systemd[1]: Unit origin-node.service entered failed state.
Sep 11 00:24:05 ocp-node-4 systemd[1]: origin-node.service failed.
```
```
$ ansible --version
ansible 2.3.1.0
config file = /home/bacek/openshift/openshift-ansible.bacek/ansible.cfg
configured module search path = Default w/o overrides
python version = 2.7.5 (default, Nov 6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]
$ git describe
openshift-ansible-3.6.173.0.5-5-44-g66ea091
```
```
# cat /etc/sysconfig/network-scripts/ifcfg-ens3
DEVICE=ens3
BOOTPROTO=static
NM_CONTROLLED=no
TYPE=Ethernet
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=10.1.0.145
```
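The condition behind this failure can be spotted before running the installer. The following is a minimal sketch of a pre-flight check, assuming interfaces are configured through `ifcfg-*` files as above; the function names `is_unmanaged_ifcfg` and `all_ifcfg_unmanaged` are hypothetical and not part of openshift-ansible. It only inspects the files, so it will not catch other reasons NetworkManager might not manage a NIC (for example, NetworkManager not running at all).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of openshift-ansible): detect a node
# whose ifcfg files all opt out of NetworkManager, the situation in which the
# NM dispatcher script never runs and /etc/origin/node/resolv.conf is never
# generated.

# Succeed if the ifcfg file $1 sets NM_CONTROLLED=no (optionally double-quoted).
is_unmanaged_ifcfg() {
    grep -Eqi '^[[:space:]]*NM_CONTROLLED="?no"?[[:space:]]*$' "$1"
}

# Succeed if DIR holds at least one non-loopback ifcfg file and every such
# file sets NM_CONTROLLED=no. DIR defaults to the RHEL/CentOS location.
all_ifcfg_unmanaged() {
    local dir="${1:-/etc/sysconfig/network-scripts}"
    local f found=0
    for f in "$dir"/ifcfg-*; do
        # Skip ifcfg-lo and the literal pattern left behind by an empty glob.
        [[ -f "$f" && "${f##*/}" != "ifcfg-lo" ]] || continue
        found=1
        is_unmanaged_ifcfg "$f" || return 1
    done
    # No candidate files at all counts as "not all unmanaged".
    (( found ))
}
```

For example, `all_ifcfg_unmanaged && echo "no NetworkManager-controlled NICs; origin-node will likely fail to start" >&2` could run as an early warning on each node.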
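Steps to reproduce: with the NIC configured as above (`NM_CONTROLLED=no`), install and enable NetworkManager to avoid #4950, then install or scale up the cluster to use this node.
Expected result: the node is installed and added to the cluster.
Actual result: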
```
$ ansible-playbook -i ../hosts.ini playbooks/byo/openshift-node/scaleup.yml
...
RUNNING HANDLER [openshift_node : restart node] ***************************************************************************************************************************
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [ocp-node-3.yeslab.local]: FAILED! => {
"attempts": 3,
"changed": false,
"failed": true
}
MSG:
Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
```
A simple `touch /etc/origin/node/resolv.conf` kick-starts the node after the Ansible run has failed:
```
[root@ocp-node-4 ~]# systemctl status origin-node
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/origin-node.service.d
└─openshift-sdn-ovs.conf
Active: activating (auto-restart) (Result: exit-code) since Mon 2017-09-11 00:32:09 UTC; 2s ago
Docs: https://github.com/openshift/origin
Process: 10605 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
Process: 10602 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
Process: 10576 ExecStart=/usr/bin/openshift start node --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
Process: 10573 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
Process: 10571 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
Main PID: 10576 (code=exited, status=255)
Sep 11 00:32:09 ocp-node-4 systemd[1]: Failed to start OpenShift Node.
Sep 11 00:32:09 ocp-node-4 systemd[1]: Unit origin-node.service entered failed state.
Sep 11 00:32:09 ocp-node-4 systemd[1]: origin-node.service failed.
[root@ocp-node-4 ~]# touch /etc/origin/node/resolv.conf
[root@ocp-node-4 ~]# systemctl reset-failed origin-node
[root@ocp-node-4 ~]# systemctl start origin-node
[root@ocp-node-4 ~]# systemctl status origin-node
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/origin-node.service.d
└─openshift-sdn-ovs.conf
Active: active (running) since Mon 2017-09-11 00:32:28 UTC; 10s ago
Docs: https://github.com/openshift/origin
Process: 10686 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
Process: 10684 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
Process: 10692 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
Process: 10690 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
Main PID: 10695 (openshift)
Memory: 40.6M
CGroup: /system.slice/origin-node.service
├─10695 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=5
└─10737 journalctl -k -f
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.265373 10695 kubelet.go:2069] Container runtime status: Runtime Conditions: RuntimeReady=true ...: message:
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.307678 10695 eviction_manager.go:197] eviction manager: synchronize housekeeping
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348088 10695 summary.go:389] Missing default interface "eth0" for node:ocp-node-4.yeslab.local
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348219 10695 helpers.go:744] eviction manager: observations: signal=nodefs.inodesFree, availab... +0000 UTC
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348257 10695 helpers.go:744] eviction manager: observations: signal=imagefs.available, availab... +0000 UTC
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348271 10695 helpers.go:746] eviction manager: observations: signal=allocatableMemory.availabl... 7908384Ki
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348282 10695 helpers.go:744] eviction manager: observations: signal=memory.available, availabl... +0000 UTC
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348295 10695 helpers.go:744] eviction manager: observations: signal=nodefs.available, availabl... +0000 UTC
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.348321 10695 eviction_manager.go:292] eviction manager: no resources are starved
Sep 11 00:32:37 ocp-node-4 origin-node[10695]: I0911 00:32:37.842202 10695 generic.go:182] GenericPLEG: Relisting
Hint: Some lines were ellipsized, use -l to show in full.
```
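The touch workaround can also be made idempotent so it is safe to run before or after the playbook. Below is a minimal sketch; `ensure_node_resolv_conf` is a hypothetical helper. Seeding the file from the host's `/etc/resolv.conf` is an assumption about what the dispatcher script would normally produce; the transcript above only shows that an empty file is enough for origin-node to start, which is what the fallback branch creates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure the node's resolv.conf exists before
# origin-node starts. Copying the host resolver file is an ASSUMPTION about
# what 99-origin-dns.sh would have written; an empty file also lets the
# service start.

# Usage: ensure_node_resolv_conf [SRC] [DST]
# Creates DST from SRC only when DST is missing; never overwrites DST.
ensure_node_resolv_conf() {
    local src="${1:-/etc/resolv.conf}"
    local dst="${2:-/etc/origin/node/resolv.conf}"
    [[ -e "$dst" ]] && return 0            # already present: leave it alone
    mkdir -p "$(dirname "$dst")" || return 1
    if [[ -r "$src" ]]; then
        cp "$src" "$dst"                   # seed from the host's resolvers
    else
        : > "$dst"                         # same effect as touch on a new file
    fi
}
```

Run it as root, then follow with `systemctl reset-failed origin-node` and `systemctl start origin-node` as in the transcript. Because an existing file is never touched, rerunning it does not clobber a file that NetworkManager's dispatcher may have written.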
I'm facing the same issue as well.
Any input on a workaround?
I am facing the same issue as well.
```
# ansible --version
ansible 2.3.1.0
config file = /root/openshift-ansible/ansible.cfg
configured module search path = Default w/o overrides
python version = 2.7.5 (default, Aug 2 2016, 04:20:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
"msg": "Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n",
```
Is there any update on this issue? I tried to run the playbook from tag openshift-ansible-3.6.173.0.58-1.
When I create `resolv.conf` manually before the installation, it works without problems. Could someone tell me whether I can use this cluster now, or whether there will be any consequences?
I checked here, where the following is stated for my case (static IP):
_Disabled, then configure your network interface to be static, and add DNS nameservers to NetworkManager._
Do I still have to let NetworkManager manage my static IP? Otherwise, how could I add a DNS server to NetworkManager?
Yes, NetworkManager is required, as described in the install documentation: https://docs.openshift.org/latest/install_config/install/prerequisites.html#prereq-networkmanager
Just remove `NM_CONTROLLED=no` from your NIC configuration.
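The suggested fix can be scripted per interface. This is a minimal sketch, assuming `ifcfg` files as in the report and a `sed` that accepts `-i.bak -E` (GNU and BSD both do); `enable_nm_control` is a hypothetical helper. The optional second argument appends a `DNS1=` entry, which is one way to give NetworkManager a nameserver for a statically configured interface.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-ansible): hand an interface back
# to NetworkManager by deleting NM_CONTROLLED=no from its ifcfg file.
# sed keeps the previous contents in FILE.bak.

# Usage: enable_nm_control FILE [DNS_SERVER]
enable_nm_control() {
    local file="$1" dns="${2:-}"
    [[ -f "$file" ]] || { echo "no such file: $file" >&2; return 1; }

    # Delete NM_CONTROLLED=no lines (optionally double-quoted, any case).
    sed -i.bak -E '/^[[:space:]]*NM_CONTROLLED="?[Nn][Oo]"?[[:space:]]*$/d' "$file" || return 1

    # Optionally add an upstream nameserver (DNS1=) unless one is already set.
    if [[ -n "$dns" ]] && ! grep -q '^DNS1=' "$file"; then
        # Make sure the appended key starts on its own line.
        if [[ -s "$file" && -n "$(tail -c1 "$file")" ]]; then
            printf '\n' >> "$file"
        fi
        printf 'DNS1=%s\n' "$dns" >> "$file"
    fi
}
```

For example, `enable_nm_control /etc/sysconfig/network-scripts/ifcfg-ens3 <dns-server-ip>`, followed by `nmcli connection reload` so NetworkManager rereads the file and then reactivating the connection (check its name with `nmcli connection show`) so that the dispatcher scripts run.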
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.