What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]
(Doesn't repro very often) Creation of master node fails with error
Cloud-init v. 18.3-52-gc5f78957-1~bddeb~18.04.1 running 'modules:config' at Fri, 01 Nov 2019 18:58:08 +0000. Up 119.99 seconds.
[init] Using Kubernetes version: v1.14.6
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: NAME:
crictl info - Display information of the container runtime
USAGE:
crictl info [command options] [arguments...]
OPTIONS:
--output value, -o value Output format, One of: json|yaml (default: "json")
time="2019-11-01T18:59:13Z" level=fatal msg="failed to connect: failed to connect: context deadline exceeded"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Cloud-init v. 18.3-52-gc5f78957-1~bddeb~18.04.1 running 'modules:final' at Fri, 01 Nov 2019 18:58:34 +0000. Up 145.84 seconds.
2019-11-01 18:59:14,128 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
2019-11-01 18:59:14,185 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2019-11-01 18:59:14,186 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
What did you expect to happen:
Master node to come up
Anything else you would like to add:
This is the containerd log for journalctl -u containerd -l
"Failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Environment:
kubectl version):/etc/os-release):/kind bug
@timothysc: is cloud init being run b4 services are up?
@detiber: It does look like it is running before the service fully comes up in this case
@timothysc: The cloudinit cloud loop till preflight check passes or timeout
@detiber: parts of cloud-init need to come up before networking, and others after. I think we can add a looping pre-flight check like @timothysc mentioned to help here
comments from meeting:
@akutz - cloud-init targets may also be used.
@randomvariable - in addition, systemd before/after/wants/bindsto overrides
Hi @figo / @codenrhoden,
I think one of you should take this in the image builder. I believe a drop-in for a cloud-init phase target should be able to set up the required service order to prevent the race condition.
Hi @figo / @codenrhoden,
I think one of you should take this in the image builder. I believe a drop-in for a cloud-init phase target should be able to set up the required service order to prevent the race condition.
Just to confirm, is it we want to make sure containerd service running before cloud-init?
something like this for containerd?
[Install]
WantedBy=cloud-init.target
@akutz @timothysc
I don't believe WantedBy determines order, only that containerd is wanted by cloud-init. Please see these docs for more information:
Note that requirement dependencies do not influence the order in which services are started or stopped. This has to be configured independently with the After= or Before= options. If unit foo.service pulls in unit bar.service as configured with Wants= and no ordering is configured with After= or Before=, then both units will be started simultaneously and without any delay between them if foo.service is activated.
Note that those settings are independent of and orthogonal to the requirement dependencies as configured by Requires=, Wants=, Requisite=, or BindsTo=. It is a common pattern to include a unit name in both the After= and Wants= options, in which case the unit listed will be started before the unit that is configured with these options.
You'll likely want something like this in a drop-in:
[Install]
Before=cloud-init.target
WantedBy=cloud-init.target
I'm not sure that would work, since containerd is likely dependent on the networking target.
I'm not sure if there is a way to trigger it to say that it is required before cloud-init starts the final phase.
Hi @detiber,
I think there may be if we change it to this:
[Install]
Before=cloud-init.service
WantedBy=cloud-init.service
The cloud-init.service is the boot stage of cloud-init at which point networking must be online.
Or rather this to be sure:
[Install]
After=cloud-init-local.service
Wants=cloud-init-local.service
Before=cloud-init.service
WantedBy=cloud-init.service
This should ensure containerd is started between the two phases where cloud-init has brought networking online. This starts things as early as possible for containerd I think.
There are several cloud-init boot stages we can use, each with a corresponding systemd service we can target.
Since runcmd is called by cloud-final.service should we defer to having it there, that way if someone wanted to override the containerd config in any of the other stages, they could?
So After/Wants=cloud-config.service and Before/WantedBy=cloud-final.service?
Oh, good idea.
Since runcmd is called by
cloud-final.serviceshould we defer to having it there, that way if someone wanted to override the containerd config in any of the other stages, they could?So After/Wants=cloud-config.service and Before/WantedBy=cloud-final.service?
I hope image itself can ensure order correctness, rather than expecting certain user-data, does this make sense?
@figo I can foresee cases where someone might want to inject config during bootstrapping, which is what I was referring to. Being able to specify a custom config file for containerd in the bootstrapping config, for example. I would expect that by adding a systemd dropin with the criteria I mentioned about should help support that use case (and we are enforcing the ordering in the image, but allowing user-data defined config for the service).
It might be helpful to define a similar dropin for the kubelet as well, that would also add an After/Wants for containerd.
/assign @figo
/unassign @detiber @akutz
@detiber @akutz Ansible don't come with support of systemd "After/Want/Wantedby".
TASK [containerd : start containerd service] ***********************************
fatal: [default]: FAILED! => {"changed": false, "msg": "Unsupported parameters for (systemd) module: after, before, wantedBy, wants Supported parameters include: daemon_reexec, daemon_reload, enabled, force, masked, name, no_block, scope, state, user"}
Although the systemd containerd.service file comes from containerd tar file, we could replace it with our own version and add new rules to it.
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
[Install]
WantedBy=multi-user.targe
The drawback is: this systemd service configure file suppose to be updated by containerd release, it soon becomes obsolete once we have our own version.
You don鈥檛 need to do that @figo. Just use Ansible to write a drop in file for containerd. A drop in can add/replace settings for an existing systemd unit file. Read more at https://www.freedesktop.org/software/systemd/man/systemd.unit.html and https://wiki.archlinux.org/index.php/systemd#Drop-in_files.
For example:
/etc/systemd/system/containerd.d/cloudinit.conf
[Unit]
After=cloud-config.service
Wants=cloud-config.service
Before=cloud-final.service
WantedBy=cloud-final.service
Hi @figo,
Based on the conversation in #112, I wanted to revise the drop-in example from above. We should have two drop-ins:
/etc/systemd/system/containerd.d/cloud-init.conf
[Unit]
# Start containerd as late as possible with respect to cloud-init's
# boot stages. This ensures that cloud-init modules such as
# write-files may be used to configure containerd prior to it
# starting.
#
# Please see the following link for more information about the
# cloud-init boot stage managed by the cloud-config.service:
# https://cloudinit.readthedocs.io/en/latest/topics/boot.html#config
After=cloud-config.service
Wants=cloud-config.service
# Ensure containerd is started before the cloud-init boot stage
# that executes the runcmd module. This module is responsible
# for running "kubeadm init", and thus containerd must be
# started before this command is processed.
#
# Please see the following link for more information about the
# cloud-init boot stage managed by the cloud-final.service:
# https://cloudinit.readthedocs.io/en/latest/topics/boot.html#final
Before=cloud-final.service
WantedBy=cloud-final.service
/etc/systemd/system/containerd.d/memory-pressure.conf
[Service]
# Decreases the likelihood that containerd is killed due to memory
# pressure.
#
# Please see the following link for more information about the
# OOMScoreAdjust configuration property:
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html#OOMScoreAdjust=
OOMScoreAdjust=-999
cc @detiber @dims
Issue been fixed with https://github.com/kubernetes-sigs/image-builder/pull/113
/close
@figo: Closing this issue.
In response to this:
Issue been fixed with https://github.com/kubernetes-sigs/image-builder/pull/113
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/open
Found a problem with https://github.com/kubernetes-sigs/image-builder/pull/113
runcmd is in the cloud-config stage, not cloud-final.
Three options at resolution:
Change the override to be WantedBy cloud-config instead of cloud-final
Pros:
Cons:
For reference, on Ubuntu this is
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
- emit_upstart
- snap
- snap_config # DEPRECATED- Drop in version 18.2
- ssh-import-id
- locale
- set-passwords
- grub-dpkg
- apt-pipelining
- apt-configure
- ubuntu-advantage
- ntp
- timezone
- disable-ec2-metadata
- runcmd
- byobu
And AL:
cloud_config_modules:
- disk_setup
- mounts
- locale
- set-passwords
- yum-configure
- yum-add-repo
- package-update-upgrade-install
- timezone
- disable-ec2-metadata
- runcmd
Move runcmd to cloud-final.
Pros:
Cons:
Use script-user, which is
Pros:
Cons:
I can't see an issue with going with Option 1. write_files takes place in cloud-init.service prior to cloud-config.service
/reopen
@randomvariable: Reopened this issue.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Most helpful comment
Hi @figo,
Based on the conversation in #112, I wanted to revise the drop-in example from above. We should have two drop-ins:
/etc/systemd/system/containerd.d/cloud-init.conf/etc/systemd/system/containerd.d/memory-pressure.confcc @detiber @dims