Hello,
kops version
Version 1.7.0 (git-e04c29d)
I attempted to create a custom ami to use with kops that has some pre-pulled docker images but ran into some issues.
I took the following steps:
Started an instance from the kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 ami with a 20Gb ebs volume attached, pre-pulled my docker images, and baked a new ami from it. Then I created an instance group that uses the new ami:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-08-28T14:54:51Z
  labels:
    kops.k8s.io/cluster: scpdcluster.k8s.local
  name: nodes-t2-small-frontend
spec:
  image: REDACTED/k8s-1.7-debian-jessie-amd64-hvm-ebs-cst-01 # my new ami
  machineType: t2.small
  maxSize: 2
  minSize: 1
  nodeLabels:
    dedicated: frontend
  role: Node
  rootVolumeSize: 20
  rootVolumeType: gp2
  subnets:
  - us-east-1a
Then I ran:

kops update cluster --yes
kops rolling-update cluster --yes

The nodes start in the cluster using my ami with the correct root volume size and type. However, when I ssh to a node and run sudo docker images, my images are no longer present; I only see the usual kubernetes images:
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/external_storage/efs-provisioner latest b4da30527798 13 days ago 49.53 MB
gcr.io/google_containers/cluster-autoscaler v0.6.1 71a3e8b29e06 4 weeks ago 145 MB
protokube 1.7.0 f1aefdb5580c 7 weeks ago 363.4 MB
gcr.io/google_containers/kube-proxy v1.7.2 13a7af96c7e8 7 weeks ago 114.7 MB
gcr.io/google_containers/k8s-dns-sidecar-amd64 1.14.4 38bac66034a6 11 weeks ago 41.81 MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64 1.14.4 a8e00546bcf3 11 weeks ago 49.38 MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64 1.14.4 f7f45b9cb733 11 weeks ago 41.41 MB
gcr.io/google_containers/cluster-proportional-autoscaler-amd64 1.1.2-r2 7d892ca550df 3 months ago 49.64 MB
gcr.io/google_containers/pause-amd64 3.0 99e59f495ffa 16 months ago 746.9 kB
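A quick way to make this check repeatable is to filter the listing for anything outside the stock Kubernetes registries. The sketch below embeds a sample listing so it is self-contained; on a real node you would pipe the output of `sudo docker images` in instead, and the registry prefixes are just the ones visible above:

```shell
# Filter a docker image listing for custom (non-Kubernetes) images.
# The sample listing stands in for `sudo docker images` output.
cat > /tmp/images.txt <<'EOF'
quay.io/external_storage/efs-provisioner latest b4da30527798
gcr.io/google_containers/kube-proxy v1.7.2 13a7af96c7e8
protokube 1.7.0 f1aefdb5580c
EOF
# Print any image whose repository is not one of the stock prefixes
grep -v -e '^gcr.io/' -e '^quay.io/' -e '^protokube' /tmp/images.txt \
  || echo "no custom images found"
```

Run against a healthy pre-baked node this should print the custom images; against the sample listing above it prints "no custom images found".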
When a node is created in the cluster does something happen during provisioning that would cause my images to be removed?
Thank you
Docker was probably re-installed; can you look at the daemon.log file for me? Nodeup does the docker installs.
@chrislovecnm
I ran cat /var/log/daemon.log | grep nodeup | grep docker > daemon-parsed.log and then pulled this section out after a quick scan by eye:
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.188874 787 executor.go:157] Executing task "Package/docker-engine": Package: docker-engine
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.189119 787 package.go:134] Listing installed packages: dpkg-query -f ${db:Status-Abbrev}${Version}\n -W docker-engine
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.195244 787 changes.go:80] Field changed "Source" actual="<nil>" expected="http://apt.dockerproject.org/repo/pool/main/d/docker-engine/docker-engine_1.12.6-0~debian-jessie_amd64.deb"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: W0915 02:12:36.195932 787 package.go:335] cannot apply package changes for "docker-engine": Package:
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.196631 787 changes.go:80] Field changed "Definition" actual="<nil>" expected="[Unit]\nDescription=Kubernetes Protokube Service\nDocumentation=https://github.com/kubernetes/kops\n\n[Service]\nExecStartPre=/bin/true\nExecStart=/usr/bin/docker run -v /:/rootfs/ -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd --net=host --privileged --env KUBECONFIG=/rootfs/var/lib/kops/kubeconfig protokube:1.7.0 /usr/bin/protokube --cloud=aws --containerized=true --dns-internal-suffix=internal.scpdcluster.k8s.local --dns=gossip --master=false --v=4\nRestart=always\nRestartSec=2s\nStartLimitInterval=0\n\n[Install]\nWantedBy=multi-user.target\n"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.198871 787 executor.go:157] Executing task "service/docker-healthcheck.timer": Service: docker-healthcheck.timer
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.199140 787 changes.go:80] Field changed "Definition" actual="<nil>" expected="[Unit]\nDescription=Trigger docker-healthcheck periodically\n\n[Timer]\nOnUnitInactiveSec=10s\nUnit=docker-healthcheck.service\n\n[Install]\nWantedBy=multi-user.target"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.199982 787 files.go:50] Writing file "/lib/systemd/system/docker-healthcheck.timer"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.203944 787 executor.go:157] Executing task "service/docker-healthcheck.service": Service: docker-healthcheck.service
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.204219 787 changes.go:80] Field changed "Definition" actual="<nil>" expected="[Unit]\nDescription=Run docker-healthcheck once\n\n[Service]\nType=oneshot\nExecStart=/opt/kubernetes/helpers/docker-healthcheck\n\n[Install]\nWantedBy=multi-user.target"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.204335 787 files.go:50] Writing file "/lib/systemd/system/docker-healthcheck.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.209776 787 changes.go:80] Field changed "Definition" actual="<nil>" expected="[Unit]\nDescription=Kubernetes Kubelet Server\nDocumentation=https://github.com/kubernetes/kubernetes\nAfter=docker.service\n\n[Service]\nEnvironmentFile=/etc/sysconfig/kubelet\nExecStart=/usr/local/bin/kubelet \"$DAEMON_ARGS\"\nRestart=always\nRestartSec=2s\nStartLimitInterval=0\nKillMode=process\n"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.210356 787 executor.go:157] Executing task "Service/docker.service": Service: docker.service
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.210463 787 service.go:123] querying state of service "docker.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.302591 787 service.go:344] Restarting service "docker-healthcheck.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.344586 787 changes.go:80] Field changed "Definition" actual="[Unit]\nDescription=Docker Application Container Engine\nDocumentation=https://docs.docker.com\nAfter=network.target docker.socket\nRequires=docker.socket\n\n[Service]\nType=notify\n# the default is not to use systemd for cgroups because the delegate issues still\n# exists and systemd currently does not support the cgroup feature set required\n# for containers run by docker\nExecStart=/usr/bin/dockerd -H fd://\nExecReload=/bin/kill -s HUP $MAINPID\n# Having non-zero Limit*s causes performance problems due to accounting overhead\n# in the kernel. We recommend using cgroups to do container-local accounting.\nLimitNOFILE=infinity\nLimitNPROC=infinity\nLimitCORE=infinity\n# Uncomment TasksMax if your systemd version supports it.\n# Only systemd 226 and above support this version.\n#TasksMax=infinity\nTimeoutStartSec=0\n# set delegate yes so that systemd does not reset the cgroups of docker containers\nDelegate=yes\n# kill only the docker process, not all processes in the cgroup\nKillMode=process\n\n[Install]\nWantedBy=multi-user.target\n" expected="[Unit]\nDescription=Docker Application Container Engine\nDocumentation=https://docs.docker.com\nAfter=network.target docker.socket\nRequires=docker.socket\n\n[Service]\nType=notify\nEnvironmentFile=/etc/sysconfig/docker\nExecStart=/usr/bin/dockerd -H fd:// \"$DOCKER_OPTS\"\nExecReload=/bin/kill -s HUP $MAINPID\nKillMode=process\nTimeoutStartSec=0\nLimitNOFILE=1048576\nLimitNPROC=1048576\nLimitCORE=infinity\nRestart=always\nRestartSec=2s\nStartLimitInterval=0\nDelegate=yes\nExecStartPre=/opt/kubernetes/helpers/docker-prestart\n\n[Install]\nWantedBy=multi-user.target\n"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.345464 787 files.go:50] Writing file "/lib/systemd/system/docker.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.376692 787 service.go:344] Restarting service "docker-healthcheck.timer"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.510966 787 service.go:242] extracted depdendency from "ExecStart=/usr/bin/dockerd -H fd:// \"$DOCKER_OPTS\"": "/usr/bin/dockerd"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.511002 787 service.go:123] querying state of service "docker.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.519388 787 service.go:333] will restart service "docker.service" because dependency changed after service start
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.519607 787 service.go:344] Restarting service "docker.service"
Sep 15 02:12:36 ip-10-50-255-61 nodeup[787]: I0915 02:12:36.523211 787 service.go:355] Enabling service "docker-healthcheck.timer"
Sep 15 02:12:39 ip-10-50-255-61 nodeup[787]: I0915 02:12:39.880080 787 service.go:355] Enabling service "docker-healthcheck.service"
It looks like something is going on here. There is a lot more to the log even after filtering on "nodeup" and "docker"; I can post it all if you like, and I can filter on any suggested keywords as well.
Thanks
Can you post the log in a gist and link it here? I would need to recreate this. You can also rerun nodeup to see if it clears the containers again on a node.
@justinsb ideas?
@chrislovecnm
Here is a link to the full daemon.log.
The ami has user data that was useful when building it: it logs into the ecr and pulls the docker images for me. At the beginning of the log I can see it perform this action and confirm the images are present.
I connected to the node in the cluster and re-pulled the docker images manually. I then reran nodeup using the following command /var/cache/kubernetes-install/nodeup --conf=/var/cache/kubernetes-install/kube_env.yaml --v=8 2> rerun-nodeup.log and captured the output in this log:
Rerunning nodeup had no effect; my images remained on the node. So whatever removes them happens only when the node first starts up in the cluster.
@chrislovecnm Solved! --> just a difference in docker storage drivers
I investigated this further today and took a look at the docker-engine package information on a node in my cluster. It looks like docker was not reinstalled; otherwise I think the package info would reflect the change in the 'Modify' date:
$ stat /var/lib/dpkg/info/docker-engine.list
File: ‘/var/lib/dpkg/info/docker-engine.list’
Size: 5571 Blocks: 16 IO Block: 4096 regular file
Device: ca01h/51713d Inode: 1574372 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-09-22 13:04:43.350631000 +0000
Modify: 2017-07-28 04:00:49.878627794 +0000
Change: 2017-07-28 04:00:49.882627934 +0000
Birth: -
After digging through the daemon.log further I noticed that docker is using the overlay storage driver. When I start up an ec2 instance using the kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 ami outside of a cluster, it defaults to the devicemapper storage driver. So when I build my custom ami starting with the kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 ami, prior to pulling my images, I first create a daemon.json config file in /etc/docker with the following contents (as described in the docker docs for storage drivers):
{
  "storage-driver": "overlay"
}
Then restart the docker service:
sudo systemctl restart docker
After pulling my images I found it was important to remove the /etc/docker/daemon.json file before creating a snapshot and registering the ami. If the file was not removed, the resulting ami could still launch a node successfully, but the kubernetes components never installed/configured properly and the node never connected to the cluster.
For anyone interested, to automate this process I launched an ec2 instance with a role that allows pulling from a private aws ecr, and the following in the user data:
#!/bin/bash
# Switch the running docker daemon to the overlay storage driver so the
# pulled image layers match what kops nodes will use.
echo '{
  "storage-driver": "overlay"
}' > /etc/docker/daemon.json
systemctl restart docker
# Remove the file again right away: the restarted daemon keeps the
# driver, and leaving daemon.json in the ami breaks node provisioning.
rm /etc/docker/daemon.json
# Log in to the private ecr and pull the images to bake into the ami
export DOCKERLOGIN=$(/usr/local/bin/aws ecr get-login --region us-east-1)
$(echo $DOCKERLOGIN)  # executes the returned docker login command
docker pull example/image1:0.1.0
docker pull example/image2:0.1.0
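One fragile spot in a script like the one above is the hand-written JSON: a typo in daemon.json can leave docker unable to restart mid-build. A defensive variant validates the file before restarting the daemon. This is only a sketch; it writes to /tmp so it is runnable anywhere, whereas on the build instance the path would be /etc/docker/daemon.json:

```shell
# Write the temporary storage-driver override and sanity-check that it
# is valid JSON before asking docker to reload it.
conf=/tmp/daemon.json   # /etc/docker/daemon.json on the real build instance
cat > "$conf" <<'EOF'
{
  "storage-driver": "overlay"
}
EOF
# Abort the build if the file is not valid JSON
python3 -c 'import json,sys; json.load(open(sys.argv[1]))' "$conf" \
  && echo "daemon.json ok" \
  || exit 1
```

With the file validated, the `systemctl restart docker` / `rm` sequence proceeds as in the script above.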
I then connected to the instance and made sure the images had finished pulling before stopping it, creating a snapshot, and registering the ami. When I use the new ami in the cluster, my images are there!
Sounds like this is resolved, feel free to re-open if not.