Kops: Kubelet arbitrarily fails to register nodes when using Spot Instances

Created on 11 Oct 2017 · 24 comments · Source: kubernetes/kops

kops 1.8.0-alpha.1
kubernetes: 1.7.8
provider: AWS

To reproduce it, create a cluster and add an instance group with a maxPrice value.

Some instances do not register with the apiserver, while others work fine, so it seems to be some kind of race condition. If you kill the kubelet process, everything works fine after it respawns.
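
For anyone trying to reproduce this, the rough workflow is something like the following (a minimal sketch; it assumes KOPS_STATE_STORE points at the kops state bucket, the cluster is named test, and the spot instance group spec from further below is saved as spot-ig.yaml):

# Create the spot instance group from a file containing the InstanceGroup
# spec shown further below (with maxPrice set), then apply it.
kops create -f spot-ig.yaml
kops update cluster test --yes

# Watch which of the new spot nodes actually register with the apiserver.
kubectl get nodes -o wide --watch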

The relevant part of the kubelet logs I could find is these lines, repeated in a loop:

Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.987315    1281 kubelet.go:1894] SyncLoop (DELETE, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)"
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: W1011 08:39:47.990681    1281 kubelet.go:1596] Deleting mirror pod "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)" because it is outdated
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.990693    1281 mirror_client.go:85] Deleting a mirror pod "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system"
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.994312    1281 kubelet.go:1888] SyncLoop (REMOVE, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)"
Oct 11 08:39:48 ip-172-20-104-75 kubelet[1281]: I1011 08:39:48.008283    1281 kubelet.go:1878] SyncLoop (ADD, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(bd58ce30-ae5f-11e7-8c91-062a9a3fee2c)"
[...]

Cluster yaml

kind: Cluster
metadata:
  creationTimestamp: 2017-10-11T08:54:21Z
  name: test
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kubernetes-artifacts/test
  dnsZone: test
  etcdClusters:
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: main
    version: 3.1.10
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: events
    version: 3.1.10
  iam:
    legacy: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
  kubelet:
    featureGates:
      ExperimentalCriticalPodAnnotation: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.7.8
  masterInternalName: api.internal.test
  masterPublicName: api.test
  networkCIDR: 172.20.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.20.64.0/19
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.20.96.0/19
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 172.20.0.0/22
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 172.20.4.0/22
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 172.20.8.0/22
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    bastion:
      bastionPublicName: bastion.test
    dns:
      type: Public
    masters: private
    nodes: private

Instance group for spot requests

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-11T09:21:48Z
  labels:
    kops.k8s.io/cluster: test
  name: ephimeral
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.2xlarge
  maxSize: 7
  minSize: 3
  maxPrice: "0.15"
  role: Node
  subnets:
  - eu-west-1b
  - eu-west-1c
  - eu-west-1a


All 24 comments

After some deeper analysis, it seems to be a race condition: kubelet starts before the AWS tags are available. This is happening with spot instances, but it could perhaps also happen with on-demand instances.

  1321 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
  1321 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
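
To see the race from an affected node, a polling loop along these lines can show when the tags actually become visible (just a diagnostic sketch, not anything kops ships; it assumes the AWS CLI is available on the node and the node's IAM role allows describing tags):

#!/bin/bash
# Poll the EC2 API until this instance's cluster tag is visible.
INSTANCE_ID=$(curl -s -m 10 http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s -m 10 http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=${AZ%?}   # strip the trailing zone letter, e.g. eu-west-1a -> eu-west-1

until aws ec2 describe-tags --region "$REGION" \
      --filters "Name=resource-id,Values=${INSTANCE_ID}" \
      | grep -q KubernetesCluster
do
  echo "$(date -Is) cluster tags not visible yet"
  sleep 5
done
echo "$(date -Is) cluster tags present"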

Same for me after moving to image 2017-12-02 (with both 1.7 and 1.8): 40% of started spot instances are not added to the cluster.

I have this workaround in the cluster spec that seems to work

  hooks:
  - manifest: |
      Type=oneshot                    
      ExecStart=/usr/bin/docker run --net host quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service"
      name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node

@ese thanks for the workaround! It works.

@ese nice find, any ideas on what we should do for the race condition? I'm not familiar with check-aws-tags.

@chrislovecnm It runs a simple sh script which waits for the EC2 tags, after which kubelet is restarted:

#!/bin/bash -x

# Succeeds once this instance's tags (including KubernetesCluster) are
# visible through the EC2 API.
function check_tags {
  aws ec2 describe-instances \
    --region $(curl -m 10 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}') \
    --instance-ids $(curl -m 10 http://169.254.169.254/latest/meta-data/instance-id) \
    | grep KubernetesCluster
}

# Poll until the tags show up.
until check_tags
do
  sleep 1s
  echo no tags
done
sleep 1s
echo FINISH!

We probably need that in protokube

This should probably be fixed by k8s core, not kops per kubernetes/kubernetes#57382

@hubt agreed, but how are we suddenly having this problem? And I have not been able to recreate it either.

I've had this problem for a long time; my 1.4 cluster has done this forever, but I never investigated and pinpointed it until I saw this. It feels like it's most problematic when there's a lot of node churn, like when a big sweep of spot terminations takes out and then replaces a lot of nodes at once.

Well, let me disagree: I think this may be handled best if we put it in the installer and not in k8s. Technically the node is not ready for k8s to be installed.

I think that's fine. If the assumption is that only kops knows about spot vs on-demand instances and that kops should take care of all the nuanced differences, then putting it into protokube makes sense.

Unfortunately the workaround that @ese posted does not work for me. My spot instances still fail to register about half the time during a rolling update. I'm not sure if the root cause of my issue is tagging (it could be, I just don't know for sure), but restarting kubelet manually during a rolling update fixes it for me. Not ideal...

Edit: There's what appears to be a typo in the workaround above (there's a " after the service name that I think shouldn't be there). Removing the " seems to have improved (but not totally resolved) the issue. On my five-node test cluster, only one node failed to come back up during a rolling update. Maybe I just got lucky that time.

Edit 2: I've done a few more rolling updates since that comment, and it seems like the workaround IS working for me after removing the typo. I have only seen one failure in rolling updates since applying the fixed workaround, and in that case it was something very different (kubelet wasn't even installed on the node after 10 minutes; no idea what went wrong there and I haven't seen it before).

It may be a bit crude since it doesn't check the AWS API for the node tags, but right after the line set -o pipefail we added sleep 2m, because in general the tags will be present on the node after about 2 minutes. This is a decent (albeit naive) workaround for us, at least until https://github.com/kubernetes/kubernetes/issues/57382 is taken care of.
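
In other words, the change amounts to something like this (a sketch only; the comment doesn't say exactly which bootstrap script was edited):

#!/bin/bash
set -o pipefail
# Naive delay added right after the pipefail line: give AWS a couple of
# minutes to apply the instance tags before anything tag-dependent runs.
sleep 2m
# ...rest of the original bootstrap script continues unchanged...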

The script @ese provided didn't work for me, but then I found 2 small errors in the hook. After fixing those it seems to work nicely. The hook should have been:

  hooks:
  - manifest: |
      Type=oneshot                    
      ExecStart=/usr/bin/docker run --net host quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service
    name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node

The ExecStartPost had a trailing " which wasn't supposed to be there, and the hook name shouldn't be part of the manifest, but part of the actual hook. Otherwise the service doesn't register properly and doesn't work.
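
For anyone applying the corrected hook, the rough sequence is as follows (a sketch; it assumes the hooks block is added to the cluster spec as in the earlier comment and that KOPS_STATE_STORE is set):

# Paste the hooks block into the cluster spec, then push the change out.
kops edit cluster test
kops update cluster test --yes

# Existing nodes need to be replaced so they pick up the new hook.
kops rolling-update cluster test --yes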

I think this should be fixed in Kubernetes 1.10 by kubernetes/kubernetes#60125, if anyone can test it.

It seems to have been cherry-picked into 1.9.7 as well.

@ese @chrislovecnm could you provide the Dockerfile for this? https://github.com/kubernetes/kops/issues/3605#issuecomment-351674234

@alok87

It is basically an image with curl and the AWS CLI installed, running this bash script:

#!/bin/bash -x

# Succeeds once this instance's tags (including KubernetesCluster) are
# visible through the EC2 API.
function check_tags {
  aws ec2 describe-instances \
    --region $(curl -m 10 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}') \
    --instance-ids $(curl -m 10 http://169.254.169.254/latest/meta-data/instance-id) \
    | grep KubernetesCluster
}

# Poll until the tags show up.
until check_tags
do
  sleep 1s
  echo no tags
done
sleep 1s
echo FINISH!

Added --restart no to the hook to fix issue #64507:

  hooks:
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/docker run --net host --restart no quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service
    name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node
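
On a freshly booted node you can check whether the hook actually ran before kubelet was restarted, for example (plain systemd commands, nothing kops-specific):

# Did the oneshot hook run, and what did the tag-check container log?
systemctl status ensure-aws-tags.service
journalctl -u ensure-aws-tags.service --no-pager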

For this issue, are we waiting on the upstream Kubernetes fix?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Hi, is this issue fixed in Kubernetes 1.10?

