Kops: Kubelet arbitrarily fails to register nodes when using Spot Instances

Created on 11 Oct 2017 · 24 comments · Source: kubernetes/kops

kops 1.8.0-alpha.1
kubernetes: 1.7.8
provider: AWS

To reproduce it, create a cluster and add an instance group with a maxPrice value.

Some instances do not register with the apiserver, while others work fine, so it seems to be some kind of race condition. If you kill the kubelet process, everything works fine after it respawns.
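
For anyone trying to reproduce this, the rough workflow is something like the following (a minimal sketch; it assumes KOPS_STATE_STORE points at the kops state bucket, the cluster is named test, and the spot instance group spec from further below is saved as spot-ig.yaml):

# Create the spot instance group from a file containing the InstanceGroup
# spec shown further below (with maxPrice set), then apply it.
kops create -f spot-ig.yaml
kops update cluster test --yes

# Watch which of the new spot nodes actually register with the apiserver.
kubectl get nodes -o wide --watch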

The relevant part of the kubelet logs I could find is these lines, repeated in a loop:

Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.987315    1281 kubelet.go:1894] SyncLoop (DELETE, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)"
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: W1011 08:39:47.990681    1281 kubelet.go:1596] Deleting mirror pod "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)" because it is outdated
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.990693    1281 mirror_client.go:85] Deleting a mirror pod "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system"
Oct 11 08:39:47 ip-172-20-104-75 kubelet[1281]: I1011 08:39:47.994312    1281 kubelet.go:1888] SyncLoop (REMOVE, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(b1664e35-ae5f-11e7-8c91-062a9a3fee2c)"
Oct 11 08:39:48 ip-172-20-104-75 kubelet[1281]: I1011 08:39:48.008283    1281 kubelet.go:1878] SyncLoop (ADD, "api"): "kube-proxy-ip-172-20-104-75.eu-west-1.compute.internal_kube-system(bd58ce30-ae5f-11e7-8c91-062a9a3fee2c)"
[...]

Cluster yaml

kind: Cluster
metadata:
  creationTimestamp: 2017-10-11T08:54:21Z
  name: test
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kubernetes-artifacts/test
  dnsZone: test
  etcdClusters:
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: main
    version: 3.1.10
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: events
    version: 3.1.10
  iam:
    legacy: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
  kubelet:
    featureGates:
      ExperimentalCriticalPodAnnotation: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.7.8
  masterInternalName: api.internal.test
  masterPublicName: api.test
  networkCIDR: 172.20.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.20.64.0/19
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.20.96.0/19
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 172.20.0.0/22
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 172.20.4.0/22
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 172.20.8.0/22
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    bastion:
      bastionPublicName: bastion.test
    dns:
      type: Public
    masters: private
    nodes: private

Instance group for spot requests

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-11T09:21:48Z
  labels:
    kops.k8s.io/cluster: test
  name: ephimeral
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.2xlarge
  maxSize: 7
  minSize: 3
  maxPrice: "0.15"
  role: Node
  subnets:
  - eu-west-1b
  - eu-west-1c
  - eu-west-1a


All 24 comments

After some deeper analysis, it seems to be a race condition: kubelet starts before the AWS tags are available. This is happening with spot instances, but it could perhaps also happen with on-demand instances.

  1321 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
  1321 tags.go:78] AWS cloud - no clusterID filtering applied for shared resources; do not run multiple clusters in this AZ.
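
To see the race from an affected node, a polling loop along these lines can show when the tags actually become visible (just a diagnostic sketch, not anything kops ships; it assumes the AWS CLI is available on the node and the node's IAM role allows describing tags):

#!/bin/bash
# Poll the EC2 API until this instance's cluster tag is visible.
INSTANCE_ID=$(curl -s -m 10 http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s -m 10 http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=${AZ%?}   # strip the trailing zone letter, e.g. eu-west-1a -> eu-west-1

until aws ec2 describe-tags --region "$REGION" \
      --filters "Name=resource-id,Values=${INSTANCE_ID}" \
      | grep -q KubernetesCluster
do
  echo "$(date -Is) cluster tags not visible yet"
  sleep 5
done
echo "$(date -Is) cluster tags present"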

Same for me after moving to image 2017-12-02 (with both 1.7 and 1.8): 40% of started spot instances are not added to the cluster.

I have this workaround in the cluster spec that seems to work

  hooks:
  - manifest: |
      Type=oneshot                    
      ExecStart=/usr/bin/docker run --net host quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service"
      name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node

@ese thanks for the workaround! It works.

@ese nice find, any ideas on what we should do for the race condition? I'm not familiar with check-aws-tags.

@chrislovecnm It runs a simple sh script which waits for the EC2 tags, after which kubelet is restarted:

#!/bin/bash -x

# Succeeds once this instance's tags (including KubernetesCluster) are
# visible through the EC2 API.
function check_tags {
  aws ec2 describe-instances \
    --region $(curl -m 10 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}') \
    --instance-ids $(curl -m 10 http://169.254.169.254/latest/meta-data/instance-id) \
    | grep KubernetesCluster
}

# Poll until the tags show up.
until check_tags
do
  sleep 1s
  echo no tags
done
sleep 1s
echo FINISH!

We probably need that in protokube

This should probably be fixed by k8s core, not kops per kubernetes/kubernetes#57382

@hubt agreed, but how are we suddenly having this problem? And I have not been able to recreate it either.

I've had this problem for a long time; my 1.4 cluster has done this forever, but I never investigated and pinpointed it until I saw this. It feels like it's most problematic when there's a lot of node churn, like when a big sweep of spot terminations takes out and then replaces a lot of nodes at once.

Well, let me disagree: I think this may be handled best if we put it in the installer and not in k8s. Technically the node is not ready for k8s to be installed.

I think that's fine. If the assumption is that only kops knows about spot vs on-demand instances and that kops should take care of all the nuanced differences, then putting it into protokube makes sense.

Unfortunately the workaround that @ese posted does not work for me. My spot instances still fail to register about half the time during a rolling update. I'm not sure if the root cause of my issue is tagging (it could be, I just don't know for sure), but restarting kubelet manually during a rolling update fixes it for me. Not ideal...

Edit: There's what appears to be a typo in the workaround above (there's a " after the service name that I think shouldn't be there). Removing the " seems to have improved (but not totally resolved) the issue. On my five-node test cluster, only one node failed to come back up during a rolling update. Maybe I just got lucky that time.

Edit 2: I've done a few more rolling updates since that comment, and it seems like the workaround IS working for me after removing the typo. I have only seen one failure in rolling updates since applying the fixed workaround, and in that case it was something very different (kubelet wasn't even installed on the node after 10 minutes; no idea what went wrong there and I haven't seen it before).

It may be a bit crude since it doesn't check the AWS API for the node tags, but right after the line set -o pipefail we added sleep 2m, because in general the tags will be present on the node after about 2 minutes. This is a decent (albeit naive) workaround for us, at least until https://github.com/kubernetes/kubernetes/issues/57382 is taken care of.
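
In other words, the change amounts to something like this (a sketch only; the comment doesn't say exactly which bootstrap script was edited):

#!/bin/bash
set -o pipefail
# Naive delay added right after the pipefail line: give AWS a couple of
# minutes to apply the instance tags before anything tag-dependent runs.
sleep 2m
# ...rest of the original bootstrap script continues unchanged...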

The script @ese provided didn't work for me, but then I found 2 small errors in the hook. After fixing those it seems to work nicely. The hook should have been:

  hooks:
  - manifest: |
      Type=oneshot                    
      ExecStart=/usr/bin/docker run --net host quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service
    name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node

The ExecStartPost had a trailing " which wasn't supposed to be there, and the hook name shouldn't be part of the manifest, but part of the actual hook. Otherwise the service doesn't register properly and doesn't work.
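
For anyone applying the corrected hook, the rough sequence is as follows (a sketch; it assumes the hooks block is added to the cluster spec as in the earlier comment and that KOPS_STATE_STORE is set):

# Paste the hooks block into the cluster spec, then push the change out.
kops edit cluster test
kops update cluster test --yes

# Existing nodes need to be replaced so they pick up the new hook.
kops rolling-update cluster test --yes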

I think this should be fixed in Kubernetes 1.10 by kubernetes/kubernetes#60125, if anyone can test it.

It seems to have been cherry-picked into 1.9.7 as well.

@ese @chrislovecnm could you provide the Dockerfile for this? https://github.com/kubernetes/kops/issues/3605#issuecomment-351674234

@alok87

It is basically an image with curl and the AWS CLI installed, running this bash script:

#!/bin/bash -x

# Succeeds once this instance's tags (including KubernetesCluster) are
# visible through the EC2 API.
function check_tags {
  aws ec2 describe-instances \
    --region $(curl -m 10 http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}') \
    --instance-ids $(curl -m 10 http://169.254.169.254/latest/meta-data/instance-id) \
    | grep KubernetesCluster
}

# Poll until the tags show up.
until check_tags
do
  sleep 1s
  echo no tags
done
sleep 1s
echo FINISH!

Added --restart no to the hook to fix issue #64507:

  hooks:
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/docker run --net host --restart no quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service
    name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node
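
On a freshly booted node you can check whether the hook actually ran before kubelet was restarted, for example (plain systemd commands, nothing kops-specific):

# Did the oneshot hook run, and what did the tag-check container log?
systemctl status ensure-aws-tags.service
journalctl -u ensure-aws-tags.service --no-pager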

For this issue, are we waiting on the upstream Kubernetes fix?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Hi, is this issue fixed in Kubernetes 1.10?

