Kubespray: Unable to bring up LoadBalancer on AWS

Created on 3 May 2019 · 17 comments · Source: kubernetes-sigs/kubespray

Once the cluster is up and running, kubectl apply the following:

apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  selector:
    app: httpbin
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80

The result is:

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/httpbin      LoadBalancer   10.233.34.215   <pending>     80:31118/TCP   18m
service/kubernetes   ClusterIP      10.233.0.1      <none>        443/TCP        22h

The LoadBalancer stays in <pending>. With Kops the LoadBalancer is provisioned. I copied this policy from the master role:

    {
      "Effect": "Allow",
      "Action": ["elasticloadbalancing:*"],
      "Resource": ["*"]
    },

to the worker role policy. Any ideas why this is not working?

Environment:
AWS, provisioned with the contrib Terraform scripts, then Ansible with cluster.yml

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.19.34-coreos x86_64
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2079.3.0
VERSION_ID=2079.3.0
BUILD_ID=2019-04-22-2119
PRETTY_NAME="Container Linux by CoreOS 2079.3.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Version of Ansible: ansible 2.7.10

Kubespray version (commit) (git rev-parse --short HEAD):
00369303

Network plugin used:
calico

Copy of your inventory file:

k8s-xxx-master0 ansible_host=10.134.8.97
k8s-xxx-master1 ansible_host=10.134.9.162
k8s-xxx-master2 ansible_host=10.134.10.217
k8s-xxx-worker0 ansible_host=10.134.8.13
k8s-xxx-worker1 ansible_host=10.134.9.25
k8s-xxx-worker2 ansible_host=10.134.10.130
k8s-xxx-etcd0 ansible_host=10.134.8.241
k8s-xxx-etcd1 ansible_host=10.134.9.159
k8s-xxx-etcd2 ansible_host=10.134.10.20
bastion ansible_host=xx.xx.xx.xx

[bastion]
bastion ansible_host=xx.xx.xx.xx ansible_user=core

[kube-master]
k8s-xxx-master0
k8s-xxx-master1
k8s-xxx-master2


[kube-node]
k8s-xxx-worker0
k8s-xxx-worker1
k8s-xxx-worker2


[etcd]
k8s-xxx-etcd0
k8s-xxx-etcd1
k8s-xxx-etcd2


[k8s-cluster:children]
kube-node
kube-master


[k8s-cluster:vars]
apiserver_loadbalancer_domain_name="elb-k8s-xxx-xxxxxxxxx.us-west-2.elb.amazonaws.com"

Command used to invoke ansible:
ansible-playbook -i inventory/xxx/hosts.yml --become --become-user=root cluster.yml

kind/bug lifecycle/rotten

All 17 comments

I'm having the same issue. We might need to set cloud_provider to aws in all.yml, but if I do that then my cluster does not get created.
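For reference, a minimal sketch of what that change looks like, assuming the standard inventory layout (the path below is the usual location, adjust it to your inventory name):

# inventory/<your-inventory>/group_vars/all/all.yml
# Enable the in-tree AWS cloud provider; Kubespray should then pass
# --cloud-provider=aws to the kube-apiserver, kube-controller-manager and kubelet.
cloud_provider: aws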

@habbas99 Exactly the same here. The only way I could get it to work was to bring the cluster up without cloud_provider: aws in all.yml, then log into each master, add the - --cloud-provider=aws line to two files under /etc/kubernetes/manifests/, and restart two services...

@mabushey Which services did you restart after adding the cloud provider flag in /etc/kubernetes/manifests/ for each master node?

I had an issue with manually adding the cloud-provider=aws line and restarting the services...

From: https://blog.scottlowe.org/2018/09/28/setting-up-the-kubernetes-aws-cloud-provider/:

You must have the --cloud-provider=aws flag added to the Kubelet before adding the node to the cluster. Key to the AWS integration is a particular field on the Node object, the .spec.providerID field, and that field will only get populated if the flag is present when the node is added to the cluster. If you add a node to the cluster and then add the command-line flag afterward, this field/value won't get populated and the integration won't work as expected. No error is surfaced in this situation (at least, not that I've been able to find).
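To illustrate, a sketch of what a correctly registered node looks like with the in-tree AWS provider (excerpt of kubectl get node <name> -o yaml; the node name and instance ID below are made up):

apiVersion: v1
kind: Node
metadata:
  name: ip-10-134-8-13.us-west-2.compute.internal   # hypothetical node name
spec:
  # Only populated when the kubelet registers with --cloud-provider=aws;
  # if it is missing, the AWS integration will not manage this node.
  providerID: aws:///us-west-2a/i-0abc123def456789a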

There are about 2,500 forks of this project because they don't accept pull requests (at least they won't take mine, because I run my own mail server). Most likely someone has already fixed this... I've also tried both the master and release-2.10 branches.

@habbas99 To specifically answer your question: I added - --cloud-provider=aws to /etc/kubernetes/manifests/kube-controller-manager.yaml and /etc/kubernetes/manifests/kube-apiserver.yaml, and then ran systemctl daemon-reload and systemctl restart kubelet.
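For anyone following along, an illustrative excerpt of where that line ends up in the static pod manifest (most existing flags omitted):

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # ...existing flags...
    - --cloud-provider=aws
# The same line goes into kube-apiserver.yaml; then restart the kubelet as described above.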

@mabushey Thanks! Now the ELB gets created in AWS but I'm not able to reach the paths that I defined in my ingress rules. When I try to hit the load balancer I get a 503. The only thing I noticed was that the workers are not added as instances to the ELB. Do I need to add the cloud-provider flag to the worker nodes as well?

@habbas99 - Read my previous comment (three up). The .spec.providerID field does not get set by adding the cloud provider later.

@mabushey I'm wondering if there is a workaround. I might try manually adding the worker instances to the created ELB, but I'm not sure it would make a difference.

Is Kubespray a dead project or just dead on AWS?

@mabushey Kubespray is indeed not dead at all.

You can try joining our Slack channel and asking for more help. This community is based on people helping out in their free time, so some issues will not get much attention. I think the main focus for people using Kubespray is bare-metal rather than AWS, which would be why the AWS issues do not get as much attention.

Can you try the latest master branch, and also include in the issue every step you take to set up your cluster?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

This PR fixes this bug: https://github.com/kubernetes-sigs/kubespray/pull/4338
After this you can set cloud_provider: "aws".
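Once the cloud provider is wired up, the Service status should gain an ELB hostname instead of staying in <pending>; a sketch of the expected result (excerpt of kubectl get svc httpbin -o yaml; the hostname below is made up):

status:
  loadBalancer:
    ingress:
    - hostname: a1b2c3d4e5f6a7b8-1234567890.us-west-2.elb.amazonaws.com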

Also related to https://github.com/kubernetes-sigs/kubespray/issues/5139.

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
