Kops: ELB created from the cluster in multiple subnets registers to only one subnet

Created on 13 Dec 2016 · 42 comments · Source: kubernetes/kops

Built kops from the master branch, Version git-03ad0cf.
The cluster is provisioned with 3 masters and 3 nodes across 3 subnets, with Weave as the CNI.
When I create an ELB as a Kubernetes service, the ELB is registered to only one subnet, even though instances from all subnets are registered with it.

Labels: P1, blocks-next

Most helpful comment

Reopened!

All 42 comments

Can I get your config? How do I reproduce?

@chrislovecnm here is my current cluster config:

metadata:
  creationTimestamp: "2016-12-09T22:46:52Z"
  name: spark-k8s.niketech.com
spec:
  adminAccess:
  - 0.0.0.0/0
  channel: stable
  cloudProvider: aws
  configBase: s3://spark-k8s.niketech.com/spark-k8s.niketech.com
  etcdClusters:
  - etcdMembers:
    - name: us-west-2a
      zone: us-west-2a
    - name: us-west-2b
      zone: us-west-2b
    - name: us-west-2c
      zone: us-west-2c
    name: main
  - etcdMembers:
    - name: us-west-2a
      zone: us-west-2a
    - name: us-west-2b
      zone: us-west-2b
    - name: us-west-2c
      zone: us-west-2c
    name: events
  kubernetesVersion: v1.4.6
  masterInternalName: api.internal.spark-k8s.niketech.com
  masterPublicName: api.spark-k8s.niketech.com
  networkCIDR: 10.178.152.0/22
  networkID: vpc-9ba084fe
  networking:
    weave: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  topology:
    masters: public
    nodes: public
  zones:
  - cidr: 10.178.152.0/24
    id: subnet-937a32e4
    name: us-west-2a
  - cidr: 10.178.153.0/24
    id: subnet-dfedc7ba
    name: us-west-2b
  - cidr: 10.178.154.0/24
    id: subnet-fb284ba2
    name: us-west-2c

The 3 subnets are existing private subnets.

Once the cluster is up and running, I create a replication controller (below) which deploys containers that simply echo back the container ID on port 9000.
Then I create a LoadBalancer service which exposes these pods.

apiVersion: v1
kind: ReplicationController
metadata:
  name: hello-node
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: hello-node
    spec:
      containers:
      - name: hello-node
        image: sdheisenberg/hello-node:latest
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: hello-node-dhub-svc
  labels:
    name: hello-node
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 9000
      protocol: TCP
  selector:
    # This needs to match the selector in the RC
    app: hello-node

[screenshot: ELB in the AWS console with all instances registered but only one subnet attached]

ELB gets created and registers all instances, but only one subnet is registered in ELB for some reason.
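
One way to double-check what the ELB actually got is to compare its subnets, availability zones, and registered instances via the AWS CLI. A minimal sketch, assuming the load balancer name Kubernetes generated for the service (placeholder below):

# List subnets, AZs and registered instances of the classic ELB
$ aws elb describe-load-balancers \
    --load-balancer-names <elb-name> \
    --query 'LoadBalancerDescriptions[].{Subnets:Subnets,AZs:AvailabilityZones,Instances:Instances[].InstanceId}'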


Gonna ask a dumb question :) What subnets should the ELB have? I am thinking this is a limitation of the cloud provider which needs to be fixed!

Can you validate in master branch please?

@justinsb please review

@chrislovecnm I believe the subnets for all instances should be added.
In the above example, since instances in us-west-2a and 2c are registered in the ELB, those subnets should be added as well.
I can add the subnets manually, and all instances come into the InService state shortly.
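
As a manual workaround, the missing subnets can be attached to the classic ELB with the AWS CLI, roughly like this (the load balancer name and subnet IDs are placeholders for the real ones):

# Attach the missing subnets to the ELB created by the Service
$ aws elb attach-load-balancer-to-subnets \
    --load-balancer-name <elb-name> \
    --subnets <subnet-in-us-west-2a> <subnet-in-us-west-2c>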

@caseeker I pinged @justinsb about this, and we agree this is a bug.

I just pulled up one of our deployments and we are not seeing this behavior. Can I get the kops command that you are using? Also, can you try HEAD?

Thanks

Chris

Also, a bit of a warning: we merged a big PR refactoring our API, so if you want to test a SHA from around Saturday, please let me know.

I just tried this with the latest kops and k8s 1.4.7, and could not reproduce (with either an internal or public-facing ELB). I did hit https://github.com/kubernetes/kubernetes/issues/39593 though.

I suggest we see if this can be reproduced once kops 1.5-alpha is out (imminently!)

I believe this is fixed in 1.5.0 beta1 (or later). If anyone continues to hit it with kops 1.5.0 beta1 or later, please comment / reopen, ideally with instructions on how to reproduce :-)

I am seeing this issue running the 1.5.0-beta2.

$ kops version
Version 1.5.0-beta2 (git-e0cec889a)

Not sure if it's important, but the cluster was upgraded using this version from a previous installation that used the kops 1.5.0-beta1 release off master from the end of last week.

I've configured the server with this configuration (I've redacted some information, but hopefully nothing important):

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2017-02-01T08:28:20Z"
  name: <CLUSTER_NAME>
spec:
  api:
    loadBalancer:
      type: Internal
  channel: alpha
  cloudProvider: aws
  configBase: <S3_STATE_STORE>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-<AWS_REGION_1>
      name: <AWS_REGION_1>
    - instanceGroup: master-<AWS_REGION_2>
      name: <AWS_REGION_2>
    - instanceGroup: master-<AWS_REGION_3>
      name: <AWS_REGION_3>
    name: main
  - etcdMembers:
    - instanceGroup: master-<AWS_REGION_1>
      name: <AWS_REGION_1>
    - instanceGroup: master-<AWS_REGION_2>
      name: <AWS_REGION_2>
    - instanceGroup: master-<AWS_REGION_3>
      name: <AWS_REGION_3>
    name: events
  kubernetesApiAccess:
  - <ADMIN_ACCESS_CIDR>
  kubernetesVersion: v1.5.2
  masterInternalName: api.internal.<CLUSTER_NAME>
  masterPublicName: api.<CLUSTER_NAME>
  networkCIDR: <NETWORK_CIDR>
  networkID: <VPC_ID>
  networking:
    weave: {}
  nonMasqueradeCIDR: <NON_MASQ_CIDR>
  sshAccess:
  - <ADMIN_ACCESS_CIDR>
  subnets:
  - cidr: <CIDR_1>
    id: <SUBNET_1>
    name: <AWS_REGION_1>
    egress: <NAT_GATEWAY_1>
    type: Private
    zone: <AWS_REGION_1>
  - cidr: <CIDR_2>
    id: <SUBNET_2>
    name: <AWS_REGION_2>
    egress: <NAT_GATEWAY_2>
    type: Private
    zone: <AWS_REGION_2>
  - cidr: <CIDR_3>
    id: <SUBNET_3>
    name: <AWS_REGION_3>
    egress: <NAT_GATEWAY_3>
    type: Private
    zone: <AWS_REGION_3>
  - cidr: <CIDR_4>
    id: <SUBNET_4>
    name: utility-<AWS_REGION_1>
    type: Utility
    zone: <AWS_REGION_1>
  - cidr: <CIDR_5>
    id: <SUBNET_5>
    name: utility-<AWS_REGION_2>
    type: Utility
    zone: <AWS_REGION_2>
  - cidr: <CIDR_6>
    id: <SUBNET_6>
    name: utility-<AWS_REGION_3>
    type: Utility
    zone: <AWS_REGION_3>
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

The subnets are pre-existing subnets, with AWS-managed NAT gateways running in the Utility networks. Both the masters and nodes are correctly brought up in the Private subnets, with an internal API ELB that is correctly configured with all 3 subnets.

When I deploy a new service, only one subnet is added to the ELB.

apiVersion: v1
kind: Service
metadata:
  name: echo-service
  labels:
    app: echo-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  selector:
    app: echo-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

Not sure if it's important, but this is an internal ELB, not a public-facing one. I'm using these for now as I think I am running into https://github.com/kubernetes/kubernetes/issues/29298, which prevents me from creating public-facing ELBs.

Actually, it seems I was wrong that it was always SUBNET_1 that was added to the ELB. I've just deleted and recreated the ELB and this time SUBNET_3 was added instead.

I'm seeing this as well with kops 1.5.1 and an identical setup to @RyanFrench.

ping @justinsb I'm still seeing this issue, can it be re-opened?

Reopened!

@justinsb here is my LB setup for nginx

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:XXXXXXXX
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  type: LoadBalancer
  ports:
  - port: 80
    name: http
  - port: 443
    name: https
    targetPort: 80
  selector:
    app: nginx

Whenever I apply this using kubectl, the LB only has the AZ that my master is in. I'm running a small test cluster with one master and three nodes.

kops config

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2017-02-15T15:18:49Z"
  name: jwhitcraft.k8s.aws.test
spec:
  api:
    loadBalancer:
      type: Internal
  channel: alpha
  cloudProvider: aws
  configBase: zzzzzzzzzzz
  dnsZone: sugarcrm.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    name: events
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.5.2
  masterInternalName: api.internal.jwhitcraft.k8s.aws.test
  masterPublicName: api.jwhitcraft.k8s.aws.test
  networkCIDR: 10.27.16.0/20
  networkID: vpc-yyyyyyy
  networking:
    weave: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.27.16.0/23
    id: subnet-16dc363d
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 10.27.20.0/23
    id: subnet-838129f4
    name: us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 10.27.24.0/23
    id: subnet-5fdc3674
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: 10.27.18.0/23
    id: subnet-9e8129e9
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

@justinsb I think this is part of the problem: https://github.com/kubernetes/kubernetes/issues/42957

Since I'm using an existing VPC, I'm going to say this is what is going on.

We had this problem.
After adding the same tag that is on the ELB created by:

annotations:
  service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0

on our shared subnets, it works fine. The tag is (for us):
KubernetesCluster : k8s.test.eu-west-1.aws.redacted.net

This is the cluster name from:

$ kops get cluster --state s3://k8s/kops/
NAME                    CLOUD   ZONES
k8s.test.eu-west-1.aws.redacted.net aws eu-west-1a,eu-west-1b,eu-west-1c
# or
$ kubectl config get-clusters
NAME
k8s.test.eu-west-1.aws.redacted.net

Fortunately for us, we don't need to have several clusters in the same VPC.
But it would be nice not to have to change our existing infra for this to work.
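
For reference, tagging the shared subnets can be done with the AWS CLI along these lines (the subnet IDs are placeholders for your existing subnets; the value is the cluster name from kops get cluster):

$ aws ec2 create-tags \
    --resources subnet-aaaaaaaa subnet-bbbbbbbb subnet-cccccccc \
    --tags Key=KubernetesCluster,Value=k8s.test.eu-west-1.aws.redacted.net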

It seems kops didn't tag my existing subnets either, and I see the same problem with the ELB not attaching to all subnets unless I add those tags. Is this expected behaviour? Should I tag the subnets manually myself?

I am also wondering if adding the tags manually would cause kops to delete the VPC, or anything like that, upon deletion. My guess is it could not do so because of all the other non-Kubernetes resources running in the VPC, but I figured I would ask.

It also looks like we need to add these tags

// TagNameSubnetInternalELB is the tag name used on a subnet to designate that
// it should be used for internal ELBs
const TagNameSubnetInternalELB = "kubernetes.io/role/internal-elb"

// TagNameSubnetPublicELB is the tag name used on a subnet to designate that
// it should be used for internet ELBs
const TagNameSubnetPublicELB = "kubernetes.io/role/elb"
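
Applying those role tags to pre-existing subnets would look roughly like this with the AWS CLI (the subnet IDs are placeholders; the tag value is commonly left empty or set to 1):

# Private subnets that should receive internal ELBs
$ aws ec2 create-tags \
    --resources subnet-aaaaaaaa subnet-bbbbbbbb \
    --tags Key=kubernetes.io/role/internal-elb,Value=1

# Public/utility subnets that should receive internet-facing ELBs
$ aws ec2 create-tags \
    --resources subnet-cccccccc subnet-dddddddd \
    --tags Key=kubernetes.io/role/elb,Value=1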

Also seeing the same thing with KOPS v1.6.0 in AWS with an existing VPC and Subnets. I had to tag all subnets on my own outside of KOPS before this worked.

I was wondering if 1.6.0 fixed this issue.
For now, I'm also using the workaround of adding the KubernetesCluster tag manually, but then there is the problem that we cannot have more than one cluster in an existing VPC.

@zachaller - I have spun up and down several clusters with KOPS while working on getting the right configurations set for our team and we haven't had anything get removed with our setup to date!

@zachaller If you add the tag to the subnets, ex:

KubernetesCluster : k8s.test.eu-west-1.aws.redacted.net

Then kops delete cluster will try to delete the subnets.
If you have other resources already in the subnets (NAT gateways, EC2 instances...), then AWS won't let you delete the subnets and the operation will time out.

But if you have nothing else in the subnets, they're gone, which can be at least pretty annoying.

We have an open PR to keep kops from deleting the subnets.

I just created a new cluster with Kops 1.6.1 without any tags on the existing subnets and my ELB registered to all subnets. The ELB has the tags:
KubernetesCluster: k8s.prod.eu-central-1.redacted.net
kubernetes.io/cluster/k8s.prod.eu-central-1.redacted.net: owned
This new tag kubernetes.io/cluster/..., from here seems to have helped

Fixed in Kops 1.6.1 and later, I think this can be closed.

@kenden - still observing this issue in kops 1.6.1 when launching a cluster in an existing VPC (using pre-existing subnets). It looks like pre-existing subnets need to be manually tagged before a service ELB will include them, so calling this issue fixed is misleading until it is addressed via https://github.com/kubernetes/kubernetes/issues/42957

@izakp I just checked with kops 1.6.2 and you are right, the problem is still present.
Maybe I only checked for internet-facing ELBs.
Or I checked that the private subnets had no tags while the public subnets actually did. Sorry for the false report.
@caseeker @chrislovecnm could you reopen this issue?

This is what I just tried:

  • With no public or private subnets having the tag KubernetesCluster in the VPC,
    my internal ELB is registered to one private subnet (in eu-west-1b)
    --> not working
  • With all public and private subnets having the tag KubernetesCluster in the VPC,
    my internal ELB is registered to:

    • 2 private subnets (in eu-west-1b and eu-west-1a)

    • 1 public subnet (in eu-west-1c)

      --> not really working (mix of public and private subnets used)

  • With private subnets having the tag kubernetes.io/role/internal-elb: myclustername,
    and public subnets having the tag kubernetes.io/role/elb: myclustername:
    my internal ELB is registered to only one private subnet (in eu-west-1b)
    --> not working

It might yet work for internet facing ELBs, I have not (re)checked.

Confirmed (for internet-facing ELBs) that when the tag kubernetes.io/cluster/k8s.prod.eu-central-1.redacted.net: shared is set on all shared subnets, the ELB will be created in all of these subnets. So this should be the recommended approach when re-using existing subnets: make sure this tag is set on the subnets for all clusters.
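
In practice that means tagging each pre-existing subnet with the cluster-scoped tag, for example (subnet IDs are placeholders; "shared", as opposed to "owned", signals that the subnet is not owned by the cluster):

$ aws ec2 create-tags \
    --resources subnet-aaaaaaaa subnet-bbbbbbbb subnet-cccccccc \
    --tags Key=kubernetes.io/cluster/k8s.prod.eu-central-1.redacted.net,Value=shared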

I tested with the tag
kubernetes.io/cluster/k8s.test.eu-west-1.aws.redacted.net: shared
instead of
KubernetesCluster:k8s.test.eu-west-1.aws.redacted.net
on all private and public subnets in the VPC and got the same result (as expected)
my internal ELB is registered to:

  • 2 private subnets (in eu-west-1b and eu-west-1a)
  • 1 public subnet (in eu-west-1c)

@kenden cool - for reference - I got the shared tag here. I might just add to the docs that you should tag your subnets appropriately when launching a new cluster within them.

@izakp Since the ELB is internal it can't be accessed publicly, but it's kind of strange to see it on a public subnet. I'm not sure this is a kops issue; it seems more like a Kubernetes ticket.

@kenden the behavior with internal ELBs is definitely odd. My clusters are in a public topology, as below:

subnets:
  - id: subnet-redacted
    name: us-east-1b
    type: Public
    zone: us-east-1b
  - id: subnet-redacted
    name: us-east-1c
    type: Public
    zone: us-east-1c
  - id: subnet-redacted
    name: us-east-1d
    type: Public
    zone: us-east-1d
  - id: subnet-redacted
    name: us-east-1e
    type: Public
    zone: us-east-1e
topology:
  dns:
    type: Public
  masters: public
  nodes: public

...and yet when I create an internal ELB it is attached to these public subnets. In this case, AWS gives the warning "This is an internal ELB, but there is an Internet Gateway attached to the subnet you have just selected", and the DNS name of the "internal" ELB responds to DNS queries outside of my VPC. This particular issue does seem more like a Kubernetes ticket - in order to avoid this scenario, I suppose Kubernetes needs to be aware of its network topology when creating ELBs.
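
For what it's worth, the public resolvability of the "internal" ELB's name is easy to confirm from outside the VPC, along these lines (the hostname is a placeholder for the one the service reports):

# Resolves from anywhere; internal classic ELBs get a DNS name prefixed with "internal-"
$ dig +short internal-<elb-name>-1234567890.us-east-1.elb.amazonaws.com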

We have a couple of PRs in that should address this.

/assign @geojazz @justinsb

@chrislovecnm: GitHub didn't allow me to assign the following users: geojazz.

Note that only kubernetes members can be assigned.

In response to this:

We have a couple of PRs in that should address this.

/assign @geojazz @justinsb

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This should be all set in 1.8.x via #3682. Closing...

Hey guys, I'm quite new to k8s. I still seem to have the same issue on version 1.8.x. Here is my config:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: kuberneteslab.vimtechcorp.com
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-state-0000/kuberneteslab.vimtechcorp.com
  dnsZone: Z25YM7GL79TOCI
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-ap-southeast-2a
      name: a
    - instanceGroup: master-ap-southeast-2b
      name: b
    - instanceGroup: master-ap-southeast-2c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-ap-southeast-2a
      name: a
    - instanceGroup: master-ap-southeast-2b
      name: b
    - instanceGroup: master-ap-southeast-2c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: v1.8.6
  masterPublicName: api.kuberneteslab.vimtechcorp.com
  networkCIDR: 10.0.0.0/16
  networkID: vpc-5966c13e
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.0.2.0/24
    id: subnet-4169f508
    name: ap-southeast-2a
    type: Private
    zone: ap-southeast-2a
  - cidr: 10.0.3.0/24
    id: subnet-9d5f3dfa
    name: ap-southeast-2b
    type: Private
    zone: ap-southeast-2b
  - cidr: 10.0.5.0/24
    id: subnet-f2d73faa
    name: ap-southeast-2c
    type: Private
    zone: ap-southeast-2c
  - cidr: 10.0.2.0/24
    id: subnet-4169f508
    name: utility-ap-southeast-2a
    type: Utility
    zone: ap-southeast-2a
  - cidr: 10.0.3.0/24
    id: subnet-9d5f3dfa
    name: utility-ap-southeast-2b
    type: Utility
    zone: ap-southeast-2b
  - cidr: 10.0.5.0/24
    id: subnet-f2d73faa
    name: utility-ap-southeast-2c
    type: Utility
    zone: ap-southeast-2c
  topology:
    bastion:
      bastionPublicName: bastion.kuberneteslab.vimtechcorp.com
    dns:
      type: Private
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kuberneteslab.vimtechcorp.com
  name: master-ap-southeast-2a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-southeast-2a
  role: Master
  subnets:
  - ap-southeast-2a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kuberneteslab.vimtechcorp.com
  name: master-ap-southeast-2b
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-southeast-2b
  role: Master
  subnets:
  - ap-southeast-2b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kuberneteslab.vimtechcorp.com
  name: master-ap-southeast-2c
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-southeast-2c
  role: Master
  subnets:
  - ap-southeast-2c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kuberneteslab.vimtechcorp.com
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.micro
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - ap-southeast-2a
  - ap-southeast-2b
  - ap-southeast-2c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: kuberneteslab.vimtechcorp.com
  name: bastions
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-ap-southeast-2a
  - utility-ap-southeast-2b
  - utility-ap-southeast-2c

[screenshot: AWS console showing all instances behind the API ELB as OutOfService]

As you can see in the screenshot, all nodes under the API ELB are out of service. All subnets are private. I've already tried setting the load balancer type to "Public" as well, but to no avail. Can someone please help out?
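
If it helps with debugging, the reason behind the OutOfService state can be pulled per instance with the AWS CLI (the load balancer name is a placeholder for the API ELB's actual name from the console):

# Shows State, ReasonCode and Description for each registered instance
$ aws elb describe-instance-health \
    --load-balancer-name <api-elb-name>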

/Vimuth

@vimuthdee if your ELBs are out of service, then possibly your masters did not start properly. Do you mind opening another issue for me? Sorry to ask. This is a really old issue that we can link, but I do not think this is the same problem.

Also, we need more details on how you are creating your cluster. The issue template should help you.

Thanks!

Hi @chrislovecnm, thank you for trying to help. By the way, I found the answer to my query myself. It was just that DNS hostnames had been disabled (the default) for my custom VPC. When I enabled it, everything started working like a charm. :)
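
For anyone else hitting this, checking and enabling DNS hostnames on an existing VPC can be done with the AWS CLI, roughly like this (using the VPC ID from the cluster spec above):

# Check whether DNS hostnames are enabled for the VPC
$ aws ec2 describe-vpc-attribute --vpc-id vpc-5966c13e --attribute enableDnsHostnames

# Enable them (they are disabled by default on non-default VPCs)
$ aws ec2 modify-vpc-attribute --vpc-id vpc-5966c13e --enable-dns-hostnames "{\"Value\":true}"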

Sorry for posting this question; I should have looked a little longer. Thanks so much for all the hard work you guys have put into k8s. Keep it up.

No worries!
