Aws-load-balancer-controller: Delete loadbalancer resources in existing, separated Kubernetes cluster after deployment

Created on 26 Jun 2018 · 18 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

We have a running Kubernetes cluster in the same AWS account, with ingresses managed by alb-ingress-controller. Its name is "test-cluster.internal".

When a new Kubernetes cluster named "live-cluster.internal" is started and the alb-ingress-controller is deployed to it, the controller DELETES the load balancers and target groups that already exist.

It starts and immediately deletes the other cluster's existing resources:

ubuntu@ip-10-202-13-63:~$ kubectl logs alb-ingress-controller-69c9b847c5-vqfzg -n kube-system
I0626 16:00:37.571370       1 main.go:24] [ALB-INGRESS] [controller] [INFO]: Log level read as "", defaulting to INFO. To change, set LOG_LEVEL environment variable to WARN, ERROR, or DEBUG.
I0626 16:00:37.571721       1 launch.go:112] &{ALB Ingress Controller 1.0-alpha.9 git-11cdb4a8 git://github.com/kubernetes-sigs/aws-alb-ingress-controller}
I0626 16:00:37.571875       1 launch.go:282] Creating API client for https://100.64.0.1:443
I0626 16:00:37.583242       1 launch.go:295] Running in Kubernetes Cluster version v1.9 (v1.9.6) - git (clean) commit 9f8ebd171479bec0ada837d7ee641dec2f8c6dd1 - platform linux/amd64
I0626 16:00:37.585281       1 launch.go:134] validated kube-system/default-http-backend as the default backend
I0626 16:00:37.589904       1 albingresses.go:85] [ALB-INGRESS] [ingress] [INFO]: Building list of existing ALBs
I0626 16:00:37.683768       1 albingresses.go:94] [ALB-INGRESS] [ingress] [INFO]: Fetching information on 1 ALBs
I0626 16:00:38.011262       1 targetgroups.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Fetching Targets for Target Group arn:aws:elasticloadbalancing:eu-west-1:571188843450:targetgroup/bmkubernete-30658-HTTP-2a53f87/1289759a5fc2d701
I0626 16:00:38.067303       1 targetgroups.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Fetching Targets for Target Group arn:aws:elasticloadbalancing:eu-west-1:571188843450:targetgroup/bmkubernete-31364-HTTP-2a53f87/29a6840f649f5b2a
I0626 16:00:38.135349       1 targetgroups.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Fetching Targets for Target Group arn:aws:elasticloadbalancing:eu-west-1:571188843450:targetgroup/bmkubernete-32497-HTTP-2a53f87/8eb65e30b9173cb6
I0626 16:00:38.187385       1 listeners.go:77] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Fetching Rules for Listener arn:aws:elasticloadbalancing:eu-west-1:571188843450:listener/app/bmkubernete-default-bmtest-7639/5a10c9cf5756d3c3/9553958eb47b7f69
I0626 16:00:38.203338       1 albingress.go:203] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Ingress rebuilt from existing ALB in AWS
I0626 16:00:38.203371       1 albingresses.go:189] [ALB-INGRESS] [ingress] [INFO]: Assembled 1 ingresses from existing AWS resources in 613.445553ms
I0626 16:00:38.203408       1 controller.go:1359] starting Ingress controller
I0626 16:00:38.630484       1 leaderelection.go:174] attempting to acquire leader lease...
I0626 16:00:38.630771       1 controller.go:477] backend reload required
I0626 16:00:38.630831       1 loadbalancer.go:230] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Start ELBV2 (ALB) deletion.
I0626 16:00:38.637249       1 leaderelection.go:184] successfully acquired lease kube-system/ingress-controller-leader-alb
I0626 16:00:38.637298       1 status.go:193] new leader elected: alb-ingress-controller-69c9b847c5-vqfzg
I0626 16:00:38.683627       1 loadbalancer.go:238] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Completed ELBV2 (ALB) deletion. Name: bmkubernete-default-bmtest-7639 | ARN: arn:aws:elasticloadbalancing:eu-west-1:571188843450:loadbalancer/app/bmkubernete-default-bmtest-7639/5a10c9cf5756d3c3
I0626 16:00:38.683650       1 listener.go:74] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Start Listener deletion.
I0626 16:00:38.693789       1 listener.go:79] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Completed Listener deletion.
I0626 16:00:38.693869       1 rule.go:96] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Start Rule deletion.
I0626 16:00:38.730221       1 rule.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Completed Rule deletion. Rule Priority: "1" | Condition: [{    Field: "host-header",    Values: ["wwwwww"]  },{    Field: "path-pattern",    Values: ["/*"]  }]
I0626 16:00:38.730242       1 rule.go:96] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Start Rule deletion.
I0626 16:00:38.772282       1 rule.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Completed Rule deletion. Rule Priority: "2" | Condition: [{    Field: "path-pattern",    Values: ["/*"]  },{    Field: "host-header",    Values: ["wwwwwwwwww"]  }]
I0626 16:00:38.772307       1 rule.go:96] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Start Rule deletion.
I0626 16:00:38.793698       1 rule.go:103] [ALB-INGRESS] [default/bm-test-ingress] [INFO]: Completed Rule deletion. Rule Priority: "3" | Condition: [{    Field: "path-pattern",    Values: ["/*"]  },{    Field: "host-header",    Values: ["wwwwwwwww"]  }]
I0626 16:00:49.006933       1 controller.go:486] ingress backend successfully reloaded...

It shouldn't touch other, already existing resources!

All 18 comments

Make sure the cluster name is unique to each cluster. When the controller starts up, it looks for existing resources with a matching cluster name and then reconciles those resources against the current configuration inside the Kubernetes cluster.

I did, and checked it twice.
The new cluster (the one that deletes resources in the existing, older one) is:

```
ubuntu@ip-10-202-13-63:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://api.bm-kubernetes-live.internal
  name: bm-kubernetes-live.internal
contexts:
- context:
    cluster: bm-kubernetes-live.internal
    user: bm-kubernetes-live.internal
  name: bm-kubernetes-live.internal
current-context: bm-kubernetes-live.internal
kind: Config
preferences: {}
users:
- name: bm-kubernetes-live.internal
```

The older, existing one is:

```
ubuntu@ip-10-201-2-234:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://api.bm-kubernetes-test-lqhnqcka.internal
  name: bm-kubernetes-test-lqhnqcka.internal
contexts:
- context:
    cluster: bm-kubernetes-test-lqhnqcka.internal
    user: bm-kubernetes-test-lqhnqcka.internal
  name: bm-kubernetes-test-lqhnqcka.internal
current-context: bm-kubernetes-test-lqhnqcka.internal
kind: Config
preferences: {}
users:
- name: bm-kubernetes-test-lqhnqcka.internal
```

Can I see the controller pod manifest? The cluster name is a parameter to the controller.

Yes.
This is the "old" one:

ubuntu@ip-10-201-2-234:~$ kubectl describe pod alb-ingress-controller -n kube-system
Name:           alb-ingress-controller-767896b666-xrlvw
Namespace:      kube-system
Node:           ip-10-201-34-77.eu-west-1.compute.internal/10.201.34.77
Start Time:     Tue, 26 Jun 2018 14:58:36 +0000
Labels:         app=alb-ingress-controller
                pod-template-hash=3234526222
Annotations:    <none>
Status:         Running
IP:             100.113.157.147
Controlled By:  ReplicaSet/alb-ingress-controller-767896b666
Containers:
  server:
    Container ID:  docker://5ef7d7a62cf9504154882c3d2b9ef858e140733626064f42aac812d6389cde6c
    Image:         quay.io/coreos/alb-ingress-controller:1.0-beta.1
    Image ID:      docker-pullable://quay.io/coreos/alb-ingress-controller@sha256:91edb05491600014625a551945d438bdb246f6e154ade80e9ad3ccddc00197a4
    Port:          <none>
    Args:
      /server
      --default-backend-service=kube-system/default-http-backend
    State:          Running
      Started:      Tue, 26 Jun 2018 16:01:50 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Tue, 26 Jun 2018 16:00:23 +0000
      Finished:     Tue, 26 Jun 2018 16:00:23 +0000
    Ready:          True
    Restart Count:  5
    Environment:
      AWS_REGION:       eu-west-1
      CLUSTER_NAME:     bm-kubernetes-test-lqhnqcka.internal
      AWS_DEBUG:        true
      AWS_MAX_RETRIES:  20
      POD_NAME:         alb-ingress-controller-767896b666-xrlvw (v1:metadata.name)
      POD_NAMESPACE:    kube-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fxhn (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  default-token-4fxhn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4fxhn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                From                                                 Message
  ----     ------   ----               ----                                                 -------
  Normal   Pulling  48m (x5 over 1h)   kubelet, ip-10-201-34-77.eu-west-1.compute.internal  pulling image "quay.io/coreos/alb-ingress-controller:1.0-beta.1"
  Normal   Pulled   48m (x5 over 1h)   kubelet, ip-10-201-34-77.eu-west-1.compute.internal  Successfully pulled image "quay.io/coreos/alb-ingress-controller:1.0-beta.1"
  Normal   Created  48m (x5 over 1h)   kubelet, ip-10-201-34-77.eu-west-1.compute.internal  Created container
  Normal   Started  48m (x5 over 1h)   kubelet, ip-10-201-34-77.eu-west-1.compute.internal  Started container
  Warning  BackOff  48m (x9 over 50m)  kubelet, ip-10-201-34-77.eu-west-1.compute.internal  Back-off restarting failed container

This is the newly started one:

ubuntu@ip-10-202-13-63:~$ kubectl describe pod alb-ingress-controller -n kube-system
Name:           alb-ingress-controller-69c9b847c5-vqfzg
Namespace:      kube-system
Node:           ip-10-202-66-89.eu-west-1.compute.internal/10.202.66.89
Start Time:     Tue, 26 Jun 2018 16:00:32 +0000
Labels:         app=alb-ingress-controller
                pod-template-hash=2575640371
Annotations:    <none>
Status:         Running
IP:             100.100.118.5
Controlled By:  ReplicaSet/alb-ingress-controller-69c9b847c5
Containers:
  server:
    Container ID:  docker://c16623bd7e745f70dbf0bb31d4b03a322f89b8053cb8010ebefaa274f826783b
    Image:         quay.io/coreos/alb-ingress-controller:1.0-beta.1
    Image ID:      docker-pullable://quay.io/coreos/alb-ingress-controller@sha256:91edb05491600014625a551945d438bdb246f6e154ade80e9ad3ccddc00197a4
    Port:          <none>
    Args:
      /server
      --default-backend-service=kube-system/default-http-backend
    State:          Running
      Started:      Tue, 26 Jun 2018 16:00:37 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      AWS_REGION:       eu-west-1
      CLUSTER_NAME:     bm-kubernetes-live.internal
      AWS_DEBUG:        false
      AWS_MAX_RETRIES:  20
      POD_NAME:         alb-ingress-controller-69c9b847c5-vqfzg (v1:metadata.name)
      POD_NAMESPACE:    kube-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qplzw (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  default-token-qplzw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qplzw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From                                                 Message
  ----    ------                 ----  ----                                                 -------
  Normal  Scheduled              49m   default-scheduler                                    Successfully assigned alb-ingress-controller-69c9b847c5-vqfzg to ip-10-202-66-89.eu-west-1.compute.internal
  Normal  SuccessfulMountVolume  49m   kubelet, ip-10-202-66-89.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-qplzw"
  Normal  Pulling                49m   kubelet, ip-10-202-66-89.eu-west-1.compute.internal  pulling image "quay.io/coreos/alb-ingress-controller:1.0-beta.1"
  Normal  Pulled                 49m   kubelet, ip-10-202-66-89.eu-west-1.compute.internal  Successfully pulled image "quay.io/coreos/alb-ingress-controller:1.0-beta.1"
  Normal  Created                49m   kubelet, ip-10-202-66-89.eu-west-1.compute.internal  Created container
  Normal  Started                49m   kubelet, ip-10-202-66-89.eu-west-1.compute.internal  Started container


The cluster name is limited to 11 characters so these are effectively the same cluster name. I'll open a PR later that exits with an error if the cluster name is longer than 11 characters to prevent this kind of hidden problem in the future.
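
To make the collision concrete, here is a minimal Python sketch (the controller itself is written in Go). The helper names are hypothetical, and the normalisation rule (drop non-alphanumeric characters, then keep the first 11) is an assumption inferred from the ALB name `bmkubernete-default-bmtest-7639` in the log above, not taken from the controller source.

```python
import re

MAX_PREFIX_LEN = 11  # limit stated in this thread


def alb_name_prefix(cluster_name: str) -> str:
    """Hypothetical helper: strip non-alphanumerics, keep the first 11 characters."""
    return re.sub(r"[^A-Za-z0-9]", "", cluster_name)[:MAX_PREFIX_LEN]


def validate_cluster_name(cluster_name: str) -> None:
    """Fail fast instead of truncating silently (the check proposed above)."""
    if len(cluster_name) > MAX_PREFIX_LEN:
        raise ValueError(
            f"cluster name {cluster_name!r} is longer than {MAX_PREFIX_LEN} characters"
        )


live = alb_name_prefix("bm-kubernetes-live.internal")
test = alb_name_prefix("bm-kubernetes-test-lqhnqcka.internal")
print(live, test, live == test)  # bmkubernete bmkubernete True
```

Under this assumption both cluster names reduce to the same prefix, so each controller treats the other cluster's ALBs as its own.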

Hmm. Thank you, this is new information to me. I thought names were limited to 253 characters according to the official documentation: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/
Where is the 11-character limit stated explicitly? In the docs?
I already have clusters deployed with longer FQDN names, so can we bypass this limit?

It is an "evil" bug or an undocumented feature. It can delete running resources accidentally.
It should LOG somewhere what it uses internally as the CLUSTER_NAME to identify resources.

I did some digging: we used to throw an error but stopped in #223.

The 11-character limitation exists because the cluster name is used as a prefix for naming resources in AWS. We needed this to reassemble the controller state at startup, and AWS didn't provide a method for querying ALBs via tags at the time. There is an API for that now, so we don't need to keep a naming convention, but I'm unsure how I can gracefully migrate existing users to the new API without impacting their existing resources. We have namespace and ingress as tags that I can search on, but not the cluster name. I added the cluster name tag recently.

I believe it's a limitation on the length of ALB names. The name needs to include the cluster name, namespace, and ingress name, which means some truncation may need to happen.
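
As a rough illustration of that length budget, the sketch below composes a name from the three parts while staying within the 32-character limit AWS places on load balancer names. The layout (`prefix-namespace-ingress-hash`) and the 4-character MD5 suffix are assumptions made for this example; the controller's real scheme is in the source linked below.

```python
import hashlib

ALB_NAME_MAX = 32  # AWS limit on load balancer names
PREFIX_MAX = 11    # cluster-name prefix limit discussed in this thread


def _clean(part: str) -> str:
    """Keep only alphanumeric characters."""
    return "".join(ch for ch in part if ch.isalnum())


def build_alb_name(cluster: str, namespace: str, ingress: str) -> str:
    """Hypothetical naming scheme: prefix-namespace-ingress-hash, at most 32 chars."""
    prefix = _clean(cluster)[:PREFIX_MAX]
    # Assumed suffix: a short hash of namespace/ingress (the cluster is NOT part of it).
    suffix = hashlib.md5(f"{namespace}/{ingress}".encode()).hexdigest()[:4]
    middle = f"{_clean(namespace)}-{_clean(ingress)}"
    room = ALB_NAME_MAX - len(prefix) - len(suffix) - 2  # two '-' separators
    return f"{prefix}-{middle[:room].rstrip('-')}-{suffix}"


name_live = build_alb_name("bm-kubernetes-live.internal", "default", "bm-test-ingress")
name_test = build_alb_name("bm-kubernetes-test-lqhnqcka.internal", "default", "bm-test-ingress")
print(name_live, len(name_live), name_live == name_test)
```

With only 32 characters available, every extra character given to the cluster prefix is taken from the namespace and ingress parts, which is why the prefix is kept short.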

https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/de3eb414df8453ab0549c41df869d705e80be546/pkg/controller/alb-controller.go#L448-L458

Thank you. It should at least write a "warning" log entry after the if condition and the truncation, because nothing currently shows that the controller uses a shortened name to identify resources. The same applies when you get the full data from the AWS SDK.
What prevents filtering resources by the full tag data that is returned?
I mean, I have this in the LB tags:

kubernetes.io/cluster/bm-kubernetes-test-lqhnqcka.internal | owned

Can you get it when listing resources?
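
The warning requested here could look roughly like the following Python sketch. The logger name, function name and message wording are made up for illustration, and the truncation rule is the same assumption as elsewhere in this thread (non-alphanumerics dropped, 11 characters kept).

```python
import logging

logger = logging.getLogger("alb-ingress")  # hypothetical logger name

PREFIX_MAX = 11  # prefix limit discussed in this thread


def effective_prefix(cluster_name: str) -> str:
    """Return the prefix used for AWS resource names, warning if it was shortened."""
    cleaned = "".join(ch for ch in cluster_name if ch.isalnum())
    prefix = cleaned[:PREFIX_MAX]
    if len(cleaned) > PREFIX_MAX:
        logger.warning(
            "CLUSTER_NAME %r is shortened to %r when naming AWS resources; "
            "other clusters with the same prefix will collide",
            cluster_name,
            prefix,
        )
    return prefix


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    effective_prefix("bm-kubernetes-test-lqhnqcka.internal")
```

A single line like this at startup would have shown that both clusters were being identified by the same shortened name.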

What I would like to do is query the tagging API for resources with the kubernetes.io/cluster/<cluster name> tag and use those to build the state. The issue is all of the existing users who don't have that tag on their resources.

What I am thinking right now is supporting both methods in the 1.0 release and then removing the legacy ALB name searching method in the following release.
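
One way to read "supporting both methods" is sketched below in Python over an in-memory inventory. It is not the controller's code: the function name and data shape are hypothetical, the second inventory entry is invented, and a real implementation would fetch the load balancers and their tags from AWS (for example through the Resource Groups Tagging API) instead of a list of dicts.

```python
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def owned_load_balancers(load_balancers, cluster_name, legacy_prefix):
    """Return names of load balancers this cluster may manage.

    Tag match wins. The legacy name-prefix match is only applied to
    resources that carry no cluster tag at all.
    """
    tag_key = CLUSTER_TAG_PREFIX + cluster_name
    owned = []
    for lb in load_balancers:
        tags = lb.get("tags", {})
        has_cluster_tag = any(key.startswith(CLUSTER_TAG_PREFIX) for key in tags)
        if tag_key in tags:
            owned.append(lb["name"])
        elif not has_cluster_tag and lb["name"].startswith(legacy_prefix + "-"):
            owned.append(lb["name"])
    return owned


# Illustrative inventory: the first name comes from the log in this issue,
# the second entry is invented to represent an untagged legacy resource.
inventory = [
    {
        "name": "bmkubernete-default-bmtest-7639",
        "tags": {"kubernetes.io/cluster/bm-kubernetes-test-lqhnqcka.internal": "owned"},
    },
    {"name": "bmkubernete-default-legacy-0000", "tags": {}},
]

print(owned_load_balancers(inventory, "bm-kubernetes-live.internal", "bmkubernete"))
print(owned_load_balancers(inventory, "bm-kubernetes-test-lqhnqcka.internal", "bmkubernete"))
```

In this sketch the live cluster no longer claims the ALB tagged for the test cluster, but untagged legacy resources are still matched by prefix, so the collision risk remains for them until the name-based lookup is removed.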

Would it be complicated to use the new API method in a branch? I need to deploy the clusters the day after tomorrow with the already agreed naming convention (FQDNs), so it would be a pity to change everything just because the ALB controller can't handle names longer than 11 characters :(

Hi Tobi,
Can you please share your alb-ingress-controller.yaml file? I would like to see how you mounted the volume with the token. I believe you are doing this on EKS?

Normal SuccessfulMountVolume 49m kubelet, ip-10-202-66-89.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-qplzw"

Here it is, nothing special, copy-pasted from the examples:

# Application Load Balancer (ALB) Ingress Controller Deployment Manifest.
# This manifest details sensible defaults for deploying an ALB Ingress Controller.
# GitHub: https://github.com/coreos/alb-ingress-controller
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: alb-ingress-controller
  name: alb-ingress-controller
  # Namespace the ALB Ingress Controller should run in. Does not impact which
  # namespaces it's able to resolve ingress resource for. For limiting ingress
  # namespace scope, see --watch-namespace.
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alb-ingress-controller
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alb-ingress-controller
    spec:
      containers:
      - args:
        - /server
        # Ingress controllers must have a default backend deployment where
        # all unknown locations can be routed to. Often this is a 404 page. The
        # default backend is not particularly helpful to the ALB Ingress Controller
        # but is still required. The default backend and its respective service 
        # must be running Kubernetes for this controller to start.
        - --default-backend-service=kube-system/default-http-backend
        # Limit the namespace where this ALB Ingress Controller deployment will
        # resolve ingress resources. If left commented, all namespaces are used.
        #- --watch-namespace=your-k8s-namespace
        # Setting the ingress-class flag below will ensure that only ingress resources with the
        # annotation kubernetes.io/ingress.class: "alb" are respected by the controller. You may
        # choose any class you'd like for this controller to respect.
        #- --ingress-class=alb
        env:
          # AWS region this ingress controller will operate in.
          # List of regions:
          # http://docs.aws.amazon.com/general/latest/gr/rande.html#vpc_region 
        - name: AWS_REGION
          value: eu-west-1
          # Name of your cluster. Used when naming resources created
          # by the ALB Ingress Controller, providing distinction between
          # clusters.
        - name: CLUSTER_NAME
          value: bm-kubernetes-test-lqhnqcka.internal
          # AWS key id for authenticating with the AWS API.
          # This is only here for examples. It's recommended you instead use
          # a project like kube2iam for granting access.
        #- name: AWS_ACCESS_KEY_ID
           #value: KEYVALUE
          # AWS key secret for authenticating with the AWS API.
          # This is only here for examples. It's recommended you instead use
          # a project like kube2iam for granting access.
        #- name: AWS_SECRET_ACCESS_KEY
           #value: SECRETVALUE
          # Enables logging on all outbound requests sent to the AWS API.
          # If logging is desired, set to true.
        - name: AWS_DEBUG
          value: "true"
          # Maximum number of times to retry the aws calls.
          # defaults to 20.
        - name: AWS_MAX_RETRIES
          value: "20"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        # Repository location of the ALB Ingress Controller.
        image: quay.io/coreos/alb-ingress-controller:1.0-beta.1
        imagePullPolicy: Always
        name: server
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30

Thanks,
I did follow the same steps from https://github.com/raf-d/aws-alb-controller/blob/master/RBAC-for-aws-controller.md but I am still getting the following error.

✖ It seems the cluster it is running with Authorization enabled (like RBAC) and there is no permissions for the ingress controller. Please check the configuration

@akhilkathuria I think you should post your problem in a separate issue or thread, because it is about permissions, not configuration parameters.

@akhilkathuria I faced the same issue, and following the link below resolved it.

https://github.com/raf-d/aws-alb-controller/blob/master/RBAC-for-aws-controller.md

Basically, create the ServiceAccount, ClusterRole and ClusterRoleBinding to use with the ALB controller, and update your alb-ingress-controller.yaml file to use them.

@tatobi I just opened #428 which creates the --albNamePrefix parameter, that way you should be able to use the clusterName that you prefer and have an albNamePrefix that is different. Let me know how it goes.
