Aws-load-balancer-controller: Walkthrough doesn't work! no ALB created, empty logs, no pod created

Created on 9 Aug 2019 · 13 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

I was trying to update my configuration, and the changes weren't applying to the ALB (See #898). So I decided to delete the load balancer and start over.

I deleted the alb-ingress-controller Deployment, the ingress, and the load balancer from AWS console, then went back to the walkthrough to try to get it to work.

Following the walkthrough, I get to the "Deploy Ingress for echoserver" step, but it does not complete. No load balancer is created in the AWS console, and the log output is empty.

$ kubectl apply -f echoserver-ingress.yaml
ingress.extensions/echoserver created

$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o 'alb-ingress[a-zA-Z0-9-]+') | grep 'echoserver\/echoserver'

The log command appears to be looking for a pod with echoserver in the name? None exist:

$ kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
alb-ingress-controller-795b5558d8-54j2p   1/1     Running   0          7m
aws-node-vv8bs                            1/1     Running   0          161d
aws-node-z7j6t                            1/1     Running   0          161d
aws-node-zgv57                            1/1     Running   0          161d
kube-proxy-8xgjc                          1/1     Running   0          161d
kube-proxy-cpqnk                          1/1     Running   0          161d
kube-proxy-rxzsg                          1/1     Running   0          161d
kubernetes-dashboard-5dd89b9875-fm2hx     1/1     Running   0          161d
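
For what it's worth, the log command has two parts: the $(...) substitution pulls the controller pod's name out of the pod list, and the trailing grep filters that pod's log lines for the ingress key echoserver/echoserver. It is not looking for a pod named echoserver. Here is a rough Python sketch of those two steps, run against a trimmed copy of the pod listing above and two controller log lines quoted later in this thread. The function names are made up for illustration.

```python
import re

# Pod listing copied from the issue (trimmed to three rows).
POD_LISTING = """\
NAME                                      READY   STATUS    RESTARTS   AGE
alb-ingress-controller-795b5558d8-54j2p   1/1     Running   0          7m
aws-node-vv8bs                            1/1     Running   0          161d
kube-proxy-8xgjc                          1/1     Running   0          161d
"""

# Two controller log lines copied from a later comment in the issue.
CONTROLLER_LOG = """\
I0809 17:28:54.735975       1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0809 17:28:54.836302       1 :0] kubebuilder/controller "level"=0 "msg"="Starting Controller"  "controller"="alb-ingress-controller"
"""


def find_controller_pods(listing):
    """Mimic `egrep -o 'alb-ingress[a-zA-Z0-9-]+'`: return every match."""
    return re.findall(r"alb-ingress[a-zA-Z0-9-]+", listing)


def filter_log(log, ingress_key):
    """Mimic the trailing grep: keep only lines that mention the ingress key."""
    return [line for line in log.splitlines() if ingress_key in line]


controller_pods = find_controller_pods(POD_LISTING)
matching_lines = filter_log(CONTROLLER_LOG, "echoserver/echoserver")

print(controller_pods)   # the pod whose logs get read
print(matching_lines)    # log lines that mention the ingress
```

The first result is the controller pod, so the command is reading the right logs. The second is empty because none of the quoted log lines mention echoserver/echoserver, which matches the empty output above.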

What's going on? What can I do to get this project to work at all?

Thanks,
Sean

lifecycle/rotten

seanhess

Most helpful comment

Your problem is here:

Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout

It's telling you the container running on that node can't connect to https://elasticloadbalancing.us-east-2.amazonaws.com/ because the request is timing out. If you fire up a random container on the same instance and kubectl exec into that container, I'd bet you can't run telnet elasticloadbalancing.us-east-2.amazonaws.com 443 because the request times out. This is likely because your egress on your Node isn't allowing connections to elasticloadbalancing.us-east-2.amazonaws.com.
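
If telnet isn't available in the container, the same check can be sketched in a few lines of Python. This is only a rough equivalent of the telnet test described above: the function name, the default port, and the 5-second timeout are my own choices, not anything taken from the controller.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, "") on success, or (False, reason) if the attempt
    times out or fails for another reason (DNS failure, refused, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # socket.gaierror, ConnectionRefusedError, etc. are all OSError subclasses.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling can_connect("elasticloadbalancing.us-east-2.amazonaws.com") from a pod on the affected node should return (False, "timed out") if egress is blocked, and (True, "") once the node can reach the endpoint.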

fimbulvetr on 15 Aug 2019

👍2

All 13 comments

I just tried doing the walkthrough with a fresh EKS cluster and it is broken at the same place.

seanhess on 9 Aug 2019

Here's the output of the log command without the grep for echoserver. It is identical to the logs from the previous step, so the controller doesn't appear to have done anything at all when I applied echoserver-ingress.yaml.

$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o 'alb-ingress[a-zA-Z0-9-]+')
-------------------------------------------------------------------------------
AWS ALB Ingress controller
  Release:    v1.1.2
  Build:      git-cc1c5971
  Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------

W0809 17:28:54.686347       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0809 17:28:54.725386       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0809 17:28:54.725619       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0809 17:28:54.725700       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0809 17:28:54.725872       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0809 17:28:54.725934       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0809 17:28:54.726060       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0809 17:28:54.726305       1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0809 17:28:54.726521       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/ingress-controller-leader-alb...
I0809 17:28:54.735975       1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0809 17:28:54.836302       1 :0] kubebuilder/controller "level"=0 "msg"="Starting Controller"  "controller"="alb-ingress-controller"
I0809 17:28:54.936439       1 :0] kubebuilder/controller "level"=0 "msg"="Starting workers"  "controller"="alb-ingress-controller" "worker count"=1

seanhess on 9 Aug 2019

👍2

After waiting several days, the ALB ingress controller pod logs this error every 5 minutes or so:

E0812 06:13:22.253653       1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to RequestError: send request failed\ncaused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"ingress"}

seanhess on 12 Aug 2019

👍2

It looks like this issue is similar to #771. I destroyed the ALB ingress controller and the ingress, then re-created them with the --aws-api-debug flag. Now I'm seeing this in the logs in an endless loop:

I0812 16:39:22.624798       1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}
E0812 16:39:52.625093       1 request_pagination.go:105] Failed request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}, Error: RequestError: send request failed
caused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout
E0812 16:39:52.625160       1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to RequestError: send request failed\ncaused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"ingress"}
I0812 16:39:53.625499       1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}
I0812 16:40:23.684990       1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}
I0812 16:40:53.751517       1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}
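
One detail worth pulling out of these lines is how long each request hangs before it fails. Below is a small sketch that parses the timestamps of the first Request line and its matching Failed request line, both copied from the log above. The helper name is hypothetical, and the year is taken from the comment date because klog timestamps do not include one.

```python
from datetime import datetime

# First request and its failure, copied verbatim from the log above.
REQUEST_LINE = 'I0812 16:39:22.624798       1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}'
FAILURE_LINE = 'E0812 16:39:52.625093       1 request_pagination.go:105] Failed request: elasticloadbalancing/DescribeLoadBalancers, Payload: {  Names: ["d292306e-default-ingress-e8c7"]}, Error: RequestError: send request failed'

# Assumption: klog lines omit the year, so use the year of the comment.
LOG_YEAR = "2019"


def klog_timestamp(line):
    """Parse the 'mmdd hh:mm:ss.uuuuuu' stamp that follows the severity letter."""
    return datetime.strptime(LOG_YEAR + line[1:21], "%Y%m%d %H:%M:%S.%f")


elapsed = (klog_timestamp(FAILURE_LINE) - klog_timestamp(REQUEST_LINE)).total_seconds()
print(f"{elapsed:.3f} s between request and failure")
```

The gap comes out at about 30 seconds, and the later Request lines are also spaced roughly 30 seconds apart. That would be consistent with each call waiting on a connection that never opens rather than being rejected outright, though the thread does not say which timeout the controller actually applies.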

Note that I have both the subnets configured in ingress.yaml, and the tags on the subnets.

seanhess on 12 Aug 2019

👍1

Here's my full configuration:

alb-ingress-controller.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: alb-ingress-controller
  name: alb-ingress-controller
  # Namespace the ALB Ingress Controller should run in. Does not impact which
  # namespaces it's able to resolve ingress resource for. For limiting ingress
  # namespace scope, see --watch-namespace.
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: alb-ingress-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
    spec:
      containers:
        - name: alb-ingress-controller
          args:
            # Limit the namespace where this ALB Ingress Controller deployment will
            # resolve ingress resources. If left commented, all namespaces are used.
            # - --watch-namespace=your-k8s-namespace

            # Setting the ingress-class flag below ensures that only ingress resources with the
            # annotation kubernetes.io/ingress.class: "alb" are respected by the controller. You may
            # choose any class you'd like for this controller to respect.
            - --ingress-class=alb

            # Name of your cluster. Used when naming resources created
            # by the ALB Ingress Controller, providing distinction between
            # clusters.
            - --cluster-name=timely

            # AWS VPC ID this ingress controller will use to create AWS resources.
            # If unspecified, it will be discovered from ec2metadata.
            # - --aws-vpc-id=vpc-xxxxxx

            # AWS region this ingress controller will operate in. 
            # If unspecified, it will be discovered from ec2metadata.
            # List of regions: http://docs.aws.amazon.com/general/latest/gr/rande.html#vpc_region
            # - --aws-region=us-west-1

            # Enables logging on all outbound requests sent to the AWS API.
            # If logging is desired, set to true.
            - --aws-api-debug
            # Maximum number of times to retry the aws calls.
            # defaults to 10.
            # - --aws-max-retries=10
          env:
            # AWS key id for authenticating with the AWS API.
            # This is only here for examples. It's recommended you instead use
            # a project like kube2iam for granting access.
            # - name: AWS_ACCESS_KEY_ID
            #   value: xxx 

            # AWS key secret for authenticating with the AWS API.
            # This is only here for examples. It's recommended you instead use
            # a project like kube2iam for granting access.
            # - name: AWS_SECRET_ACCESS_KEY
            #   value: xxx
          # Repository location of the ALB Ingress Controller.
          image: docker.io/amazon/aws-alb-ingress-controller:v1.1.2
      serviceAccountName: alb-ingress-controller

ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: default
  name: ingress
  annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing

      # routes traffic directly to pod IPs (instance mode would target the nodes)
      alb.ingress.kubernetes.io/target-type: ip

      alb.ingress.kubernetes.io/subnets: subnet-0008206f56202428a,subnet-03022082739f3516f,subnet-0c7851c425e597d03
      # alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
      alb.ingress.kubernetes.io/group: timely
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:575625783736:certificate/f0dce948-18f5-495b-a4c3-233cf39281b0
      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
      alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
spec:
  rules:
    - host: echo.timelyadvance.com
      http:
        paths:
          - path: /
            backend:
              serviceName: echo1
              servicePort: 80

     # test.timelyadvance.com and anything will redirect here, as it has no host
    - http:
        paths:
         - path: /*
           backend:
             serviceName: ssl-redirect
             servicePort: use-annotation
         - path: /echo2/*
           backend:
             serviceName: echo1
             servicePort: 80
         - path: /v1/*
           backend:
             serviceName: api
             servicePort: 80
         - path: /health
           backend:
             serviceName: api
             servicePort: 80
         - path: /debug
           backend:
             serviceName: api
             servicePort: 80
         - path: /*
           backend:
             serviceName: web-static
             servicePort: 80

seanhess on 12 Aug 2019

Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout
This could also be caused by a failure to resolve elasticloadbalancing.us-east-2.amazonaws.com via DNS, which would in turn prevent any connection to this URL.
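
To check name resolution on its own, separately from the TCP connection, something like the following sketch could be run from a pod on the affected node. The function name is illustrative, and an empty result only says that the lookup failed, not why.

```python
import socket


def resolve_ips(host, port=443):
    """Return the sorted, de-duplicated IP addresses for `host`.

    An empty list means name resolution itself failed.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If resolve_ips("elasticloadbalancing.us-east-2.amazonaws.com") returns an empty list, the problem is in DNS. If it returns addresses but connections to them still time out, the problem is in routing or egress rules.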

samdd on 7 Nov 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 5 Feb 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 6 Mar 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 5 Apr 2020

@fejta-bot: Closing this issue.
