I was trying to update my configuration, but the changes weren't being applied to the ALB (see #898), so I decided to delete the load balancer and start over.
I deleted the alb-ingress-controller Deployment, the Ingress, and the load balancer (from the AWS console), then went back to the walkthrough to set everything up again.
Following the walkthrough, I get as far as "Deploy Ingress for echoserver", but that step never completes: no load balancer is created in the AWS console, and the log output is empty.
$ kubectl apply -f echoserver-ingress.yaml
ingress.extensions/echoserver created
$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o 'alb-ingress[a-zA-Z0-9-]+') | grep 'echoserver\/echoserver'
The log command appears to be looking for a pod with echoserver in the name? None exist:
$ kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
alb-ingress-controller-795b5558d8-54j2p   1/1     Running   0          7m
aws-node-vv8bs                            1/1     Running   0          161d
aws-node-z7j6t                            1/1     Running   0          161d
aws-node-zgv57                            1/1     Running   0          161d
kube-proxy-8xgjc                          1/1     Running   0          161d
kube-proxy-cpqnk                          1/1     Running   0          161d
kube-proxy-rxzsg                          1/1     Running   0          161d
kubernetes-dashboard-5dd89b9875-fm2hx     1/1     Running   0          161d
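A note on the log command: it does not search for a pod with echoserver in its name. The inner `kubectl get po | egrep -o ...` only extracts the name of the controller pod, and the trailing grep then filters that pod's log lines for the string echoserver/echoserver. The sketch below replays both steps on text copied from this thread instead of a live cluster; the variable names are made up for illustration, and the `\/` from the original pattern is written as a plain `/`, which matches the same text.

```shell
# Sample rows copied from the pod listing above (not fetched from a cluster).
pods='alb-ingress-controller-795b5558d8-54j2p 1/1 Running 0 7m
aws-node-vv8bs 1/1 Running 0 161d
kube-proxy-8xgjc 1/1 Running 0 161d'

# Step 1: the inner command extracts the controller pod name from the listing.
controller_pod=$(printf '%s\n' "$pods" | grep -E -o 'alb-ingress[a-zA-Z0-9-]+')
echo "controller pod: $controller_pod"

# Step 2: the outer grep keeps only log lines that mention echoserver/echoserver.
# This sample line is one of the controller startup lines quoted in this thread.
sample_logs='I0809 17:28:54.936439 1 :0] kubebuilder/controller "level"=0 "msg"="Starting workers" "controller"="alb-ingress-controller" "worker count"=1'

# grep -c prints the match count; "|| true" absorbs its non-zero exit on zero matches.
matches=$(printf '%s\n' "$sample_logs" | grep -c 'echoserver/echoserver' || true)
echo "matching log lines: $matches"
```

So an empty result means the controller pod was found but has not logged anything about that ingress yet, not that a pod is missing.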
What's going on? What can I do to get this project to work at all?
Thanks,
Sean
I just tried doing the walkthrough with a fresh EKS cluster and it is broken at the same place.
Here's the output of the log command without the grep for echoserver. It is the same as the logs from the previous step, so the controller doesn't appear to have done anything at all when I applied echoserver-ingress.yaml:
$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o 'alb-ingress[a-zA-Z0-9-]+')
-------------------------------------------------------------------------------
AWS ALB Ingress controller
Release: v1.1.2
Build: git-cc1c5971
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
W0809 17:28:54.686347 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0809 17:28:54.725386 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0809 17:28:54.725619 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0809 17:28:54.725700 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0809 17:28:54.725872 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0809 17:28:54.725934 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0809 17:28:54.726060 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0809 17:28:54.726305 1 :0] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0809 17:28:54.726521 1 leaderelection.go:205] attempting to acquire leader lease kube-system/ingress-controller-leader-alb...
I0809 17:28:54.735975 1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0809 17:28:54.836302 1 :0] kubebuilder/controller "level"=0 "msg"="Starting Controller" "controller"="alb-ingress-controller"
I0809 17:28:54.936439 1 :0] kubebuilder/controller "level"=0 "msg"="Starting workers" "controller"="alb-ingress-controller" "worker count"=1
After waiting several days, the ALB ingress controller pod logs this error every 5 minutes or so:
E0812 06:13:22.253653 1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to RequestError: send request failed\ncaused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"ingress"}
It looks like this issue is similar to #771. I destroyed the ALB ingress controller and the ingress, then re-created them with the --aws-api-debug flag. Now I'm seeing this in the logs in an endless loop:
I0812 16:39:22.624798 1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}
E0812 16:39:52.625093 1 request_pagination.go:105] Failed request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}, Error: RequestError: send request failed
caused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout
E0812 16:39:52.625160 1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to RequestError: send request failed\ncaused by: Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"ingress"}
I0812 16:39:53.625499 1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}
I0812 16:40:23.684990 1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}
I0812 16:40:53.751517 1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}
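One detail worth reading out of the debug log: each DescribeLoadBalancers call fails roughly 30 seconds after it is sent, which points toward a connection that hangs until it times out rather than a request that is rejected. The sketch below computes that gap from two lines copied from the log above; `to_seconds` is a hypothetical helper written for this example, and it assumes both timestamps fall on the same day.

```shell
# Two lines copied from the debug log above: one request and its failure.
logs='I0812 16:39:22.624798 1 request_pagination.go:105] Request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}
E0812 16:39:52.625093 1 request_pagination.go:105] Failed request: elasticloadbalancing/DescribeLoadBalancers, Payload: { Names: ["d292306e-default-ingress-e8c7"]}, Error: RequestError: send request failed'

# Hypothetical helper: convert HH:MM:SS.micros to whole seconds since midnight.
to_seconds() {
  echo "$1" | awk -F'[:.]' '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Field 2 of each log line is the timestamp.
start=$(printf '%s\n' "$logs" | awk '/\] Request:/ { print $2; exit }')
fail=$(printf '%s\n' "$logs" | awk '/Failed request:/ { print $2; exit }')

elapsed=$(( $(to_seconds "$fail") - $(to_seconds "$start") ))
echo "seconds between request and failure: $elapsed"
```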
Note that I have the subnets configured in ingress.yaml and also have the tags set on the subnets themselves.
Here's my full configuration:
alb-ingress-controller.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: alb-ingress-controller
  name: alb-ingress-controller
  # Namespace the ALB Ingress Controller should run in. Does not impact which
  # namespaces it's able to resolve ingress resource for. For limiting ingress
  # namespace scope, see --watch-namespace.
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: alb-ingress-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
    spec:
      containers:
        - name: alb-ingress-controller
          args:
            # Limit the namespace where this ALB Ingress Controller deployment will
            # resolve ingress resources. If left commented, all namespaces are used.
            # - --watch-namespace=your-k8s-namespace
            # Setting the ingress-class flag below ensures that only ingress resources with the
            # annotation kubernetes.io/ingress.class: "alb" are respected by the controller. You may
            # choose any class you'd like for this controller to respect.
            - --ingress-class=alb
            # Name of your cluster. Used when naming resources created
            # by the ALB Ingress Controller, providing distinction between
            # clusters.
            - --cluster-name=timely
            # AWS VPC ID this ingress controller will use to create AWS resources.
            # If unspecified, it will be discovered from ec2metadata.
            # - --aws-vpc-id=vpc-xxxxxx
            # AWS region this ingress controller will operate in.
            # If unspecified, it will be discovered from ec2metadata.
            # List of regions: http://docs.aws.amazon.com/general/latest/gr/rande.html#vpc_region
            # - --aws-region=us-west-1
            # Enables logging on all outbound requests sent to the AWS API.
            # If logging is desired, set to true.
            - --aws-api-debug
            # Maximum number of times to retry the aws calls.
            # defaults to 10.
            # - --aws-max-retries=10
          env:
            # AWS key id for authenticating with the AWS API.
            # This is only here for examples. It's recommended you instead use
            # a project like kube2iam for granting access.
            # - name: AWS_ACCESS_KEY_ID
            #   value: xxx
            # AWS key secret for authenticating with the AWS API.
            # This is only here for examples. It's recommended you instead use
            # a project like kube2iam for granting access.
            # - name: AWS_SECRET_ACCESS_KEY
            #   value: xxx
          # Repository location of the ALB Ingress Controller.
          image: docker.io/amazon/aws-alb-ingress-controller:v1.1.2
      serviceAccountName: alb-ingress-controller
ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: default
  name: ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    # directs traffic directly to node
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/subnets: subnet-0008206f56202428a,subnet-03022082739f3516f,subnet-0c7851c425e597d03
    # alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/group: timely
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:575625783736:certificate/f0dce948-18f5-495b-a4c3-233cf39281b0
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
spec:
  rules:
    - host: echo.timelyadvance.com
      http:
        paths:
          - path: /
            backend:
              serviceName: echo1
              servicePort: 80
    # test.timelyadvance.com and anything will redirect here, as it has no host
    - http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /echo2/*
            backend:
              serviceName: echo1
              servicePort: 80
          - path: /v1/*
            backend:
              serviceName: api
              servicePort: 80
          - path: /health
            backend:
              serviceName: api
              servicePort: 80
          - path: /debug
            backend:
              serviceName: api
              servicePort: 80
          - path: /*
            backend:
              serviceName: web-static
              servicePort: 80
Your problem is here:
Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout
It's telling you that the container running on that node can't connect to https://elasticloadbalancing.us-east-2.amazonaws.com/ because the request is timing out. If you start any container on the same instance and kubectl exec into it, I'd bet that telnet elasticloadbalancing.us-east-2.amazonaws.com 443 times out as well. This is most likely because the egress rules for your node don't allow connections to elasticloadbalancing.us-east-2.amazonaws.com.
Post https://elasticloadbalancing.us-east-2.amazonaws.com/: dial tcp: i/o timeout
This could also be caused by a failure to resolve elasticloadbalancing.us-east-2.amazonaws.com via DNS, which would likewise prevent the connection.
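To tell the two suggested causes apart (blocked egress versus failed name resolution), a check along these lines could be run from a shell inside a pod on the affected node. This is a rough sketch, not a tool from this project: the function names are invented for the example, and it assumes the container image provides getent, timeout and bash, which minimal images often do not.

```shell
# Hypothetical helper: succeeds if the hostname resolves.
check_dns() {
  getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeeds if a TCP connection opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be present in the image;
# if it is not, this check fails even when the network is fine.
check_tcp() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" >/dev/null 2>&1
}

# Report which stage fails: name resolution first, then the TCP connection.
diagnose() {
  if ! check_dns "$1"; then
    echo "dns-failure"
  elif ! check_tcp "$1" "$2"; then
    echo "tcp-failure"
  else
    echo "ok"
  fi
}

# When saved as a script, run the check only if a hostname was passed, e.g.
#   sh check.sh elasticloadbalancing.us-east-2.amazonaws.com 443
if [ "$#" -ge 1 ]; then
  diagnose "$1" "${2:-443}"
fi
```

A result of dns-failure would point at cluster DNS or VPC resolver settings, while tcp-failure would point at security groups, network ACLs, or a missing route to the internet (for example, no NAT gateway for a private subnet).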
I'm having this exact problem. I also added the - --aws-api-debug flag, but I don't see any different output either.
Does anyone have any suggestion?
For aws-load-balancer-controller, the below flag may be used:
--log-level=debug