As of beta.2, alb-ingress-controller deployed to China regions fails at startup because certain AWS services it calls are not available in AWS China.
Example
E0628 14:11:42.981304 1 session.go:44] [ALB-INGRESS] [session] [ERROR]: caused by: Post https://waf-regional.cn-north-1.amazonaws.com.cn/: dial tcp: lookup waf-regional.cn-north-1.amazonaws.com.cn on 100.64.0.10:53: no such host
F0628 14:11:42.981326 1 albingresses.go:161] [ALB-INGRESS] [ingress] [ERROR]: Failed to get associated WAF ACL. Error: RequestError: send request failed
This also causes issues when alb-ingress-controller attempts to call acm/DescribeCertificate, as acm is another service not available there.
So the services in use by alb-ingress-controller which are failing are:
- acm
- waf
- waf-regional
We are deploying alb-ingress-controller like so:
````yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: alb-ingress-controller
  name: alb-ingress-controller
  # Namespace the ALB Ingress Controller should run in. Does not impact which
  # namespaces it's able to resolve ingress resource for. For limiting ingress
  # namespace scope, see --watch-namespace.
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alb-ingress-controller
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alb-ingress-controller
      annotations:
        iam.amazonaws.com/role: arn:aws-cn:iam::123412341234:role/alb-ingress-controller_kube2iam
    spec:
      containers:
      - args:
        - /server
        # Ingress controllers must have a default backend deployment where
        # all unknown locations can be routed to. Often this is a 404 page. The
        # default backend is not particularly helpful to the ALB Ingress Controller
        # but is still required. The default backend and its respective service
        # must be running Kubernetes for this controller to start.
        - --default-backend-service=kube-system/default-http-backend
        # Limit the namespace where this ALB Ingress Controller deployment will
        # resolve ingress resources. If left commented, all namespaces are used.
        #- --watch-namespace=your-k8s-namespace
        # Setting the ingress-class flag below will ensure that only ingress resources with the
        # annotation kubernetes.io/ingress.class: "alb" are respected by the controller. You may
        # choose any class you'd like for this controller to respect.
        #- --ingress-class=alb
        env:
        # AWS region this ingress controller will operate in.
        # List of regions:
        # http://docs.aws.amazon.com/general/latest/gr/rande.html#vpc_region
        - name: AWS_REGION
          value: cn-north-1
        # Name of your cluster. Used when naming resources created
        # by the ALB Ingress Controller, providing distinction between
        # clusters.
        - name: CLUSTER_NAME
          value: foo-china-1.k8s.local
        # AWS key id for authenticating with the AWS API.
        # This is only here for examples. It's recommended you instead use
        # a project like kube2iam for granting access.
        #- name: AWS_ACCESS_KEY_ID
        #  value: KEYVALUE
        # AWS key secret for authenticating with the AWS API.
        # This is only here for examples. It's recommended you instead use
        # a project like kube2iam for granting access.
        #- name: AWS_SECRET_ACCESS_KEY
        #  value: SECRETVALUE
        # Enables logging on all outbound requests sent to the AWS API.
        # If logging is desired, set to true.
        - name: AWS_DEBUG
          value: "true"
        # Maximum number of times to retry the aws calls.
        # defaults to 20.
        - name: AWS_MAX_RETRIES
          value: "3"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        # Repository location of the ALB Ingress Controller.
        image: 123412341234.dkr.ecr.cn-north-1.amazonaws.com.cn/coreos/alb-ingress-controller:1.0-beta.1
        imagePullPolicy: Always
        name: server
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      serviceAccountName: alb-service-account
      terminationGracePeriodSeconds: 30
````
Startup
I0628 14:11:42.497278 1 main.go:24] [ALB-INGRESS] [controller] [INFO]: Log level read as "", defaulting to INFO. To change, set LOG_LEVEL environment variable to WARN, ERROR, or DEBUG.
I0628 14:11:42.497639 1 launch.go:112] &{ALB Ingress Controller 1.0-alpha.9 git-11cdb4a8 git://github.com/kubernetes-sigs/aws-alb-ingress-controller}
I0628 14:11:42.497863 1 launch.go:282] Creating API client for https://100.64.0.1:443
I0628 14:11:42.506308 1 launch.go:295] Running in Kubernetes Cluster version v1.9 (v1.9.8) - git (clean) commit c138b85178156011dc934c2c9f4837476876fb07 - platform linux/amd64
I0628 14:11:42.507846 1 launch.go:134] validated kube-system/default-http-backend as the default backend
I0628 14:11:42.511812 1 albingresses.go:85] [ALB-INGRESS] [ingress] [INFO]: Building list of existing ALBs
I0628 14:11:42.514214 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: elasticloadbalancing/DescribeLoadBalancers, Payload: {}
I0628 14:11:42.592825 1 albingresses.go:94] [ALB-INGRESS] [ingress] [INFO]: Fetching information on 1 ALBs
I0628 14:11:42.593056 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: elasticloadbalancing/DescribeLoadBalancerAttributes, Payload: { LoadBalancerArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}
I0628 14:11:42.611399 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: waf-regional/GetWebACLForResource, Payload: { ResourceArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}
I0628 14:11:42.660569 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: waf-regional/GetWebACLForResource, Payload: { ResourceArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}
I0628 14:11:42.736671 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: waf-regional/GetWebACLForResource, Payload: { ResourceArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}
I0628 14:11:42.976811 1 session.go:35] [ALB-INGRESS] [session] [INFO]: Request: waf-regional/GetWebACLForResource, Payload: { ResourceArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}
E0628 14:11:42.981292 1 session.go:44] [ALB-INGRESS] [session] [ERROR]: Failed request: waf-regional/GetWebACLForResource, Payload: { ResourceArn: "arn:aws-cn:elasticloadbalancing:cn-north-1:123412341234:loadbalancer/app/foochina1k8-evelauncherdev-e3b4/b692450cb35591b5"}, Error: RequestError: send request failed
E0628 14:11:42.981304 1 session.go:44] [ALB-INGRESS] [session] [ERROR]: caused by: Post https://waf-regional.cn-north-1.amazonaws.com.cn/: dial tcp: lookup waf-regional.cn-north-1.amazonaws.com.cn on 100.64.0.10:53: no such host
F0628 14:11:42.981326 1 albingresses.go:161] [ALB-INGRESS] [ingress] [ERROR]: Failed to get associated WAF ACL. Error: RequestError: send request failed
caused by: Post https://waf-regional.cn-north-1.amazonaws.com.cn/: dial tcp: lookup waf-regional.cn-north-1.amazonaws.com.cn on 100.64.0.10:53: no such host
Makes sense. There will need to be some flags to disable some features.
I might PR a fix for this, so a couple questions:
Would you rather gate this explicitly using --flags and/or ENVs, or maintain a support matrix by AWS region? I see from #393 that this is affecting non-China regions as well.
Would a migration to go-flags/cobra over flag be appreciated, to simplify future feature/functionality flagging, or should I just use the current flag/os.Getenv model?
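For reference, here is a minimal sketch of what the explicit flag/ENV option could look like with the standard library only. The flag names (`--disable-waf`, `--disable-acm`) and environment variables (`DISABLE_WAF`, `DISABLE_ACM`) are hypothetical and are not options the controller is known to expose; the environment supplies the default and a command-line flag overrides it.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// featureConfig holds hypothetical feature toggles. The names are
// illustrative and are not options the controller is known to expose.
type featureConfig struct {
	DisableWAF bool
	DisableACM bool
}

// envBool reads a boolean through lookup, falling back to def when the
// variable is unset or cannot be parsed by strconv.ParseBool.
func envBool(lookup func(string) (string, bool), key string, def bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// parseFeatures takes defaults from the environment and lets
// command-line flags override them.
func parseFeatures(args []string, lookup func(string) (string, bool)) (featureConfig, error) {
	var cfg featureConfig
	fs := flag.NewFlagSet("alb-ingress-controller", flag.ContinueOnError)
	fs.BoolVar(&cfg.DisableWAF, "disable-waf", envBool(lookup, "DISABLE_WAF", false), "skip all WAF API calls (hypothetical)")
	fs.BoolVar(&cfg.DisableACM, "disable-acm", envBool(lookup, "DISABLE_ACM", false), "skip all ACM API calls (hypothetical)")
	if err := fs.Parse(args); err != nil {
		return featureConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFeatures(os.Args[1:], os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("WAF disabled: %t, ACM disabled: %t\n", cfg.DisableWAF, cfg.DisableACM)
}
```

Passing the lookup function in, instead of calling `os.Getenv` directly, keeps the parsing testable without touching the process environment.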
I believe the SDK can dynamically detect which services are supported in each region:
https://aws.amazon.com/blogs/developer/using-the-aws-sdk-for-gos-regions-and-endpoints-metadata/
We could probably use that: disable the WAF logic if it's not supported in the current region, and log an error if an ingress uses a WAF annotation.
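A rough sketch of that gating logic, under stated assumptions: the `serviceRegions` table is a hand-written stand-in for the SDK's endpoint metadata (a real implementation would presumably query the SDK's `endpoints` package described in the linked post), its entries are illustrative only, and the annotation key `example.com/waf-acl-id` is a placeholder, not the controller's actual annotation.

```go
package main

import (
	"fmt"
	"log"
)

// wafAnnotation is a placeholder key, not the controller's real annotation.
const wafAnnotation = "example.com/waf-acl-id"

// serviceRegions is a hand-written stand-in for the SDK's endpoint
// metadata. The entries are illustrative only.
var serviceRegions = map[string]map[string]bool{
	"elasticloadbalancing": {"us-east-1": true, "cn-north-1": true},
	"waf-regional":         {"us-east-1": true},
	"acm":                  {"us-east-1": true},
}

// serviceAvailable reports whether service is listed for region.
// Unknown services and regions yield false.
func serviceAvailable(service, region string) bool {
	return serviceRegions[service][region]
}

// features records which optional integrations are enabled.
type features struct {
	WAF bool
	ACM bool
}

// detectFeatures enables an integration only when its service is
// listed for the controller's region.
func detectFeatures(region string) features {
	return features{
		WAF: serviceAvailable("waf-regional", region),
		ACM: serviceAvailable("acm", region),
	}
}

// checkIngress returns an error when an ingress requests a WAF ACL
// while WAF support is disabled, so the caller can log it instead of
// calling an endpoint that does not exist.
func checkIngress(f features, annotations map[string]string) error {
	if acl, ok := annotations[wafAnnotation]; ok && !f.WAF {
		return fmt.Errorf("ingress requests WAF ACL %q but WAF is not available in this region", acl)
	}
	return nil
}

func main() {
	f := detectFeatures("cn-north-1")
	fmt.Printf("WAF enabled: %t, ACM enabled: %t\n", f.WAF, f.ACM)
	if err := checkIngress(f, map[string]string{wafAnnotation: "my-acl"}); err != nil {
		log.Printf("skipping WAF association: %v", err)
	}
}
```

The point of the design is that the availability decision is made once at startup, so the reconcile loop never issues a request to an endpoint that cannot resolve.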
Nice find, if we can use that API that would be best.
re: flags, I don't think we can switch since the upstream ingress library is using pflag. You are welcome to try it out; I'm not attached to the current flag implementation.
Seems to be the same in ap-southeast-1 region since there is no WAF yet
E0818 02:22:39.904339 1 albingresses.go:211] Failed to get associated Web ACL. Error: RequestError: send request failed
caused by: Post https://waf-regional.ap-southeast-1.amazonaws.com/: dial tcp: lookup waf-regional.ap-southeast-1.amazonaws.com on 100.64.0.10:53: no such host
Is there any ongoing PR I can watch?
and London region too
caused by: Post https://waf-regional.eu-west-2.amazonaws.com/: dial tcp: lookup waf-regional.eu-west-2.amazonaws.com on 10.4.0.10:53: no such host
@bhegazy : A colleague and I are working on this (Resolving endpoint first, through SDK).
Compiling and testing will be done tomorrow. I hope we can submit a PR by the end of the week.
@Andrea-Politi : the goal is to retrieve information about WAF regarding user's configuration (made in SDK).
@grockeek Thanks.
hi @grockeek. Do we have any updates here?
@YurkoHoshko, @bhegazy : @mikaelrandy just submitted the PR #622 regarding this issue.
Is someone working on the ACM regional availability check? We don't have acm yet in cn-north-1 and cn-northwest-1
Also in GovCloud region it doesn't work
will you be able to test #728 in eu-west-3? I don't have a cluster at hand to test it out 😄
The image I built is docker.io/m00nf1sh/alb-ingress-controller:pull-728; you can build it by checking out #728 and updating the Makefile to
@pahud It's resolved by #728
@carpenterm what's the problem in GovCloud? Is it due to WAF not being available? If so, #728 will solve it.
@M00nF1sh looks like https://github.com/kubernetes-sigs/aws-alb-ingress-controller/pull/728 can disable waf support, but can't disable acm support?
@pahud in the latest branch, the ACM API is not used (it is only used in the health check and should be removed).
I haven't removed it yet, since we could introduce a feature to automatically discover and bind certificates (there was a PR for that, but it got closed).
With #728, the health check for ACM is disabled when ACM is unavailable.