May be related to #223
When creating an ingress resource (via alb-ingress-controller), external-dns sometimes creates a RecordSet that maps to an IP address instead of the ALB, which later leads to an error:
time="2017-12-13T19:12:49Z" level=info msg="Changing records: CREATE {
  Action: "CREATE",
  ResourceRecordSet: {
    Name: "test.example.com",
    ResourceRecords: [{
      Value: "172.17.3.177"
    }],
    TTL: 300,
    Type: "A"
  }
} ..."
time="2017-12-13T19:12:49Z" level=info msg="Changing records: CREATE {
  Action: "CREATE",
  ResourceRecordSet: {
    Name: "test.example.com",
    ResourceRecords: [{
      Value: "\"heritage=external-dns,external-dns/owner=my-identifier\""
    }],
    TTL: 300,
    Type: "TXT"
  }
} ..."
After that, external-dns seems to be trying to update that record:
time="2017-12-13T19:19:50Z" level=info msg="Changing records: UPSERT {
  Action: "UPSERT",
  ResourceRecordSet: {
    Name: "test.example.com",
    ResourceRecords: [{
      Value: "xxx-yyy-afc4-111847805.us-east-1.elb.amazonaws.com"
    }],
    TTL: 300,
    Type: "A"
  }
} ..."
time="2017-12-13T19:19:50Z" level=error msg="InvalidChangeBatch: Invalid Resource Record: FATAL problem: ARRDATAIllegalIPv4Address (Value is not a valid IPv4 address) encountered with 'xxx-yyy-afc4-111847805.us-east-1.elb.amazonaws.com'
  status code: 400, request id: 96f84f28-e03a-11e7-85e7-1927c4da9ba3"
time="2017-12-13T19:20:50Z" level=info msg="Changing records: UPSERT {
  Action: "UPSERT",
  ResourceRecordSet: {
    Name: "test.example.com",
    ResourceRecords: [{
      Value: "xxx-yyy-afc4-111847805.us-east-1.elb.amazonaws.com"
    }],
    TTL: 300,
    Type: "A"
  }
} ..."
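The 400 error above is Route 53 rejecting a hostname as the value of an A record: an A record must hold an IPv4 address, so once the target changes from an IP to an ELB hostname the record type has to change as well (a CNAME, or an ALIAS A record on Route 53). The sketch below illustrates that decision with a hypothetical bash helper, record_type_for; it is not ExternalDNS code, and the IPv4 check is a plain dotted-quad pattern match.

```shell
#!/bin/bash
# Hypothetical helper (not part of ExternalDNS): print the record type a
# target value needs. An A record must contain an IPv4 address; anything
# else, such as an ELB hostname, needs a CNAME (or an ALIAS A on Route 53).
record_type_for() {
  local target="$1"
  # One octet, 0-255, without leading zeros.
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  local re="^${octet}\.${octet}\.${octet}\.${octet}$"
  if [[ "$target" =~ $re ]]; then
    echo "A"
  else
    echo "CNAME"
  fi
}

# Example:
# record_type_for 172.17.3.177
# record_type_for xxx-yyy-afc4-111847805.us-east-1.elb.amazonaws.com
```

In the log above, the record was first created as type A for 172.17.3.177, and the later UPSERT kept type A while switching the value to the ELB hostname, which is exactly the mismatch such a check would flag.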
I tried to get my ingress resource:
$ kubectl get ing
NAME           HOSTS              ADDRESS            PORTS     AGE
test-ingress   test.example.com   xxx-yyy-afc4-...   80        18m
If I go into Route 53 and delete the A record pointing to 172.17.3.177 and the TXT record, then external-dns correctly creates the ALIAS record to my ALB and all seems well.
Note: the DNS names and ALB names above have been changed.
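The manual cleanup described above can also be done with the AWS CLI (aws route53 change-resource-record-sets). A DELETE change only succeeds if name, type, TTL and value match the existing record exactly. The sketch below assumes a hypothetical helper, delete_batch, that prints such a change batch; ZONE_ID is a placeholder for your hosted zone ID.

```shell
#!/bin/bash
# Hypothetical helper: print a Route 53 change batch that deletes one
# record set. Name, type, TTL and value must match the existing record
# exactly. The value is inserted verbatim, so a TXT value must already
# carry its escaped inner quotes (\"...\").
delete_batch() {
  local name="$1" type="$2" ttl="$3" value="$4"
  cat <<EOF
{
  "Changes": [{
    "Action": "DELETE",
    "ResourceRecordSet": {
      "Name": "$name",
      "Type": "$type",
      "TTL": $ttl,
      "ResourceRecords": [{ "Value": "$value" }]
    }
  }]
}
EOF
}

# Example (ZONE_ID is a placeholder):
# delete_batch test.example.com A 300 172.17.3.177 > delete-a.json
# delete_batch test.example.com TXT 300 \
#   '\"heritage=external-dns,external-dns/owner=my-identifier\"' > delete-txt.json
# aws route53 change-resource-record-sets \
#   --hosted-zone-id ZONE_ID --change-batch file://delete-a.json
```

Both records are deleted because the TXT record is the ownership marker for the stale A record; with both gone, external-dns creates the records again from the current ingress status.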
Versions:
kubernetes v1.7.10
alb-ingress-controller v1.0-alpha.3
external-dns v0.4.2
@twang-rs Please run kubectl get ingress blah -o yaml and paste the output here.
That's strange.
@twang-rs @ksindi Can you please closely monitor the ADDRESS column of your test-ingress while this is happening? Or, even better for more detail, query the status directly as @ideahitme suggests:
kubectl get ing test-ingress -o json | jq .status.loadBalancer.ingress
It looks like the field may be populated with some IP before it is populated with the ELB CNAME. In that case, the issue would be somewhere else. Please also paste your full ingress definition.
@linki It only happened to me once and I can't reproduce it anymore. Will let you know if I see it happening again.
$ kubectl get ing alertmanager-system -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:xxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80,"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internet-facing
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"alb.ingress.kubernetes.io/certificate-arn":"arn:aws:acm:us-east-1:xxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","alb.ingress.kubernetes.io/listen-ports":"[{\"HTTP\": 80,\"HTTPS\": 443}]","alb.ingress.kubernetes.io/scheme":"internet-facing"},"name":"alertmanager-system","namespace":"prometheus-system"},"spec":{"rules":[{"host":"test.example.com","http":{"paths":[{"backend":{"serviceName":"alertmanager-system","servicePort":9093},"path":"/"}]}}]}}
  creationTimestamp: 2018-01-02T21:28:32Z
  generation: 1
  name: alertmanager-system
  namespace: prometheus-system
  resourceVersion: "5482879"
  selfLink: /apis/extensions/v1beta1/namespaces/prometheus-system/ingresses/alertmanager-system
  uid: e1d5318f-f003-11e7-8638-0ef1ee2132d2
spec:
  rules:
  - host: test.example.com
    http:
      paths:
      - backend:
          serviceName: alertmanager-system
          servicePort: 9093
        path: /
status:
  loadBalancer:
    ingress:
    - hostname: xxxyyyyzzzz-prometheussyst-afc4-xxxxxxxx.us-east-1.elb.amazonaws.com
So, polling kubectl get ing blah -o json | jq .status.loadBalancer.ingress while I repeatedly create and delete the ingress resource, I do see it occasionally show an IP address:
#!/bin/bash
# Poll the ingress status twice a second. Print the status whenever it
# changes, and print a dot for every poll where it stayed the same.

q() {
  kubectl get ing alertmanager-system -o json | jq .status.loadBalancer.ingress
}

LAST=
NL=0
while true; do
  # Capture stdout and stderr together so "NotFound" errors show up as a state.
  RESP="$(q 2>&1)"
  if [ "$RESP" != "$LAST" ]; then
    # Finish the current line of dots before printing the new state.
    if [ "$NL" -eq 1 ]; then
      NL=0
      echo
    fi
    # Unquoted on purpose: word splitting joins jq's multi-line output onto one line.
    echo $RESP
  else
    NL=1
    echo -n .
  fi
  LAST=$RESP
  sleep 0.5
done
Error from server (NotFound): ingresses.extensions "alertmanager-system" not found
.......
null
...................
[ { "hostname": "xxxyyyyzzzz-prometheussyst-afc4-1111111111.us-east-1.elb.amazonaws.com" } ]
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Error from server (NotFound): ingresses.extensions "alertmanager-system" not found
...............................................
null
.....
[ { "hostname": "xxxyyyyzzzz-prometheussyst-afc4-1111111111.us-east-1.elb.amazonaws.com" } ]
.....................................
[ { "hostname": "xxxyyyyzzzz-prometheussyst-afc4-2222222222.us-east-1.elb.amazonaws.com" } ]
.............................................................................................................................................................................
Error from server (NotFound): ingresses.extensions "alertmanager-system" not found
.........................................................................
null
..
[ { "ip": "172.17.3.177" } ]
.....................................
[ { "hostname": "xxxyyyyzzzz-prometheussyst-afc4-3333333333.us-east-1.elb.amazonaws.com" } ]
.................................................................................................................................................
In the last case above, external-dns didn't create records until the ingress resource changed from ip to hostname, so the records it created were correct (ALIAS A records to the DNS name of the ELB).
Also note that one time the hostname was actually the name of the previous ELB and was then updated to the new ELB (1111111111 -> 2222222222). I'm pretty sure the previous ELB had been fully deprovisioned after the earlier delete before I re-added the ingress resource.
So it seems there is a race between when external-dns wakes up to make its changes and whatever sets the address of the load balancer (alb-ingress-controller in my case, I assume).
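To make the transient state easier to spot in the polling output, the one-line status can be classified by the kind of target it holds. The sketch below uses a hypothetical helper, lb_target_kind; it is a rough string match that assumes a single entry in the list, not a JSON parser.

```shell
#!/bin/bash
# Hypothetical helper: classify a one-line .status.loadBalancer.ingress value
# (as printed by the polling script) by the kind of target it holds.
# Rough string match; assumes a single entry, does not parse JSON.
lb_target_kind() {
  local status="$1"
  if [[ "$status" == *'"hostname":'* ]]; then
    echo "hostname"
  elif [[ "$status" == *'"ip":'* ]]; then
    echo "ip"
  else
    # null, empty, or an error message such as NotFound.
    echo "none"
  fi
}

# Example:
# lb_target_kind '[ { "ip": "172.17.3.177" } ]'
```

In the third create/delete cycle above, the sequence was none -> ip -> hostname; any sync by external-dns that lands in the ip window publishes the internal address.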
Thanks @twang-rs for the detailed log.
To me this seems to be an issue with your ingress controller. ExternalDNS never modifies any Ingress objects. It only reads several attributes including .status.loadBalancer.ingress to construct the desired DNS records.
Neither switching between DNS record types nor having multiple values in the .status.loadBalancer.ingress field is supported (I think you have two values in there in your first post). Both lead to ExternalDNS printing several errors.
However, the underlying problem seems to come from the ingress controller putting those fluctuating values in the .status.loadBalancer.ingress field. I'm not sure if that's expected behaviour, though. If it is, ExternalDNS should handle it better.
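A DNS record set has exactly one type, so a status that mixes IPv4 addresses and hostnames cannot be published as a single record. The sketch below expresses that constraint with hypothetical helpers, is_ipv4 and targets_consistent; it only illustrates the rule and is not how ExternalDNS implements its checks.

```shell
#!/bin/bash
# Hypothetical helpers (not ExternalDNS code).

# Succeed if the argument is a dotted-quad IPv4 address.
is_ipv4() {
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  local re="^${octet}\.${octet}\.${octet}\.${octet}$"
  [[ "$1" =~ $re ]]
}

# Succeed if all targets need the same record type: either all IPv4
# addresses (A) or all hostnames (CNAME/ALIAS). Mixed lists fail.
targets_consistent() {
  local ips=0 names=0 t
  for t in "$@"; do
    if is_ipv4 "$t"; then
      ips=$((ips + 1))
    else
      names=$((names + 1))
    fi
  done
  [ "$ips" -eq 0 ] || [ "$names" -eq 0 ]
}

# Example:
# targets_consistent 172.17.3.177 xxx.us-east-1.elb.amazonaws.com || echo "mixed targets"
```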
I had the same issue. It occurred when I first deployed nginx-ingress without publishing the service (so external-dns created an A record pointing to an internal IP) and then updated the nginx-ingress configuration to publish the service. After I manually removed the old entry in Route 53, everything went fine.
This is occurring for me as well. It happens when you don't enable 'publish service' for the nginx ingress controller: the Ingress endpoint then gets set to the node IP of the nginx controller pod instead. When you enable 'publish service', the ingress endpoint is updated to the ALB hostname, but the external-dns upsert fails because it does not create the records as ALIAS.
@jrthrawny Thank you for that! Adding the correct publish-service option to my Nginx daemon set solved my issues too.
@dieterrosch what did you set it to?
I'm having a very similar issue using alb-ingress-controller with external-dns.
I don't have a specific kubernetes service associated with ingress, so I'm not sure what to set it to.
Here is part of the JSON definition of my Nginx Daemonset:
"spec": {
  "containers": [
    {
      "name": "nginx",
      "image": "gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.11",
      "args": [
        "/nginx-ingress-controller",
        "--default-backend-service=nginx-ingress/default-http-backend",
        "--configmap=nginx-ingress/nginx",
        "--tcp-services-configmap=nginx-ingress/tcp-ports",
        "--publish-service=$(POD_NAMESPACE)/nginx"
      ],
I needed:
"--publish-service=$(POD_NAMESPACE)/nginx"
Thanks for following up.
I guess I'm confused about which service that namespace/nginx is.
My ingress is a deployment without a service.
I have lots of other Kubernetes services that are exposed via ingress, so I'm not sure which service I'd point to here.
My NGinx pods are deployed in a namespace named nginx-ingress.
In that same namespace I have two services named nginx and default-http-backend.
All of these were created by Kops for me (This cluster is running on AWS).
The ingress does not need a service. It looks like this:
Outside internet -> Ingress -> Nginx Service -> Nginx Pods.
You would point it to the service that sits in front of your nginx containers, i.e. the service that load balances calls to your nginx pods.
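For the setup described above (nginx pods in the nginx-ingress namespace, fronted by a Service named nginx), the sketch below shows how the flag value is formed and how to check the address that the controller should mirror into each Ingress status. The helper names publish_service_arg and published_hostname are hypothetical; the kubectl call assumes an AWS load balancer, which exposes a hostname rather than an IP.

```shell
#!/bin/bash
# Hypothetical helpers for the nginx-ingress setup described above.

# Build the controller flag that points at the Service in front of nginx.
publish_service_arg() {
  local namespace="$1" service="$2"
  printf '%s\n' "--publish-service=${namespace}/${service}"
}

# Print the external hostname of that Service. With --publish-service set,
# this is the address the controller copies into each Ingress status.
# Assumes an AWS ELB, which reports .hostname (other providers may use .ip).
published_hostname() {
  local namespace="$1" service="$2"
  kubectl -n "$namespace" get svc "$service" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Example:
# publish_service_arg nginx-ingress nginx
# published_hostname nginx-ingress nginx
```

If published_hostname prints nothing, the Service has no external load balancer address yet, and the Ingress status would not get a hostname from it either.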
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@fejta-bot: Closing this issue.
Has anyone found a solution to this for alb-ingress-controller (not using the nginx-ingress controller)? external-dns pointing to internal IPs instead of the published ALB DNS name occurs frequently, but not always.
I've confirmed that the ALB DNS name exists using:
kubectl get ingress -o json | jq .status.loadBalancer
@jtai-omniex Please check whether the value in kubectl get ingress -o json | jq .status.loadBalancer changes in a similar way. ExternalDNS normally only reads that value; it's the ingress controller that actually changes it.