I just did a rolling update of my cluster which is using external DNS, and when external DNS got evicted and rescheduled it ran a bulk DELETE on everything in Route 53, and then a bulk CREATE. This caused unexpected downtime for applications running in the cluster.
I see there's a flag, --policy=upsert-only, which will disable deletion, but I want full sync enabled without the bulk DELETE in this case. What's the recommended approach here? Use upsert-only and manually clean up deletions once in a while?
Which version of External DNS are you running? We should have good protection against this and we are running a lot of rolling cluster updates every week in Zalando.
This happened on 0.4.4, I have since updated to 0.4.8.
One scenario where this could happen (without any configuration changes) is if, for any reason, the API server returned an empty list for services/ingresses. It would also be helpful to know which version of Kubernetes is/was running.
It was during a rolling update from 1.8.3 to 1.8.4 using kops. Three masters, five workers. Kops rolled the masters first, then the nodes. I have a single instance of external-dns running on the masters (only masters have IAM permissions in Route 53). During the update API server should have been up at all times.
It has happened twice since the rolling update; in both cases external-dns did a bulk DELETE and then a bulk CREATE a minute or two later. I'm still troubleshooting, but it appears to be related to modifying my nginx ingress controller. If there's no ingress controller up and running then external-dns appears to delete all the Route53 records. In these cases there has always been a service of type: LoadBalancer, but the ingress controller was momentarily unhealthy while I was testing out some configuration changes. I will do more testing and report back. Assuming it's related to downtime with the ingress controller, is there a way to avoid that?
@jordanjennings External DNS will only react on changes in the Ingress' status field, i.e. ingress controller downtimes should not make DNS records disappear unless somehow the status field was set to empty.
@jordanjennings This shouldn't happen and we'll help you to get it right. Please closely monitor the ADDRESS field of your Ingresses while doing the update. Please check that the field always contains whatever value you think is right:
With nginx-controller v0.9.x and the --publish-service flag, it should be an ELB and should not change at all during the update.
Please also paste your external-dns and nginx-controller manifests.
@linki I am running two nginx ingress controllers, one for internal facing services and one for external services. Each of them is using --publish-service with a service of type: LoadBalancer. I see the ADDRESS field of the ingress records shows the ELBs, as expected.
Here is the templated manifest for the ingress controllers, they both use the same template just different names and ingress classes:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: {{ .name }}-controller
  namespace: nginx-ingress
spec:
  replicas: {{ .replicas }}
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      k8s-app: {{ .name }}-controller
  template:
    metadata:
      labels:
        k8s-app: {{ .name }}-controller
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/nginx-ingress-config
        - --ingress-class={{ .ingressClass }}
        - --publish-service=$(POD_NAMESPACE)/{{ .name }}
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        resources:
          limits:
            memory: 2Gi
          requests:
            cpu: 0.1
            memory: 256Mi
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - containerPort: 80
        - containerPort: 18080
Here's my templated external-dns deployment:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
  labels:
    k8s-addon: external-dns.addons.k8s.io
    k8s-app: external-dns
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: external-dns
  template:
    metadata:
      labels:
        k8s-addon: external-dns.addons.k8s.io
        k8s-app: external-dns
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        # For 1.6, we keep the old tolerations in case of a downgrade to 1.5
        scheduler.alpha.kubernetes.io/tolerations: '[{"key": "dedicated", "value": "master"}]'
    spec:
      serviceAccount: external-dns
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: NoSchedule
      nodeSelector:
        node-role.kubernetes.io/master: ""
      dnsPolicy: Default # Don't use cluster DNS (we are likely running before kube-dns)
      hostNetwork: true
      containers:
      - name: external-dns
        image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
        args:
        - --source=ingress
        - --source=service
        - --domain-filter={{ .privateHostedZone }}
        - --domain-filter={{ .publicHostedZone }}
        - --provider=aws
        - --registry=txt
        - --txt-owner-id={{ .clusterName }}
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
Will report back once I am able to do more testing. If you have any thoughts in the meanwhile let me know.
@linki Ok, just tested another rolling update of the cluster and external-dns did a DELETE for all Route53 entries as soon as the nginx ingress controller was evicted from the worker node it was running on (only a single instance of nginx ingress controller serving these ingress rules). Once it was rescheduled, on the next sync loop external-dns did a CREATE for all records.
At the same time that the nginx ingress controller was evicted, all the ADDRESSes of my ingress records disappeared as well. Would that indicate that this is not an external-dns issue, but an nginx ingress controller issue?
Well, another update: I tried having two replicas of the nginx ingress controller so that during the rolling update at least one would always be running and it didn't solve the issue. The ADDRESS field of the ingress still gets wiped out as soon as one of the nginx ingress replicas gets evicted. Maybe this is a bug in kubernetes itself with ingress? Any input would be appreciated.
@jordanjennings Thanks for getting back to us.
ExternalDNS never modifies any Ingress objects. It only reads several attributes including .status.loadBalancer.ingress (the ADDRESS field) to construct the desired DNS records.
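For illustration, this is roughly what that part of an Ingress object looks like when the ADDRESS field is populated. It is only a sketch: the hostname is a made-up placeholder, not a value from this cluster.

```yaml
# Excerpt of an Ingress object as returned by the API server.
# The hostname below is hypothetical and stands in for the ELB address.
status:
  loadBalancer:
    ingress:
    - hostname: example-elb-1234567890.eu-central-1.elb.amazonaws.com
```

If that list is emptied, the ADDRESS column shows nothing and ExternalDNS no longer has a target from which to build the record.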
I believe that some other component clears those values, and ExternalDNS then cleans up the corresponding DNS entries in Route53. As a quick workaround, and to confirm this claim, I suggest you run ExternalDNS with the --policy=upsert-only flag. You can switch to --policy=sync later and it will clean up all the unused entries.
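A minimal sketch of that workaround, assuming the external-dns Deployment posted earlier in this thread. Only the --policy line is new; the remaining args are abbreviated and would stay as they are.

```yaml
# Fragment of the external-dns container spec (other fields unchanged).
containers:
- name: external-dns
  image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
  args:
  - --source=ingress
  - --source=service
  - --provider=aws
  - --registry=txt
  # Create and update records only; never delete them.
  - --policy=upsert-only
```

Changing the last argument back to --policy=sync later re-enables deletion of unused records.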
It could be that nginx-ingress-controller cleans any values from .status.loadBalancer.ingress when it shuts down. (given that previous versions didn't have the --publish-service functionality.) That would be odd especially when --publish-service is used but I may miss something here. If that's true we should be able to reproduce it without ExternalDNS being deployed so we can take it out of the possible culprits.
@linki Yes, the issue appears to be with the nginx ingress controller, I'll open an issue there. Thank you for the thoughtful replies!
For posterity: the nginx ingress controller actually clears out the ingress status when it shuts down, unless it either finds another ingress controller running or has a flag added to NOT clear out ingress on shutdown. See more in the code here: https://github.com/kubernetes/ingress-nginx/blob/nginx-0.9.0/internal/ingress/status/status.go#L108
The way it determines if there are multiple ingress controllers running is a little flakey, and if you are running a single ingress controller (say in a dev environment) then you'll hit this issue. Even running two ingress controllers I was still hitting this issue depending on the timing of the rolling restart (and I even tried pod anti-affinity to be sure that there would always be an nginx ingress controller up somewhere, but that still didn't solve the problem).
The solution to my problem was to use the semi-undocumented flag --update-status-on-shutdown=false so that when the ingress controller is restarting it doesn't ever clear out the ingress status, and so by extension external DNS will not delete the Route53 records.
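As a sketch, assuming the ingress controller Deployment posted earlier in this thread, the change amounts to one extra entry in the container args (templated args omitted here for brevity):

```yaml
# Fragment of the nginx-ingress-controller container spec (other fields unchanged).
args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-http-backend
- --configmap=$(POD_NAMESPACE)/nginx-ingress-config
# Keep the Ingress status (ADDRESS) intact when the controller shuts down.
- --update-status-on-shutdown=false
```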
@jordanjennings thanks for clarifying!
@jordanjennings Why is it undocumented? It seems quite useful and probably should be the default if someone chooses the upsert-only policy instead of sync.
@jordanjennings Oh, I get it. It is an nginx-ingress argument, not an external-dns one. Maybe introduce a similar flag in external-dns so it doesn't remove the records of any service when it restarts improperly or is configured improperly?
@sheerun ExternalDNS doesn't remove records on shutdown. By default, nginx-controller removes target addresses from the Ingress objects when it stops/restarts. If ExternalDNS does a sync at this moment it removes the DNS records accordingly.
@linki Maybe a ~1m delay for deleting anything in syncing would help in this case?
@sheerun That would just be punting the issue farther down the field, it would still happen in some cases. And then external-dns would have to be stateful to remember for a minute not to do the update (not sure about the internal workings of external dns but seems like that would not be an insignificant change).
I originally opened a ticket for nginx ingress controller asking for a docs update, but thinking about it now, several months later, the documentation might make more sense here with external dns. Here's the issue:
https://github.com/kubernetes/ingress-nginx/issues/1877
@linki Would it make sense to add a note here for users of nginx ingress controller that they'll want to use the flag I mentioned above or they might get unexpected DNS downtime?
@jordanjennings It would make perfect sense 👍