Ingress-nginx: Incorrect logs and ngx variables when using canary Ingresses

Created on 26 Jul 2019 · 30 comments · Source: kubernetes/ingress-nginx

BUG REPORT

NGINX Ingress controller version: nginx-0.25

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: baremetal
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): Linux kube-a-1 4.18.0-20-generic #21~18.04.1 SMP Thu May 16 11:54:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm

What happened:
When a request is routed to a canary backend, there is no indication of it in the ngx variables and, therefore, none in the logs or in the custom Lua script that I use in a log_by_lua_block.
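
For illustration, a minimal sketch of the kind of log_by_lua_block snippet this refers to (illustrative only, not the exact script from this setup):

```nginx
# Minimal, illustrative log_by_lua_block; the variables below are the ones
# ingress-nginx sets per location.
log_by_lua_block {
    -- When the request is actually served by a canary backend, these values
    -- still describe the primary Ingress/Service, which is the problem here.
    ngx.log(ngx.INFO,
        "ingress=", tostring(ngx.var.ingress_name),
        " service=", tostring(ngx.var.service_name),
        ":", tostring(ngx.var.service_port),
        " upstream=", tostring(ngx.var.proxy_upstream_name))
}
```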

What you expected to happen:
The proper Ingress name and Service name/port should be logged and set in the ngx variables when a request is routed to a backend defined by a canary Ingress.

Anything else we need to know:
The solution seems simple enough: we should provide more information to Lua scripts and modify them a bit. I've opened a PR in which I've attempted a naïve solution to these problems: https://github.com/kubernetes/ingress-nginx/pull/4367.


Most helpful comment

I think it's becoming clear what the issue is: according to our canary documentation, canary settings get applied to an Ingress _only_ if the host and path match. But you've shown an edge case where that invariant does not hold.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use $proxy_upstream_name, which is based on namespace, service and port. This is the problem.

All 30 comments

When a request is routed to a canary backend, there is no indication of it in the ngx variables and, therefore, none in the logs or in the custom Lua script that I use in a log_by_lua_block.

Have you tried using $proxy_alternative_upstream_name, which was introduced in the most recent release? With it you can identify whether a request got proxied to an alternative backend and, if so, which one.
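
For example, the variable can be included in the access-log format so the canary decision is visible per request. A hand-written sketch (in ingress-nginx this would normally be done via the log-format-upstream ConfigMap option rather than a raw log_format directive):

```nginx
# Illustrative access-log format recording both the configured upstream and the
# alternative (canary) upstream actually chosen for the request.
log_format upstreaminfo '$remote_addr - [$time_local] "$request" $status '
                        'upstream=$proxy_upstream_name '
                        'alternative_upstream=$proxy_alternative_upstream_name';

access_log /var/log/nginx/access.log upstreaminfo;
```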

  1. It does not solve the Ingress name issue. If I create an Ingress, I'd like to be certain that it's properly represented in ngx.vars/logs.
  2. It adds new logic that has to be accounted for in queries in ElasticSearch/Kibana, Grafana/Loki, or any other analytics system that ingests the logs. It seems more logical to map ngx.vars to upstreams exactly as they are defined in Kubernetes.

@zuzzas can you specify which ngx.vars you are referring to?

$ingress_name, $service_name and $service_port

The way canary is configured in ingress-nginx is inherently hacky. Canary is about the backend (a.k.a. proxy_upstream_name, i.e. the service and service port); it is not about the Ingress. The reason it is configured via an Ingress is that this is the only way to do it without introducing a new CRD. Given this, I'm not convinced that $ingress_name should change.

For $service_name and $service_port it might make sense to change them based on which backend is picked for the request.

BTW, @ElvinEfendi, adding ing.Name to upstreamName, which is done in #4367, fixes another huge problem.

Without #4367, if you create ingress a that points to service a-svc, then create a-canary that routes 30% of traffic to a-canary-svc, and then create ingress b that also points to a-svc, you will get a completely unexpected result: 30% of requests to b (which has no canary) will also be routed to a-canary-svc.

@distol For that to happen, ingresses a and b have to have exactly the same host and path combination for service a-svc. When that's the case, the ingresses get merged into one internally: only one server/location block will be created for them. Whichever one the controller sees first, its annotations will be used, and the annotations of the second one will be ignored.

This is not an unexpected result.

Given this, I'm not convinced that $ingress_name should change

I was thinking the same way at the beginning, but then convinced myself otherwise. Still not 100% sure, though.

After #4367 upstreams are unique for each ingress, so we can create ingress a pointing to a-svc and ingress a-canary pointing to the same a-svc. It doesn't seem like a very reasonable scenario, but it can be done. And you can set different load balancing for a-canary, and it should work. How can we then distinguish requests going through a from those going through a-canary?

@distol For that to happen, ingresses a and b have to have exactly the same host and path combination for service a-svc. When that's the case, the ingresses get merged into one internally: only one server/location block will be created for them. Whichever one the controller sees first, its annotations will be used, and the annotations of the second one will be ignored.

No, they shouldn't.

No, they shouldn't.

You mean they should not get merged? What behaviour do you expect?

Sorry, here's the full example (everything happens in the default namespace on port 80).

  • Ingress a:

    • host: example.com

    • path: /

    • backend: a-svc

  • Ingress a-canary:

    • host: example.com

    • path: /

    • backend: a-canary-svc

Okay. These two will be merged, and the upstream named default-a-svc-80 will have an alternative backend named default-a-canary-svc-80.

But then we add ingress b:

  • host: completely-different.com
  • path: /some/path
  • backend: a-svc

And this host will use upstream default-a-svc-80, which already has an alternative backend... and this is the problem! The canary configuration of one ingress affects a completely unrelated ingress.

Ugh, I see. That's because $proxy_upstream_name will be the same for both example.com requests and completely-different.com requests. The way we choose the alternative backend in balancer.lua is based on $proxy_upstream_name. So yeah, that's an unexpected behaviour/bug indeed. Actually, I'm not sure how to interpret this yet because, as I said earlier, the canary backend, even though configured through an Ingress, is for a backend/service.
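
To make that concrete, a rough sketch of the relevant parts of the generated configuration (simplified and illustrative, not the actual nginx.conf the controller emits): both locations carry the same $proxy_upstream_name, and balancer.lua picks the canary based on that name alone.

```nginx
server {
    server_name example.com;
    location / {
        # This backend name has a canary (default-a-canary-svc-80) attached.
        set $proxy_upstream_name "default-a-svc-80";
        proxy_pass http://upstream_balancer;
    }
}

server {
    server_name completely-different.com;
    location /some/path {
        # Same backend name, so the Lua balancer can route this traffic to the
        # canary as well, even though this Ingress has no canary configured.
        set $proxy_upstream_name "default-a-svc-80";
        proxy_pass http://upstream_balancer;
    }
}
```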

You can avoid this by deploying completely-different.com in a different namespace.

I would say the better way would be to move all the canary logic into nginx.conf. split_clients with internal named locations should do a better job. It would be much less confusing, and with that implementation all annotations should work.
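
A rough sketch of what that could look like for the example above (purely illustrative; this is not how ingress-nginx generates its configuration today, and for brevity it uses a variable in proxy_pass instead of named locations):

```nginx
# Hand-written equivalent of a 30% canary split decided per host/path at
# configuration time. The upstream endpoints are placeholders.
upstream default-a-svc-80        { server 10.0.0.10:8080; }
upstream default-a-canary-svc-80 { server 10.0.0.20:8080; }

split_clients "${remote_addr}${request_id}" $a_svc_backend {
    30%     default-a-canary-svc-80;
    *       default-a-svc-80;
}

server {
    listen 80;
    server_name example.com;

    location / {
        # Only this host/path is subject to the split; another Ingress that
        # points at a-svc would simply proxy_pass to it directly.
        proxy_pass http://$a_svc_backend;
    }
}
```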

You can avoid this by deploying completely-different.com in a different namespace.

No, you can't point an Ingress in one namespace at a Service in another.

No, you can't point an Ingress in one namespace at a Service in another.

True, I meant deploying everything separately, but that also means duplication.

Actually, I'm not sure how to interpret this yet because, as I said earlier, the canary backend, even though configured through an Ingress, is for a backend/service.

I think that should be rethought as well. This is far from straightforward. Have you considered using split_clients and named internal locations for all the canary logic?

Have you considered using split_clients and named internal locations for all the canary logic?

No.
Can you please open a PR with a KEP to revise this feature?

I think it's becoming clear what the issue is: according to our canary documentation, canary settings get applied to an Ingress _only_ if the host and path match. But you've shown an edge case where that invariant does not hold.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use $proxy_upstream_name, which is based on namespace, service and port. This is the problem.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use proxy_upstream_name, which is based on namespace, service and port. This is the problem.

100%

Regarding the use of split_clients, I feel like that's an implementation detail (and I also think it's not flexible enough to support all the features we offer at the moment). The fundamental issue I mentioned above would probably still remain. Instead, we should make sure that canary configuration during controller sync and its application during request processing are based on the same invariant.

To add to https://github.com/kubernetes/ingress-nginx/issues/4368#issuecomment-515574775, the issue goes beyond just canary. In the example above, if you ignore the canary ingress and try to configure, say, the load-balance annotation in ingress a, the load-balance annotation of ingress b might override it.

So, back to #4367:

  1. it fixes all this unexpected behaviour where ingresses affect other ingresses that point to the same services;
  2. it fixes the service_name and service_port variables;
  3. and it also sets the ingress_name variable to the name of the Ingress that was the configuration source for the upstream. This may not be 100% right, but to me it is more right than wrong; at the very least it should not confuse users.

#4367 makes some fundamental changes: it changes how we share a given backend among different front-ends. Currently a given backend can be shared between any ingress/host/path combination in a given namespace, whereas with that PR the backend would be shared only per ingress. This will have unexpected consequences: for example, you will no longer be able to share the same backend between two ingresses, which is a really common setup for our users who configure different paths with different ingresses but the same host and service.

If we are going down this path, we might as well generate a backend per front-end (i.e. host+path) in a given namespace. That means adding host+path to the backend name, not the ingress name.

If we are going down this path, we might as well generate a backend per front-end (i.e. host+path) in a given namespace. That means adding host+path to the backend name, not the ingress name.

Well... are we?

Well... are we?

Not without understanding the impact of such a change.

Thank you very much for the constructive discussion. And thank you for all the work done on the ingress controller – we run Kubernetes clusters for nearly a hundred clients, and all of them use the NGINX ingress controller smoothly.

I hope that, now that the problem is understood, the right solution will be found.

P.S. I can't promise, but I'll try to find time to write a KEP about split_clients.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
