Ingress-nginx: Incorrect logs and ngx variables when using canary Ingresses

Created on 26 Jul 2019 · 30 comments · Source: kubernetes/ingress-nginx

BUG REPORT

NGINX Ingress controller version: nginx-0.25

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: baremetal
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): Linux kube-a-1 4.18.0-20-generic #21~18.04.1 SMP Thu May 16 11:54:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm

What happened:
When a request is routed to a canary backend, there is no indication of it in the ngx variables and, therefore, none in the logs or in the custom Lua script that I use in a log_by_lua_block.
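
For illustration, a minimal sketch of the kind of log_by_lua_block snippet this refers to (illustrative only, not the exact script from this setup):

```nginx
# Minimal, illustrative log_by_lua_block; the variables below are the ones
# ingress-nginx sets per location.
log_by_lua_block {
    -- When the request is actually served by a canary backend, these values
    -- still describe the primary Ingress/Service, which is the problem here.
    ngx.log(ngx.INFO,
        "ingress=", tostring(ngx.var.ingress_name),
        " service=", tostring(ngx.var.service_name),
        ":", tostring(ngx.var.service_port),
        " upstream=", tostring(ngx.var.proxy_upstream_name))
}
```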

What you expected to happen:
The proper Ingress name and Service name/port should be logged and set in the ngx variables when a request is routed to a backend defined by a canary Ingress.

Anything else we need to know:
The solution seems simple enough: we should provide more information to Lua scripts and modify them a bit. I've opened a PR in which I've attempted a naïve solution to these problems: https://github.com/kubernetes/ingress-nginx/pull/4367.


Most helpful comment

I think it's becoming clear what the issue is: according to our canary documentation, canary settings get applied to an Ingress _only_ if the host and path match. But you've shown an edge case where that invariant does not hold.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use $proxy_upstream_name, which is based on namespace, service and port. This is the problem.

All 30 comments

When a request is routed to a canary backend, there is no indication of it in the ngx variables and, therefore, none in the logs or in the custom Lua script that I use in a log_by_lua_block.

Have you tried using $proxy_alternative_upstream_name, which was introduced in the most recent release? With it you can identify whether a request got proxied to an alternative backend and, if so, which one.
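
For example, the variable can be included in the access-log format so the canary decision is visible per request. A hand-written sketch (in ingress-nginx this would normally be done via the log-format-upstream ConfigMap option rather than a raw log_format directive):

```nginx
# Illustrative access-log format recording both the configured upstream and the
# alternative (canary) upstream actually chosen for the request.
log_format upstreaminfo '$remote_addr - [$time_local] "$request" $status '
                        'upstream=$proxy_upstream_name '
                        'alternative_upstream=$proxy_alternative_upstream_name';

access_log /var/log/nginx/access.log upstreaminfo;
```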

  1. It does not solve the Ingress name issue. If I create an Ingress, I'd like to be certain that it's properly represented in ngx.vars/logs.
  2. It adds new logic that has to be accounted for in queries in ElasticSearch/Kibana, Grafana/Loki, or any other analytics system that ingests the logs. It seems more logical to map ngx.vars to upstreams exactly as they are defined in Kubernetes.

@zuzzas can you specify which ngx.vars you are referring to?

$ingress_name, $service_name and $service_port

The way canary is configured in ingress-nginx is inherently hacky. Canary is about the backend (a.k.a. proxy_upstream_name, i.e. the service and service port); it is not about the Ingress. The reason it is configured via an Ingress is that this is the only way to do it without introducing a new CRD. Given this, I'm not convinced that $ingress_name should change.

For $service_name and $service_port it might make sense to change them based on which backend is picked for the request.

BTW, @ElvinEfendi, adding ing.Name to upstreamName, which is done in #4367, fixes another huge problem.

Without #4367, if you create ingress a that points to service a-svc, then create a-canary that routes 30% of traffic to a-canary-svc, and then create ingress b that also points to a-svc, you will get a completely unexpected result: 30% of requests to b (which has no canary) will also be routed to a-canary-svc.

@distol For that to happen, ingresses a and b have to have exactly the same host and path combination for service a-svc. When that's the case, the ingresses get merged into one internally: only one server/location block will be created for them. Whichever one the controller sees first, its annotations will be used, and the annotations of the second one will be ignored.

This is not an unexpected result.

Given this, I'm not convinced that $ingress_name should change

I was thinking the same way at the beginning, but then convinced myself otherwise. Still not 100% sure, though.

After #4367 upstreams are unique for each ingress, so we can create ingress a pointing to a-svc and ingress a-canary pointing to the same a-svc. It doesn't seem like a very reasonable scenario, but it can be done. And you can set different load balancing for a-canary, and it should work. How can we then distinguish requests going through a from those going through a-canary?

@distol For that to happen, ingresses a and b have to have exactly the same host and path combination for service a-svc. When that's the case, the ingresses get merged into one internally: only one server/location block will be created for them. Whichever one the controller sees first, its annotations will be used, and the annotations of the second one will be ignored.

No, they shouldn't.

No, they shouldn't.

You mean they should not get merged? What behaviour do you expect?

Sorry, here's the full example (everything happens in the default namespace on port 80).

  • Ingress a:

    • host: example.com

    • path: /

    • backend: a-svc

  • Ingress a-canary:

    • host: example.com

    • path: /

    • backend: a-canary-svc

Okay. These two will be merged, and the upstream named default-a-svc-80 will have an alternative backend named default-a-canary-svc-80.

But then we add ingress b:

  • host: completely-different.com
  • path: /some/path
  • backend: a-svc

And this host will use upstream default-a-svc-80, which already has an alternative backend... and this is the problem! The canary configuration of one ingress affects a completely unrelated ingress.

Ugh, I see. That's because $proxy_upstream_name will be the same for both example.com requests and completely-different.com requests. The way we choose the alternative backend in balancer.lua is based on $proxy_upstream_name. So yeah, that's an unexpected behaviour/bug indeed. Actually, I'm not sure how to interpret this yet because, as I said earlier, the canary backend, even though configured through an Ingress, is for a backend/service.
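
To make that concrete, a rough sketch of the relevant parts of the generated configuration (simplified and illustrative, not the actual nginx.conf the controller emits): both locations carry the same $proxy_upstream_name, and balancer.lua picks the canary based on that name alone.

```nginx
server {
    server_name example.com;
    location / {
        # This backend name has a canary (default-a-canary-svc-80) attached.
        set $proxy_upstream_name "default-a-svc-80";
        proxy_pass http://upstream_balancer;
    }
}

server {
    server_name completely-different.com;
    location /some/path {
        # Same backend name, so the Lua balancer can route this traffic to the
        # canary as well, even though this Ingress has no canary configured.
        set $proxy_upstream_name "default-a-svc-80";
        proxy_pass http://upstream_balancer;
    }
}
```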

You can avoid this by deploying completely-different.com in a different namespace.

I would say the better way would be to move all the canary logic into nginx.conf. split_clients with internal named locations should do a better job. It would be much less confusing, and with that implementation all annotations should work.
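
A rough sketch of what that could look like for the example above (purely illustrative; this is not how ingress-nginx generates its configuration today, and for brevity it uses a variable in proxy_pass instead of named locations):

```nginx
# Hand-written equivalent of a 30% canary split decided per host/path at
# configuration time. The upstream endpoints are placeholders.
upstream default-a-svc-80        { server 10.0.0.10:8080; }
upstream default-a-canary-svc-80 { server 10.0.0.20:8080; }

split_clients "${remote_addr}${request_id}" $a_svc_backend {
    30%     default-a-canary-svc-80;
    *       default-a-svc-80;
}

server {
    listen 80;
    server_name example.com;

    location / {
        # Only this host/path is subject to the split; another Ingress that
        # points at a-svc would simply proxy_pass to it directly.
        proxy_pass http://$a_svc_backend;
    }
}
```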

You can avoid this by deploying completely-different.com in a different namespace.

No, you can't point an Ingress in one namespace at a Service in another.

No, you can't point an Ingress in one namespace at a Service in another.

True, I meant deploying everything separately, but that also means duplication.

Actually, I'm not sure how to interpret this yet because, as I said earlier, the canary backend, even though configured through an Ingress, is for a backend/service.

I think that should be rethought as well. This is far from straightforward. Have you considered using split_clients and named internal locations for all the canary logic?

Have you considered using split_clients and named internal locations for all the canary logic?

No.
Can you please open a PR with a KEP to revise this feature?

I think it's becoming clear what the issue is: according to our canary documentation, canary settings get applied to an Ingress _only_ if the host and path match. But you've shown an edge case where that invariant does not hold.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use $proxy_upstream_name, which is based on namespace, service and port. This is the problem.

I think the fundamental issue is that canary is configured based on the host and path combination, but when it is actually applied to a request we use proxy_upstream_name, which is based on namespace, service and port. This is the problem.

100%

Regarding the use of split_clients, I feel like that's an implementation detail (and I also think it's not flexible enough to support all the features we offer at the moment). The fundamental issue I mentioned above would probably still remain. Instead, we should make sure that canary configuration during controller sync and its application during request processing are based on the same invariant.

To add to https://github.com/kubernetes/ingress-nginx/issues/4368#issuecomment-515574775, the issue goes beyond just canary. In the example above, if you ignore the canary ingress and try to configure, say, the load-balance annotation in ingress a, the load-balance annotation of ingress b might override it.

So, back to #4367:

  1. it fixes all this unexpected behaviour where ingresses affect other ingresses that point to the same services;
  2. it fixes the service_name and service_port variables;
  3. and it also sets the ingress_name variable to the name of the Ingress that was the configuration source for the upstream. This may not be 100% right, but to me it is more right than wrong; at the very least it should not confuse users.

#4367 makes some fundamental changes: it changes how we share a given backend among different front-ends. Currently a given backend can be shared between any ingress/host/path combination in a given namespace, whereas with that PR the backend would be shared only per ingress. This will have unexpected consequences: for example, you will no longer be able to share the same backend between two ingresses, which is a really common setup for our users who configure different paths with different ingresses but the same host and service.

If we are going down this path, we might as well generate a backend per front-end (i.e. host+path) in a given namespace. That means adding host+path to the backend name, not the ingress name.

If we are going down this path, we might as well generate a backend per front-end (i.e. host+path) in a given namespace. That means adding host+path to the backend name, not the ingress name.

Well... are we?

Well... are we?

Not without understanding the impact of such a change.

Thank you very much for the constructive discussion. And thank you for all the work done on the ingress controller – we run Kubernetes clusters for nearly a hundred clients, and all of them use the NGINX ingress controller smoothly.

I hope that, now that the problem is understood, the right solution will be found.

P.S. I can't promise, but I'll try to find time to write a KEP about split_clients.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
