Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):
What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): dynamic, backend, upstream, ewma
Similar to https://github.com/kubernetes/ingress-nginx/issues/2797, but I am NOT using an external service.
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG
NGINX Ingress controller version: 0.18.0
I understand this is not the latest, but 0.19.0 and 0.20.0 are broken in other ways (missing Prometheus metrics). Looking at the changelogs for these versions I don't see anything about fixes for missing backends.
Kubernetes version (use kubectl version): 1.10.9
Environment:
Kernel (e.g. uname -a): 4.18 (local); not sure for the Kubernetes nodes
What happened:
Not all backends are being balanced to with Dynamic Configuration enabled. I have tried with both round_robin and ewma.
What you expected to happen:
All backends to receive traffic.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know:
After enabling verbose logging I can see that all of my backends are being 'seen', just not used. As Dynamic Configuration is all in Lua, troubleshooting past this point is pretty much impossible.
I1025 09:05:41.105222 6 endpoints.go:120] Endpoints found for Service "cloud-lte1/hybris-storefront":
[{10.80.138.4 8081 0 0 &ObjectReference{Kind:Pod,Namespace:cloud-lte1,Name:hybris-storefront-74cf495467-gcbpp,UID:7f4f8857-d814-11e8-9af9-06463cbe1d92,APIVersion:,ResourceVersion:2903526,FieldPath:,}}
{10.80.202.4 8081 0 0 &ObjectReference{Kind:Pod,Namespace:cloud-lte1,Name:hybris-storefront-74cf495467-gtl4z,UID:547b2e00-d812-11e8-9af9-06463cbe1d92,APIVersion:,ResourceVersion:2898420,FieldPath:,}}
{10.80.210.2 8081 0 0 &ObjectReference{Kind:Pod,Namespace:cloud-lte1,Name:hybris-storefront-74cf495467-8cbph,UID:b2416464-d80f-11e8-9af9-06463cbe1d92,APIVersion:,ResourceVersion:2892367,FieldPath:,}}
{10.80.210.3 8081 0 0 &ObjectReference{Kind:Pod,Namespace:cloud-lte1,Name:hybris-storefront-74cf495467-br9tt,UID:b24446e6-d80f-11e8-9af9-06463cbe1d92,APIVersion:,ResourceVersion:2892473,FieldPath:,}}]

Reverting back to --enable-dynamic-configuration=false and least_conn balancing, everything works fine.
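For reference, this is roughly how I flip it back (a sketch; it assumes the stock nginx-configuration ConfigMap and controller Deployment names, so adjust both, and the namespace, to your deployment):

# Switch the load-balance algorithm back to least_conn via the controller ConfigMap
kubectl -n <ingress-namespace> patch configmap nginx-configuration \
  --type merge -p '{"data":{"load-balance":"least_conn"}}'
# ...then add --enable-dynamic-configuration=false to the controller container args
kubectl -n <ingress-namespace> edit deployment nginx-ingress-controller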

The graphs show CPU usage on the backends. With Dynamic Configuration, only two ever get load. Without it, many more get load and the HPA starts scaling up as expected.
Slightly concerned by the fact that Dynamic Configuration will be mandatory in the next release...
I have since tried this with version 0.20.0 and the balancing behaviour still seems very strange. Sometimes only one backend never gets traffic, sometimes two or three. There seems to be no pattern to it.
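For what it's worth, this is roughly how I'm checking which backends actually receive traffic (a rough sketch; it assumes the access log goes to the container's stdout as with the stock image, and the default upstreaminfo log format, where $upstream_addr is the fifth field from the end, so adjust the awk field if you use a custom log-format):

# Count requests per upstream endpoint from the controller's access log
kubectl -n <ingress-namespace> logs <nginx-ingress-pod> \
  | grep 'hybris-storefront' \
  | awk '{print $(NF-4)}' \
  | sort | uniq -c | sort -rn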
@rlees85 given you get the above uneven load balancing, can you provide your Nginx configuration, the output of kubectl get pods -owide for your app, and the output of kubectl exec <an ingress nginx pod> -n <namespace where ingress-nginx is deployed> -- curl -s localhost:18080/configuration/backends | jq .[]
Also, are you seeing any Nginx errors/warnings in the logs when this happens?
--
I cannot reproduce this; as you can see, all 1000 requests are distributed almost evenly across all 10 available replicas.
> ingress-nginx (master)$ ruby count.rb
my-echo-579c44c48f-b5ffz => 99
my-echo-579c44c48f-dvgtk => 102
my-echo-579c44c48f-pzfx6 => 99
my-echo-579c44c48f-r4w2w => 99
my-echo-579c44c48f-xxc9h => 101
my-echo-579c44c48f-rvh48 => 101
my-echo-579c44c48f-v8zh6 => 101
my-echo-579c44c48f-kjxt5 => 99
my-echo-579c44c48f-slhhd => 99
my-echo-579c44c48f-sqpzg => 100
> ingress-nginx (master)$
> ingress-nginx (master)$ k get pods
NAME READY STATUS RESTARTS AGE
my-echo-579c44c48f-b5ffz 1/1 Running 0 50m
my-echo-579c44c48f-dvgtk 1/1 Running 0 31m
my-echo-579c44c48f-kjxt5 1/1 Running 0 31m
my-echo-579c44c48f-pzfx6 1/1 Running 0 50m
my-echo-579c44c48f-r4w2w 1/1 Running 0 31m
my-echo-579c44c48f-rvh48 1/1 Running 0 31m
my-echo-579c44c48f-slhhd 1/1 Running 0 31m
my-echo-579c44c48f-sqpzg 1/1 Running 0 50m
my-echo-579c44c48f-v8zh6 1/1 Running 0 31m
my-echo-579c44c48f-xxc9h 1/1 Running 0 31m
This is using 0.18.0 and round_robin. I could not reproduce this with latest master either.
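If you want to run the same kind of check on your side, a rough shell equivalent of my count script (assuming the echo backend returns the serving pod's name in its response body; replace the address and Host header with whatever your test ingress serves) would be:

# Send 1000 requests through the ingress and count which pod answered each one
for i in $(seq 1 1000); do
  curl -s http://<ingress-address>/ -H 'Host: my-echo.example.com' \
    | grep -o 'my-echo-[a-z0-9]*-[a-z0-9]*'
done | sort | uniq -c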
Thanks for the response! The extra debug step to show the backends is going to be really useful. I'm away at the moment but will get all the requested information on Monday.
With a bit of luck I'll have just done something stupid; if that is the case I will give details and close.
I've re-set up this environment and am still having problems.
curl -s localhost:18080/configuration/backends from nginx:
{
"name": "cloud-dt1-hybris-storefront-8081",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"name": "hybris-http",
"protocol": "TCP",
"port": 8081,
"targetPort": 8081
}
],
"selector": {
"app.kubernetes.io/instance": "storefront",
"app.kubernetes.io/name": "hybris",
"app.kubernetes.io/part-of": "hybris"
},
"clusterIP": "10.80.4.10",
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 8081,
"secure": false,
"secureCACert": {
"secret": "",
"caFilename": "",
"pemSha": ""
},
"sslPassthrough": false,
"endpoints": [
{
"address": "10.80.148.2",
"port": "8081",
"maxFails": 0,
"failTimeout": 0
},
{
"address": "10.80.184.2",
"port": "8081",
"maxFails": 0,
"failTimeout": 0
},
{
"address": "10.80.236.2",
"port": "8081",
"maxFails": 0,
"failTimeout": 0
},
{
"address": "10.80.236.3",
"port": "8081",
"maxFails": 0,
"failTimeout": 0
}
],
"sessionAffinityConfig": {
"name": "cookie",
"cookieSessionAffinity": {
"name": "route",
"hash": "sha1",
"locations": {
"_": [
"/"
]
}
}
}
}
{
"name": "upstream-default-backend",
"service": {
"metadata": {
"creationTimestamp": null
},
"spec": {
"ports": [
{
"protocol": "TCP",
"port": 80,
"targetPort": 8080
}
],
"selector": {
"app.kubernetes.io/instance": "storefront",
"app.kubernetes.io/name": "default-http-backend",
"app.kubernetes.io/part-of": "nginx"
},
"clusterIP": "10.80.18.162",
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
},
"port": 0,
"secure": false,
"secureCACert": {
"secret": "",
"caFilename": "",
"pemSha": ""
},
"sslPassthrough": false,
"endpoints": [
{
"address": "10.80.156.7",
"port": "8080",
"maxFails": 0,
"failTimeout": 0
}
],
"sessionAffinityConfig": {
"name": "",
"cookieSessionAffinity": {
"name": "",
"hash": ""
}
}
}
kubectl get pods output (restricted to the namespace, as I am running nginx in namespace-restricted mode):
NAME READY STATUS RESTARTS AGE IP NODE
default-http-backend-backoffice-847c84b95f-jq9hn 1/1 Running 0 1h 10.80.144.10 ip-10-81-124-154.eu-west-1.compute.internal
default-http-backend-staging-cc964d9bf-mxvl6 1/1 Running 0 1h 10.80.144.9 ip-10-81-124-154.eu-west-1.compute.internal
default-http-backend-storefront-98fc778d4-hlf6z 1/1 Running 0 1h 10.80.156.7 ip-10-81-125-145.eu-west-1.compute.internal
hybris-backoffice-566fd6fc76-4csjs 1/1 Running 0 1h 10.80.184.3 ip-10-81-123-112.eu-west-1.compute.internal
hybris-storefront-7f9c64c9f8-9ks8j 1/1 Running 0 1h 10.80.236.3 ip-10-81-124-178.eu-west-1.compute.internal
hybris-storefront-7f9c64c9f8-c8wxb 1/1 Running 0 1h 10.80.236.2 ip-10-81-124-178.eu-west-1.compute.internal
hybris-storefront-7f9c64c9f8-q6rfc 1/1 Running 0 1h 10.80.148.2 ip-10-81-123-102.eu-west-1.compute.internal
hybris-storefront-7f9c64c9f8-zhrfq 1/1 Running 0 1h 10.80.184.2 ip-10-81-123-112.eu-west-1.compute.internal
nginx-backoffice-7467f64f7d-tb7kl 1/1 Running 0 1h 10.80.144.17 ip-10-81-124-154.eu-west-1.compute.internal
nginx-staging-59b7bc79d6-mtnq6 2/2 Running 0 1h 10.80.144.16 ip-10-81-124-154.eu-west-1.compute.internal
nginx-storefront-668f646b69-wkqf5 2/2 Running 0 1h 10.80.144.12 ip-10-81-124-154.eu-west-1.compute.internal
qas-5968567cc8-q5rfb 1/1 Running 0 1h 10.80.156.4 ip-10-81-125-145.eu-west-1.compute.internal
solrcloud-0 1/1 Running 1 1h 10.80.160.3 ip-10-81-124-241.eu-west-1.compute.internal
zookeeper-0 1/1 Running 0 1h 10.80.168.5 ip-10-81-123-244.eu-west-1.compute.internal
Ignore Hybris Backoffice; that is handled by a separate ingress.
nginx.conf (some bits replaced with the word "omitted"):
# Configuration checksum: 15209792060549857848
# setup custom paths that do not require root access
pid /tmp/nginx.pid;
daemon off;
worker_processes 2;
worker_rlimit_nofile 523264;
worker_shutdown_timeout 10s ;
events {
multi_accept on;
worker_connections 16384;
use epoll;
}
http {
lua_package_cpath "/usr/local/lib/lua/?.so;/usr/lib/lua-platform-path/lua/5.1/?.so;;";
lua_package_path "/etc/nginx/lua/?.lua;/etc/nginx/lua/vendor/?.lua;/usr/local/lib/lua/?.lua;;";
lua_shared_dict configuration_data 5M;
lua_shared_dict locks 512k;
lua_shared_dict balancer_ewma 1M;
lua_shared_dict balancer_ewma_last_touched_at 1M;
lua_shared_dict sticky_sessions 1M;
init_by_lua_block {
require("resty.core")
collectgarbage("collect")
local lua_resty_waf = require("resty.waf")
lua_resty_waf.init()
-- init modules
local ok, res
ok, res = pcall(require, "configuration")
if not ok then
error("require failed: " .. tostring(res))
else
configuration = res
configuration.nameservers = { "10.80.0.10" }
end
ok, res = pcall(require, "balancer")
if not ok then
error("require failed: " .. tostring(res))
else
balancer = res
end
ok, res = pcall(require, "monitor")
if not ok then
error("require failed: " .. tostring(res))
else
monitor = res
end
}
init_worker_by_lua_block {
balancer.init_worker()
}
real_ip_header proxy_protocol;
real_ip_recursive on;
set_real_ip_from 0.0.0.0/0;
geoip_country /etc/nginx/geoip/GeoIP.dat;
geoip_city /etc/nginx/geoip/GeoLiteCity.dat;
geoip_org /etc/nginx/geoip/GeoIPASNum.dat;
geoip_proxy_recursive on;
aio threads;
aio_write on;
tcp_nopush on;
tcp_nodelay on;
log_subrequest on;
reset_timedout_connection on;
keepalive_timeout 75s;
keepalive_requests 100;
client_body_temp_path /tmp/client-body;
fastcgi_temp_path /tmp/fastcgi-temp;
proxy_temp_path /tmp/proxy-temp;
ajp_temp_path /tmp/ajp-temp;
client_header_buffer_size 1k;
client_header_timeout 60s;
large_client_header_buffers 4 8k;
client_body_buffer_size 8k;
client_body_timeout 60s;
http2_max_field_size 4k;
http2_max_header_size 16k;
types_hash_max_size 2048;
server_names_hash_max_size 1024;
server_names_hash_bucket_size 32;
map_hash_bucket_size 64;
proxy_headers_hash_max_size 512;
proxy_headers_hash_bucket_size 64;
variables_hash_bucket_size 128;
variables_hash_max_size 2048;
underscores_in_headers off;
ignore_invalid_headers on;
limit_req_status 503;
include /etc/nginx/mime.types;
default_type text/html;
gzip on;
gzip_comp_level 5;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types application/atom+xml application/javascript application/x-javascript application/json application/rss+xml application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/svg+xml image/x-icon text/css text/plain text/x-component;
gzip_proxied any;
gzip_vary on;
# Custom headers for response
add_header X-Content-Type-Options "nosniff";
server_tokens on;
# disable warnings
uninitialized_variable_warn off;
# Additional available variables:
# $namespace
# $ingress_name
# $service_name
# $service_port
log_format upstreaminfo '$the_real_ip - [$the_real_ip] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';
map $request_uri $loggable {
default 1;
}
access_log /var/log/nginx/access.log upstreaminfo if=$loggable;
error_log /var/log/nginx/error.log notice;
resolver 10.80.0.10 valid=30s;
# Retain the default nginx handling of requests without a "Connection" header
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
map $http_x_forwarded_for $the_real_ip {
# Get IP address from Proxy Protocol
default $proxy_protocol_addr;
}
# trust http_x_forwarded_proto headers correctly indicate ssl offloading
map $http_x_forwarded_proto $pass_access_scheme {
default $http_x_forwarded_proto;
'' $scheme;
}
# validate $pass_access_scheme and $scheme are http to force a redirect
map "$scheme:$pass_access_scheme" $redirect_to_https {
default 0;
"http:http" 1;
"https:http" 1;
}
map $http_x_forwarded_port $pass_server_port {
default $http_x_forwarded_port;
'' $server_port;
}
map $pass_server_port $pass_port {
443 443;
default $pass_server_port;
}
# Obtain best http host
map $http_host $this_host {
default $http_host;
'' $host;
}
map $http_x_forwarded_host $best_http_host {
default $http_x_forwarded_host;
'' $this_host;
}
# Reverse proxies can detect if a client provides a X-Request-ID header, and pass it on to the backend server.
# If no such header is provided, it can provide a random value.
map $http_x_request_id $req_id {
default $http_x_request_id;
"" $request_id;
}
server_name_in_redirect off;
port_in_redirect off;
ssl_protocols TLSv1.2;
# turn on session caching to drastically improve performance
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_session_timeout 10m;
# allow configuring ssl session tickets
ssl_session_tickets on;
# slightly reduce the time-to-first-byte
ssl_buffer_size 4k;
# allow configuring custom ssl ciphers
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
ssl_prefer_server_ciphers on;
ssl_ecdh_curve auto;
proxy_ssl_session_reuse on;
# Custom code snippet configured in the configuration configmap
# Map to determine which site is requested
map $http_host $site_name {
hostnames;
default default;
uk.* omitted_uk;
de.* omitted_de;
us.* omitted_us;
}
# Map to determine if a naked domain was requested
map $http_host $naked_domain {
hostnames;
default 0;
}
# Map to determine if a remote IP is whitelisted from the US site redirect
map $remote_addr $is_whitelisted {
default 0;
1.2.3.4 1;
}
# Map to determine if a user-agent is whitelisted from the US site redirect
map $http_user_agent $is_bot {
default 0;
~*bot 1;
}
# Map to determine if the country code is to be classed as USA
map $country_code $client_is_usa {
default 0;
US 1;
}
upstream upstream_balancer {
server 0.0.0.1; # placeholder
balancer_by_lua_block {
balancer.balance()
}
keepalive 32;
}
## start server _
server {
server_name _ ;
listen 80 proxy_protocol default_server reuseport backlog=511;
listen [::]:80 proxy_protocol default_server reuseport backlog=511;
set $proxy_upstream_name "-";
listen 443 proxy_protocol default_server reuseport backlog=511 ssl http2;
listen [::]:443 proxy_protocol default_server reuseport backlog=511 ssl http2;
# PEM sha: fb0f1b98f7fe4fbbf65d07f832f02f90aaf41ef6
ssl_certificate /etc/ingress-controller/ssl/cloud-dt1-hybris.pem;
ssl_certificate_key /etc/ingress-controller/ssl/cloud-dt1-hybris.pem;
location / {
set $namespace "cloud-dt1";
set $ingress_name "hybris-storefront-ingress";
set $service_name "hybris-storefront";
set $service_port "8081";
set $location_path "/";
rewrite_by_lua_block {
balancer.rewrite()
}
log_by_lua_block {
balancer.log()
monitor.call()
}
if ($scheme = https) {
more_set_headers "Strict-Transport-Security: max-age=15724800";
}
port_in_redirect off;
set $proxy_upstream_name "cloud-dt1-hybris-storefront-8081";
# enforce ssl on server side
if ($redirect_to_https) {
return 308 https://$best_http_host$request_uri;
}
client_max_body_size "1m";
proxy_set_header Host $best_http_host;
# Pass the extracted client certificate to the backend
# Allow websocket connections
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Request-ID $req_id;
proxy_set_header X-Real-IP $the_real_ip;
proxy_set_header X-Forwarded-For $the_real_ip;
proxy_set_header X-Forwarded-Host $best_http_host;
proxy_set_header X-Forwarded-Port $pass_port;
proxy_set_header X-Forwarded-Proto $pass_access_scheme;
proxy_set_header X-Original-URI $request_uri;
proxy_set_header X-Scheme $pass_access_scheme;
# Pass the original X-Forwarded-For
proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
# mitigate HTTPoxy Vulnerability
# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
proxy_set_header Proxy "";
# Custom headers to proxied server
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffering "off";
proxy_buffer_size "8k";
proxy_buffers 4 "8k";
proxy_request_buffering "on";
proxy_http_version 1.1;
proxy_cookie_domain off;
proxy_cookie_path off;
# In case of errors try the next upstream server before returning an error
proxy_next_upstream error timeout;
proxy_next_upstream_tries 3;
proxy_pass http://upstream_balancer;
proxy_redirect off;
}
# health checks in cloud providers require the use of port 80
location /healthz {
access_log off;
return 200;
}
# this is required to avoid error if nginx is being monitored
# with an external software (like sysdig)
location /nginx_status {
allow 127.0.0.1;
allow ::1;
deny all;
access_log off;
stub_status on;
}
# Custom code snippet configured in the configuration configmap
# Redirect all naked domains to www. subdomain
if ($naked_domain = 1) {
return 308 ${scheme}://www.${host}${request_uri};
}
# Set a variable whether to do the USA redirect or not, with default value being if the client is within the USA
set $do_usa_redirect $client_is_usa;
# If the site requested is the US site, don't do the US redirect (as already there)
if ($site_name = "omitted_us") {
set $do_usa_redirect 0;
}
# Check result of is_whitelisted, if true do not do US redirect
if ($is_whitelisted = 1) {
set $do_usa_redirect 0;
}
# Check result of is_bot, if true do not do US redirect
if ($is_bot = 1) {
set $do_usa_redirect 0;
}
# If X-Forwarded-IP is set do not do the US redirect
if ($http_x_forwarded_ip != "") {
set $do_usa_redirect 0;
}
# If after all above checks we should do the US redirect, then do it (303)
if ($do_usa_redirect = 1) {
return 303 $scheme://www.omitted.com$request_uri;
}
# If the /omitted path is attempted, redirect to homepage
location ~* /omitted.* {
return 308 $scheme://$host/;
}
# Send some upstream responses to a maintainence page
error_page 502 @maintenance;
error_page 503 @maintenance;
error_page 504 @maintenance;
# Special location for sending to a maintainence page
location @maintenance {
return 307 $scheme://omitted/sorry.html;
}
}
## end server _
# default server, used for NGINX healthcheck and access to nginx stats
server {
listen 18080 default_server reuseport backlog=511;
listen [::]:18080 default_server reuseport backlog=511;
set $proxy_upstream_name "-";
location /healthz {
access_log off;
return 200;
}
location /is-dynamic-lb-initialized {
access_log off;
content_by_lua_block {
local configuration = require("configuration")
local backend_data = configuration.get_backends_data()
if not backend_data then
ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
return
end
ngx.say("OK")
ngx.exit(ngx.HTTP_OK)
}
}
location /nginx_status {
set $proxy_upstream_name "internal";
access_log off;
stub_status on;
}
location /configuration {
access_log off;
allow 127.0.0.1;
allow ::1;
deny all;
# this should be equals to configuration_data dict
client_max_body_size "10m";
proxy_buffering off;
content_by_lua_block {
configuration.call()
}
}
location / {
set $proxy_upstream_name "upstream-default-backend";
proxy_pass http://upstream_balancer;
}
}
}
stream {
log_format log_stream [$time_local] $protocol $status $bytes_sent $bytes_received $session_time;
access_log /var/log/nginx/access.log log_stream;
error_log /var/log/nginx/error.log;
# TCP services
# UDP services
}
Additional Information
Before and after from a few cURLs (only the first two pods received requests; the second two were only hit twice by the Kubernetes healthcheck).


"sessionAffinityConfig": {
"name": "cookie",
"cookieSessionAffinity": {
"name": "route",
"hash": "sha1",
"locations": {
"_": [
"/"
]
}
}
You seem to have session affinity enabled for your app (there's a known load-balancing issue with the current implementation); is that intentional? When session affinity is enabled the load-balance annotation is ignored.
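If you're not sure where that comes from, it is usually set through the affinity annotations on the Ingress; something along these lines will show them (namespace and Ingress name taken from your config above):

# List the affinity-related annotations on the Ingress
kubectl -n cloud-dt1 get ingress hybris-storefront-ingress -o yaml \
  | grep -iE 'affinity|session-cookie'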
Anyone having this issue, please try:
quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev
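If you deployed from the standard manifests, swapping the image in place is enough; something like this (a sketch; adjust the namespace, Deployment and container names to your setup):

# Point the controller Deployment at the dev image
kubectl -n ingress-nginx set image deployment/nginx-ingress-controller \
  nginx-ingress-controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev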
We have the same issue and are using the same configs for stickiness. We're testing the above dev fix and we'll let you know how it goes.

Before and after the dev fix for sticky sessions, for 2 pods. Before 16h40 we were using 0.20 and only one of the pods was receiving traffic (besides health checks) and thus using CPU. After 16h40 we are running the dev version, and traffic is well balanced and so is CPU.
We'll continue to run this dev version for now and see how it behaves over the next few days. Any idea when this will make it to a final/production release?
Thanks!
Any idea when this will make it to a final/production release?
The code is already merged in master (the dev image is from master)
The next release is scheduled in approx two weeks.
That is good news.
To give more details on the query distribution before and after (queries per second on 2 different pods that are part of the same service):

You can see that before the dev fix the second pod was only getting the health-check queries, not the client queries. Afterwards, the queries were well distributed.
What I cannot confirm so far (looking into this as we speak) is whether stickiness is still respected by the new code. I'm unsure whether there is an automated test in the build that checks that stickiness works properly, so I'm checking this manually to be safe.
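Roughly what that manual check looks like (a sketch; it assumes the affinity cookie is still named route, as in the config above, and that our backend response identifies the serving pod):

# 1) First request with a cookie jar: nginx should set the 'route' affinity cookie
curl -s -c /tmp/jar -o /dev/null https://<our-host>/some/page
# 2) Replay the cookie: every request should land on the same pod
for i in $(seq 1 20); do
  curl -s -b /tmp/jar https://<our-host>/some/page | grep -o '<pod-identifier-pattern>'
done | sort | uniq -c
# 3) No cookie at all: requests should spread across both pods
for i in $(seq 1 20); do
  curl -s https://<our-host>/some/page | grep -o '<pod-identifier-pattern>'
done | sort | uniq -c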
Cookie-based stickiness still works well.
All incoming connections with a cookie presented are directed to the same pod, as it should be.
Incoming connections without a cookie are load-balanced between the 2 pods, as it should be.
I'll follow up again after a few days to confirm everything still works well, but so far so good!
Great news. Yes, my stickiness is intentional: Hybris seems to work better if sessions are not passed around nodes any more than they need to be. I didn't know there was a known issue with stickiness, but I'm happy to know the issue is known and likely fixed.