Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
When load testing with Vegeta (we also tried different scenarios with Load Impact), we get considerably worse performance than expected.
We are running 6 m3.xlarge nodes with 10 echoheaders pods and the nginx-ingress-controller deployed as a DaemonSet (we also tried it as a Deployment; the results are the same).
We tried the default kube network and also created the cluster with flannel; the results are the same.
These are the results with Vegeta:
echo "GET https://domain.com/echo" | vegeta attack -duration=10s -rate=300 | vegeta report
Requests [total, rate] 3000, 300.10
Duration [total, attack, wait] 10.845942141s, 9.996665603s, 849.276538ms
Latencies [mean, 50, 95, 99, max] 533.014592ms, 498.806831ms, 938.375961ms, 1.496382673s, 3.259433647s
Bytes In [total, mean] 1580225, 526.74
Bytes Out [total, mean] 0, 0.00
Success [ratio] 95.27%
Status Codes [code:count] 200:2858 0:142
What you expected to happen:
Latency should be much lower.
How to reproduce it (as minimally and precisely as possible):
Cluster created with kops:
kops create cluster --cloud aws \
--node-count 6 \
--node-size m3.xlarge \
--zones eu-west-1a,eu-west-1b,eu-west-1c \
--master-size m3.large --master-zones eu-west-1a,eu-west-1b,eu-west-1c \
--dns-zone domain.com \
--topology private \
--networking flannel \
--bastion="true" \
--authorization=RBAC
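Once the cluster comes up, it can be checked with the standard kops validation command (not part of the original report, just a sanity check before load testing):
kops validate cluster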
This is my nginx controller config:
daemon off;
worker_processes 4;
pid /run/nginx.pid;
worker_rlimit_nofile 130048;
worker_shutdown_timeout 10s ;
events {
multi_accept on;
worker_connections 1048576;
use epoll;
}
http {
real_ip_header proxy_protocol;
real_ip_recursive on;
set_real_ip_from 0.0.0.0/0;
geoip_country /etc/nginx/GeoIP.dat;
geoip_city /etc/nginx/GeoLiteCity.dat;
geoip_proxy_recursive on;
vhost_traffic_status_zone shared:vhost_traffic_status:10m;
vhost_traffic_status_filter_by_set_key $geoip_country_code country::*;
sendfile on;
aio threads;
aio_write on;
tcp_nopush on;
tcp_nodelay on;
log_subrequest on;
reset_timedout_connection on;
keepalive_timeout 75s;
keepalive_requests 100;
client_header_buffer_size 1k;
client_header_timeout 60s;
large_client_header_buffers 4 8k;
client_body_buffer_size 8k;
client_body_timeout 60s;
http2_max_field_size 4k;
http2_max_header_size 16k;
types_hash_max_size 2048;
server_names_hash_max_size 1024;
server_names_hash_bucket_size 64;
map_hash_bucket_size 64;
proxy_headers_hash_max_size 512;
proxy_headers_hash_bucket_size 64;
variables_hash_bucket_size 64;
variables_hash_max_size 2048;
underscores_in_headers off;
ignore_invalid_headers on;
include /etc/nginx/mime.types;
default_type text/html;
gzip on;
gzip_comp_level 5;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types application/atom+xml application/javascript application/x-javascript application/json application/rss+xml application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/svg+xml image/x-icon text/css text/plain text/x-component;
gzip_proxied any;
# Custom headers for response
server_tokens on;
# disable warnings
uninitialized_variable_warn off;
# Additional available variables:
# $namespace
# $ingress_name
# $service_name
log_format upstreaminfo '$the_real_ip - [$the_real_ip] - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status';
map $request_uri $loggable {
default 1;
}
access_log /var/log/nginx/access.log upstreaminfo if=$loggable;
error_log /var/log/nginx/error.log notice;
resolver 100.64.0.10 valid=30s;
# Retain the default nginx handling of requests without a "Connection" header
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
map $http_x_forwarded_for $the_real_ip {
# Get IP address from Proxy Protocol
default $proxy_protocol_addr;
}
# trust http_x_forwarded_proto headers correctly indicate ssl offloading
map $http_x_forwarded_proto $pass_access_scheme {
default $http_x_forwarded_proto;
'' $scheme;
}
map $http_x_forwarded_port $pass_server_port {
default $http_x_forwarded_port;
'' $server_port;
}
map $http_x_forwarded_host $best_http_host {
default $http_x_forwarded_host;
'' $this_host;
}
map $pass_server_port $pass_port {
443 443;
default $pass_server_port;
}
# Obtain best http host
map $http_host $this_host {
default $http_host;
'' $host;
}
server_name_in_redirect off;
port_in_redirect off;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
# turn on session caching to drastically improve performance
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_session_timeout 10m;
# allow configuring ssl session tickets
ssl_session_tickets on;
# slightly reduce the time-to-first-byte
ssl_buffer_size 4k;
# allow configuring custom ssl ciphers
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_prefer_server_ciphers on;
ssl_ecdh_curve auto;
proxy_ssl_session_reuse on;
upstream services-echoheaders-8080 {
# Load balance algorithm; empty for round robin, which is the default
least_conn;
keepalive 32;
server 100.96.8.14:8080 max_fails=0 fail_timeout=0;
server 100.96.8.7:8080 max_fails=0 fail_timeout=0;
server 100.96.8.12:8080 max_fails=0 fail_timeout=0;
server 100.96.8.6:8080 max_fails=0 fail_timeout=0;
server 100.96.8.13:8080 max_fails=0 fail_timeout=0;
server 100.96.8.9:8080 max_fails=0 fail_timeout=0;
server 100.96.8.10:8080 max_fails=0 fail_timeout=0;
server 100.96.8.8:8080 max_fails=0 fail_timeout=0;
server 100.96.8.11:8080 max_fails=0 fail_timeout=0;
server 100.96.8.15:8080 max_fails=0 fail_timeout=0;
}
Environment:
@mpapovic I cannot reproduce these times. Maybe you have an issue in (at least) one of your nodes?
Install the cluster using kops (1.7.1):
export MASTER_ZONES=us-west-2a,us-west-2b,us-west-2c
export WORKER_ZONES=us-west-2a,us-west-2b,us-west-2c
export KOPS_STATE_STORE=s3://k8s-xxxxxx-01
export AWS_DEFAULT_REGION=us-west-2
kops create cluster \
--name uswest2-01.rocket-science.io \
--cloud aws \
--master-zones $MASTER_ZONES \
--zones $WORKER_ZONES \
--master-size m3.medium \
--node-count 6 \
--node-size m3.xlarge \
--ssh-public-key ~/.ssh/id_rsa.pub \
--dns-zone domain.com \
--topology private \
--networking flannel \
--bastion="true" \
--authorization=RBAC \
--dns-zone=uswest2-01.rocket-science.io \
--yes
Install the echo headers deployment:
kubectl create -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/docs/examples/http-svc.yaml
kubectl scale deployment http-svc --replicas=10
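To confirm the backend replicas are actually spread across the nodes before load testing (assuming the app: http-svc label from the example manifest):
kubectl get pods -l app=http-svc -o wide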
Create the ingress rule:
echo "
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: http-svc
spec:
  rules:
  - host: echoheaders.uswest2-01.rocket-science.io
    http:
      paths:
      - backend:
          serviceName: http-svc
          servicePort: 80
" | kubectl create -f -
Install steps from the deploy guide:
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/namespace.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/default-backend.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/configmap.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/tcp-services-configmap.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/udp-services-configmap.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/rbac.yaml \
| kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/with-rbac.yaml \
| kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/service-l4.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/patch-configmap-l4.yaml
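Before benchmarking, confirm that the controller pods and the L4 service created by the manifests above are up (namespace and object names as defined in those manifests):
kubectl get pods -n ingress-nginx -o wide
kubectl get svc -n ingress-nginx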
From my laptop:
echo "GET http://echoheaders.uswest2-01.rocket-science.io" | vegeta attack -duration=10s -rate=300 | tee results.bin | vegeta report
Requests [total, rate] 3000, 300.10
Duration [total, attack, wait] 18.514211235s, 9.996665401s, 8.517545834s
Latencies [mean, 50, 95, 99, max] 1.104870547s, 526.198874ms, 4.312779517s, 9.814004164s, 16.699943913s
Bytes In [total, mean] 2058000, 686.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:3000
Error Set:
@aledbf I've tried the same setup on my laptop with minikube and the results are ±identical to yours.
@mpapovic can we close this issue then?
@aledbf, the problem still exists when you build the cluster on AWS.
@mpapovic please run the test from the bastion host. We have no control over how you reach the cluster or over the latencies seen from outside. If you run the test from the bastion you should see something like:
$ echo "GET http://echoheaders.uswest2-01.rocket-science.io" | vegeta attack -duration=10s -rate=300 | tee results.bin | vegeta report
Requests [total, rate] 3000, 300.10
Duration [total, attack, wait] 9.99967892s, 9.996665503s, 3.013417ms
Latencies [mean, 50, 95, 99, max] 4.155228ms, 4.015825ms, 5.744591ms, 8.710765ms, 40.174833ms
Bytes In [total, mean] 2058000, 686.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:3000
Error Set:
@aledbf These are the results from the bastion:
echo "GET https://domain.com/echo" | vegeta attack -duration=10s -rate=300 | vegeta report
Requests [total, rate] 3000, 300.10
Duration [total, attack, wait] 12.263949677s, 9.996665491s, 2.267284186s
Latencies [mean, 50, 95, 99, max] 189.420976ms, 137.12567ms, 596.73114ms, 1.277557503s, 3.490617397s
Bytes In [total, mean] 1636210, 545.40
Bytes Out [total, mean] 0, 0.00
Success [ratio] 98.73%
Status Codes [code:count] 0:38 200:2962
@mpapovic OK, that means you have networking issues in your cluster. This is not related to the ingress controller.
Closing. Please use the kubernetes-users Slack channel to get help.
@aledbf this is the result from Vegeta when I test directly against the echo pod from another node, on the same network but without the ingress:
echo "GET http://100.96.19.70:8080" | ./vegeta attack -duration=10s -rate=300 | ./vegeta report
Requests [total, rate] 3000, 300.10
Duration [total, attack, wait] 9.997550249s, 9.996665412s, 884.837µs
Latencies [mean, 50, 95, 99, max] 1.355932ms, 961.755µs, 1.066723ms, 8.64987ms, 75.965425ms
Bytes In [total, mean] 933000, 311.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:3000
If you are sure that the ingress is not the problem, can you give me a hint as to which part of the cluster I should look at?
@mpapovic you need to test from the same node where the ingress controller is running, against the endpoints (the contents of the upstream block in nginx.conf), for example:
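A sketch of what that looks like, assuming the echoheaders service in the services namespace from the upstream block above (the pod IPs will differ per cluster):
# the endpoint IPs behind the service match the upstream servers in nginx.conf
kubectl get endpoints echoheaders -n services
# then, from the node hosting the controller pod, hit one pod IP directly
echo "GET http://100.96.8.14:8080" | vegeta attack -duration=10s -rate=300 | vegeta report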
The problem was with SSL termination on the ingress when using proxy-protocol=true. I've moved SSL termination to the AWS load balancer and now it's OK. Your latency was good because your test was over HTTP and mine was over HTTPS.
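For reference, a sketch of what terminating TLS at the ELB can look like for the ingress-nginx service. This is only an illustration: the ACM certificate ARN is a placeholder and the selector/ports are assumptions based on the standard deploy manifests; the annotations themselves are the standard AWS cloud-provider ones.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  annotations:
    # terminate TLS at the ELB using an ACM certificate (placeholder ARN)
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-west-2:XXXXXXXXXXXX:certificate/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    # speak plain HTTP to the nginx pods behind the ELB
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
spec:
  type: LoadBalancer
  selector:
    app: ingress-nginx
  ports:
  - name: http
    port: 80
    targetPort: http
  - name: https
    port: 443
    targetPort: http
With TLS terminated at the ELB, proxy protocol is normally left disabled (use-proxy-protocol: "false" in the controller ConfigMap), which matches the L7 variant of the deploy guide.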