NGINX Ingress controller version:
v0.17.1 and v0.18.0
Kubernetes version (use kubectl version):
v1.10.5 and v1.10.6
Environment:
Kernel (uname -a): 4.15.0-1013-azure
What happened:
The gRPC request was successfully proxied to the backend server (written in Go), which returned data to nginx, but the response the client received was: rpc error: code = Internal desc = server closed the stream without sending trailers.
If I use kubectl port-forward and WithInsecure, the gRPC request works perfectly.
198.217.127.184 - [198.217.127.184] - - [21/Aug/2018:05:09:58 +0000] "POST /keywordspb.KeywordsService/BatchGetKeywords HTTP/1.1" 200 120 "-" "/-2ef21c322cee29c3de4c2b0fe29f93b9faecb33d grpc-go/1.14.0" 512 0.069 [keywords-keywords-grpc] 10.0.2.211:8888 119 0.068 200 05c7736fee61324d03570abf6338efb0
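For reference, the two client setups being compared boil down to the following (a minimal sketch; the hostname and local port are placeholders for my setup):

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// Through the ingress: TLS against the public hostname, verified with system roots.
func dialThroughIngress(ctx context.Context) (*grpc.ClientConn, error) {
	creds := credentials.NewTLS(nil)
	return grpc.DialContext(ctx, "keywords.company.app:443", grpc.WithTransportCredentials(creds))
}

// Via kubectl port-forward: plaintext straight to the pod, bypassing nginx entirely.
func dialViaPortForward(ctx context.Context) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, "localhost:8888", grpc.WithInsecure())
}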
What you expected to happen:
nginx to terminate SSL, forward the gRPC request to the upstream server in plaintext, and then successfully return the response
How to reproduce it (as minimally and precisely as possible):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: keywords
  name: keywords
  labels:
    app: keywords
  annotations:
    kubernetes.io/ingress.class: "nginx"
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - secretName: keywords-tls
    hosts:
    - keywords.company.app
  rules:
  - host: keywords.company.app
    http:
      paths:
      - backend:
          serviceName: keywords
          servicePort: grpc
@derekperkins please update to 0.18.0. This should be fixed in nginx 1.15 (https://nginx.googlesource.com/nginx/+/c9190ce34eebd8a67a70e096ae771b934cbe85ac)
@aledbf Thanks for the quick response. I upgraded to 0.18.0 and got the same error.
If it's helpful, here's the relevant section of the generated nginx.conf file, and I've commented on the 3 lines that changed after upgrading from 0.17.1 to 0.18.0.
## start server keywords.company.app
server {
	server_name keywords.company.app ;
	listen 80;
	listen [::]:80;
	set $proxy_upstream_name "-";
	listen 443 ssl http2;
	listen [::]:443 ssl http2;
	# PEM sha: 9246c9e0ff992e5cbc157cced35e9107b811c7a5
	ssl_certificate /etc/ingress-controller/ssl/keywords-keywords-tls.pem;
	ssl_certificate_key /etc/ingress-controller/ssl/keywords-keywords-tls.pem;
	ssl_trusted_certificate /etc/ingress-controller/ssl/keywords-keywords-tls-full-chain.pem;
	ssl_stapling on;
	ssl_stapling_verify on;
	location / {
		set $namespace "keywords";
		set $ingress_name "keywords";
		set $service_name "keywords";
		set $service_port "grpc";
		set $location_path "/";
		rewrite_by_lua_block {
			## 0.17.1 there was no balancer.rewrite()
			balancer.rewrite()
		}
		log_by_lua_block {
			## 0.17.1 there was no balancer.log()
			balancer.log()
			monitor.call()
		}
		if ($scheme = https) {
			more_set_headers "Strict-Transport-Security: max-age=15724800; includeSubDomains";
		}
		port_in_redirect off;
		set $proxy_upstream_name "keywords-keywords-grpc";
		# enforce ssl on server side
		if ($redirect_to_https) {
			return 308 https://$best_http_host$request_uri;
		}
		client_max_body_size "1m";
		grpc_set_header Host $best_http_host;
		# Pass the extracted client certificate to the backend
		# Allow websocket connections
		grpc_set_header Upgrade $http_upgrade;
		grpc_set_header Connection $connection_upgrade;
		grpc_set_header X-Request-ID $req_id;
		grpc_set_header X-Real-IP $the_real_ip;
		grpc_set_header X-Forwarded-For $the_real_ip;
		grpc_set_header X-Forwarded-Host $best_http_host;
		grpc_set_header X-Forwarded-Port $pass_port;
		grpc_set_header X-Forwarded-Proto $pass_access_scheme;
		grpc_set_header X-Original-URI $request_uri;
		grpc_set_header X-Scheme $pass_access_scheme;
		# Pass the original X-Forwarded-For
		grpc_set_header X-Original-Forwarded-For $http_x_forwarded_for;
		# mitigate HTTPoxy Vulnerability
		# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
		grpc_set_header Proxy "";
		# Custom headers to proxied server
		grpc_set_header X-Request-Start "t=${msec}";
		proxy_connect_timeout 5s;
		proxy_send_timeout 60s;
		proxy_read_timeout 60s;
		proxy_buffering "off";
		proxy_buffer_size "4k";
		proxy_buffers 4 "4k";
		proxy_request_buffering "on";
		proxy_http_version 1.1;
		proxy_cookie_domain off;
		proxy_cookie_path off;
		# In case of errors try the next upstream server before returning an error
		proxy_next_upstream error timeout;
		proxy_next_upstream_tries 3;
		## 0.17.1 grpc_pass grpc://keywords-keywords-grpc;
		grpc_pass grpc://upstream_balancer;
		proxy_redirect off;
	}
}
## end server keywords.company.app
@derekperkins can you post the ingress controller pod log?
I turned the log level up to info and didn't get much more detail. Is there something specific from the logs or a file from the pod you'd like to see?
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:11 [info] 6812#6812: *376281 client closed connection while SSL handshaking, client: 10.0.2.47, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:11 [info] 6813#6813: *376282 client closed connection while waiting for request, client: 10.0.2.47, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:11 [info] 6875#6875: *376387 client 127.0.0.1 closed keepalive connection
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:12 [info] 6850#6850: *376299 client closed connection while SSL handshaking, client: 10.0.0.4, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:12 [info] 6851#6851: *376300 client closed connection while waiting for request, client: 10.0.0.4, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:13 [info] 6815#6815: *376317 client closed connection while SSL handshaking, client: 10.0.0.115, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:13 [info] 6846#6846: *376302 client closed connection while waiting for request, client: 10.0.2.158, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:13 [info] 6852#6852: *376318 client closed connection while SSL handshaking, client: 10.0.2.158, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:14 [info] 6810#6810: *376335 client closed connection while waiting for request, client: 10.0.0.115, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:14 [info] 6804#6804: *376303 client closed connection while SSL handshaking, client: 10.0.1.192, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:14 [info] 6812#6812: *376336 client closed connection while waiting for request, client: 10.0.1.192, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:14 [info] 6803#6803: *376407 client 127.0.0.1 closed keepalive connection
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 82.102.30.77 - [82.102.30.77] - - [21/Aug/2018:12:06:15 +0000] "POST /keywordspb.KeywordsService/BatchGetKeywords HTTP/1.1" 200 120 "-" "/-2ef21c322cee29c3de4c2b0fe29f93b9faecb33d grpc-go/1.14.0" 506 0.069 [keywords-keywords-grpc] 10.0.0.182:8888 119 0.072 200 db7d333aeca8363197a7c7e4274b424a
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:15 [info] 6846#6846: *376337 client closed connection while waiting for request, client: 10.0.1.81, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:15 [info] 6847#6847: *376336 client closed connection while SSL handshaking, client: 10.0.1.81, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:16 [info] 6816#6816: *376370 client closed connection while SSL handshaking, client: 10.0.2.47, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:16 [info] 6812#6812: *376371 client closed connection while waiting for request, client: 10.0.2.47, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:17 [info] 6835#6835: *376355 client closed connection while SSL handshaking, client: 10.0.0.4, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:17 [info] 6810#6810: *376389 client closed connection while waiting for request, client: 10.0.0.4, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:18 [info] 6846#6846: *376372 client closed connection while SSL handshaking, client: 10.0.0.115, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:18 [info] 6875#6875: *376406 client closed connection while waiting for request, client: 10.0.2.158, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-gzkfj nginx-ingress-controller 2018/08/21 12:06:18 [info] 6853#6853: *376407 client closed connection while SSL handshaking, client: 10.0.2.158, server: 0.0.0.0:443
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:19 [info] 6805#6805: *376389 client closed connection while waiting for request, client: 10.0.0.115, server: 0.0.0.0:80
nginx-nginx-ingress-controller-b88567dc6-mnxwf nginx-ingress-controller 2018/08/21 12:06:19 [info] 6846#6846: *376390 client closed connection while waiting for request, client: 10.0.1.192, server: 0.0.0.0:80
@derekperkins please replace the grpc service with the example from the docs https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/grpc
Just to make sure there are no other issues here
I've been debugging this all day and haven't made any progress either on the example or on my app. The example with grpcurl is still erroring out, but with a different error.
client side error
Error invoking method "build.stack.fortune.FortuneTeller/Predict": failed to query for service descriptor "build.stack.fortune.FortuneTeller": rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR
nginx controller log
"POST /build.stack.fortune.FortuneTeller/Predict HTTP/1.1" 200 285 "-" "grpc-node/1.14.1 grpc-c/6.0.0 (osx; chttp2; gladiolus)" 477 0.003 [fortune-fortune-teller-service-grpc] 10.0.2.240:50051 284 0.004 200 2dda30a51fd6189f82ce1d6a3d7263a9
"POST /build.stack.fortune.FortuneTeller/Predict HTTP/1.1"
This HTTP/1.1 is not correct. Are you using https, and is http2 enabled in the controller?
Edit: gRPC support only works over HTTPS in NGINX
I am using https and http2 is enabled. My cert is issued by Let's Encrypt, so at least for my service, I'm authenticating with root certs.
I do have a valid cert and it is loaded for the example service, but I'm less confident about what the grpc example is doing. I'm not seeing any logs in the fortune service, and I've never used grpcurl before, so I don't know exactly how it handles https. To be honest, it was a nightmare trying to get a client to communicate with the example.
I went ahead and just took the proto file and used the generated Go code to match what I'm doing in my own service and got the same outcome.
client side error
rpc error: code = Internal desc = server closed the stream without sending trailers
nginx controller log
"POST /build.stack.fortune.FortuneTeller/Predict HTTP/1.1" 200 115 "-" "grpc-go/1.14.0" 394 0.003 [fortune-fortune-teller-service-grpc] 10.0.2.240:50051 114 0.000 200 9d8cc756b2f1aa6509ddc41efa90ee5a
Here's my code to replicate the client request:
func fortuneTest(c context.Context) error {
	creds := credentials.NewTLS(nil)
	opt := grpc.WithTransportCredentials(creds)
	conn, err := grpc.DialContext(c, "fortune.company.app:443", opt)
	if err != nil {
		return err
	}
	defer conn.Close()
	client := build_stack_fortune.NewFortuneTellerClient(conn)
	fmt.Println("predicting")
	req := &build_stack_fortune.PredictionRequest{}
	resp, err := client.Predict(c, req)
	if err != nil {
		fmt.Println("prediction failed")
		return err
	}
	fmt.Println("prediction succeeded!!!")
	fmt.Println(resp.Message)
	return nil
}
This insecure setup that bypasses nginx via kubectl port-forward works.
func fortuneTest(c context.Context) error {
	opt := grpc.WithInsecure()
	conn, err := grpc.DialContext(c, "localhost:8111", opt)
	if err != nil {
		return err
	}
	defer conn.Close()
	client := build_stack_fortune.NewFortuneTellerClient(conn)
	req := &build_stack_fortune.PredictionRequest{}
	fmt.Println("predicting")
	resp, err := client.Predict(c, req)
	if err != nil {
		fmt.Println("prediction failed")
		return err
	}
	fmt.Println("prediction succeeded!!!")
	fmt.Println(resp.Message)
	return nil
}
I just spun up a cluster on GKE (v1.10.6-gke.1) to make sure that it wasn't an issue with AKS and got the exact same error message, so it doesn't appear to be an issue at the load balancer level.
The plot is thickening. To isolate nginx, I edited my hosts file to route my gRPC requests through port forwarding instead of the public internet, and that worked as expected. Comparing the two logs below, which use the exact same server and client implementations, something is changing the request from HTTP/2 to HTTP/1.1 before it reaches nginx (see the ALPN check sketched after the logs).
using kubectl port-forward
127.0.0.1 - [127.0.0.1] - - [22/Aug/2018:17:31:14 +0000] "POST /keywordspb.KeywordsService/BatchGetKeywords HTTP/2.0" 200 53 "-" "/-2ef21c322cee29c3de4c2b0fe29f93b9faecb33d grpc-go/1.14.0" 151 0.005 [keywords-keywords-grpc] 10.4.0.23:8888 95 0.005 200 9567e7aa49644e6ca09bfb97d8e03397
routed via the internet
208.53.50.130 - [208.53.50.130] - - [22/Aug/2018:18:24:31 +0000] "POST /keywordspb.KeywordsService/BatchGetKeywords HTTP/1.1" 200 96 "-" "/-2ef21c322cee29c3de4c2b0fe29f93b9faecb33d grpc-go/1.14.0" 466 0.006 [keywords-keywords-grpc] 10.4.0.23:8888 95 0.006 200 995bed6252170cc16f7f094d6db6a88f
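A quick way to check where HTTP/2 is being dropped is to look at the ALPN protocol negotiated on the TLS connection; a minimal sketch (the hostname is a placeholder):

package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	// Placeholder hostname; point it at whatever terminates TLS in front of nginx.
	conn, err := tls.Dial("tcp", "keywords.company.app:443", &tls.Config{
		// Advertise HTTP/2 via ALPN, the same way a gRPC client would.
		NextProtos: []string{"h2", "http/1.1"},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// If this prints "http/1.1" (or nothing), something between the client and
	// nginx is downgrading the connection, and gRPC cannot work over it.
	fmt.Println("negotiated protocol:", conn.ConnectionState().NegotiatedProtocol)
}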
Ok, I feel like an idiot. The issue was that Cloudflare was intercepting the traffic and killing the http2 connection to nginx. Sorry for the noise and thanks for your help.
@derekperkins glad you found the issue, and it wasn't related to the ingress controller.
Having the same issue with HTTPS ALB -> HTTPS NGINX INGRESS -> GRPC (I haven't yet tested HTTPS ALB -> HTTPS NGINX INGRESS -> GRPCS).
I can see that the request from the ALB to the ingress is HTTP/1.1 rather than HTTP/2. The stranger thing is that inspecting the grpc service via grpc reflection works properly, while making an actual request to the service fails (I can attach some http2 logs from the Go debug output).
I then tested again with HTTPS ALB -> HTTP NGINX INGRESS -> GRPC and got the same results: reflection works, a proper service call does not.
So my guess is that it's not related to the HTTP version of the request.
The "broken" part seems to be the grpc proxying in nginx: maybe I need a full TLS chain (HTTPS ALB -> HTTPS NGINX INGRESS -> GRPCS), though that still wouldn't explain why reflection works.
If I make the call without going through the ingress, it works properly as a plaintext request.
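For reference, the reflection check amounts to something like this (a minimal sketch; the address and plaintext dial are placeholders, and it only exercises the small ListServices exchange rather than a real unary call):

package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/grpc"
	rpb "google.golang.org/grpc/reflection/grpc_reflection_v1alpha"
)

func main() {
	// Placeholder address and credentials; adjust for the ALB/ingress in front of the service.
	conn, err := grpc.Dial("my-grpc-service.example.com:80", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	stream, err := rpb.NewServerReflectionClient(conn).ServerReflectionInfo(context.Background())
	if err != nil {
		log.Fatal(err)
	}

	// ListServices is a tiny request/response exchange on a single stream,
	// which may explain why it succeeds where larger unary calls fail.
	if err := stream.Send(&rpb.ServerReflectionRequest{
		MessageRequest: &rpb.ServerReflectionRequest_ListServices{ListServices: ""},
	}); err != nil {
		log.Fatal(err)
	}
	resp, err := stream.Recv()
	if err != nil {
		log.Fatal(err)
	}
	for _, svc := range resp.GetListServicesResponse().GetService() {
		fmt.Println(svc.GetName())
	}
}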
Yeah, I ran into this a while back. ALB downgrades all HTTP/2 to HTTP/1.1. The solution here is to use a TCP listener with an ELB or NLB.
@igaskin yes, AWS support confirmed to me that the only solution is TCP passthrough via an NLB