NGINX Ingress controller version:
helm chart -- stable/nginx-ingress:1.22.1
controller.image.tag -- 0.26.1
Kubernetes version (use kubectl version):
v1.12.9-gke.15
Environment:
What happened:
This is not a curl problem; it is definitely a problem with nginx-ingress and HTTP/2.
When I try to curl my endpoint:
```
* Using HTTP2, server supports multi-use
> GET /api-console/api-model.json HTTP/2
> Host: bs-tangerine-s2.tom.takeoff.com
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
< HTTP/2 200
< server: nginx/1.15.9
< date: Tue, 01 Oct 2019 11:26:53 GMT
< content-type: application/json
< content-length: 596322
< vary: Accept-Encoding
< last-modified: Fri, 27 Sep 2019 15:00:12 GMT
< strict-transport-security: max-age=15724800; includeSubDomains
<
{ [3907 bytes data]
 85  582k   85  499k    0     0   186k      0  0:00:03  0:00:02  0:00:01  186k
* HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
 99  582k   99  579k    0     0   195k      0  0:00:02  0:00:02 --:--:--  195k
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
} [2 bytes data]
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
What you expected to happen:
Download the JSON. It works fine when HTTP/2 is disabled and HTTP/1.1 is used.
How to reproduce it (as minimally and precisely as possible):
Create an app and try to download a big JSON file.
Anything else we need to know:
As a workaround, HTTP/2 has to be disabled.
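In ingress-nginx, HTTP/2 is a global controller setting, so disabling it means setting `use-http2: "false"` in the controller ConfigMap. A minimal sketch, assuming the namespace and ConfigMap name created by a typical stable/nginx-ingress release (adjust both to your install):

```sh
# Sketch: disable HTTP/2 for the whole controller via its ConfigMap.
# Namespace and ConfigMap name are assumptions, not taken from this issue.
kubectl -n ingress-nginx patch configmap nginx-ingress-controller \
  --type merge -p '{"data":{"use-http2":"false"}}'
```

The controller watches its ConfigMap, so it should pick the change up and reload nginx without a pod restart.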
@shmykyta please post the ingress controller pod log
@aledbf there is not a lot of info, just
- - [01/Oct/2019:11:27:36 +0000] "GET /api-console/api-model.json HTTP/1.1" 200 596338 "-" "curl/7.54.0" 119 1.005 [XXX] [] 172.16.66.72:8080 596338 1.005 200 676e9aaff910d937c63b67d642d9c6ae
And after that, a 60-second timeout.
@shmykyta ok, maybe you need to adjust the read timeouts
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-timeouts
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
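For example, both the read and send timeouts can be set on an existing Ingress with kubectl annotate; the Ingress name below is a placeholder:

```sh
# Sketch: raise the proxy read/send timeouts for one Ingress resource.
kubectl annotate ingress my-ingress --overwrite \
  nginx.ingress.kubernetes.io/proxy-read-timeout="300" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="300"
```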
1.005
That means the response was returned in 1 second.
@aledbf
So, I got the response in 1 second, with an error and a broken JSON:
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
As you can see, I get around 200k of the 600k and then the transfer stops:
99 582k 99 579k 0 0 195k 0 0:00:02 0:00:02 --:--:-- 195k
In this case the timeouts are OK.
I suggest you use kubectl exec in the ingress controller pod to execute the curl command (using localhost and -H for Host header)
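Something along these lines; the namespace, pod label selector, and host are placeholders, not values from this issue:

```sh
# Sketch: repeat the request from inside the ingress controller pod, so the
# external load balancer is taken out of the picture.
POD=$(kubectl -n ingress-nginx get pod -l app=nginx-ingress,component=controller \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n ingress-nginx exec "$POD" -- \
  curl -v -k -o /dev/null -H "Host: bs-tangerine-s2.tom.takeoff.com" \
  https://localhost/api-console/api-model.json
```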
@aledbf
```
* Expire in 0 ms for 6 (transfer 0x562f7ca34d30)
* Expire in 1 ms for 1 (transfer 0x562f7ca34d30)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Expire in 10 ms for 1 (transfer 0x562f7ca34d30)
* Expire in 10 ms for 1 (transfer 0x562f7ca34d30)
* Trying xxx.xxx.xxx.xxx...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x562f7ca34d30)
* Connected to XXX-HOST (xxx.xxx.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [106 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2908 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* <<< CERTS HERE >>
* start date: Sep 25 00:00:00 2019 GMT
* expire date: Sep 29 12:00:00 2021 GMT
* <<< CERTS HERE >>
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x562f7ca34d30)
} [5 bytes data]
> GET /api-console/api-model.json HTTP/2
> Host: XXX-HOST
> User-Agent: curl/7.64.0
> Accept: */*
>
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200
< server: nginx/1.15.9
< date: Wed, 16 Oct 2019 10:02:52 GMT
< content-type: application/json
< content-length: 596322
< vary: Accept-Encoding
< last-modified: Fri, 27 Sep 2019 15:00:12 GMT
< strict-transport-security: max-age=15724800; includeSubDomains
<
{ [3907 bytes data]
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
99 582k 99 579k 0 0 7070k 0 --:--:-- --:--:-- --:--:-- 7070k
* Connection #0 to host XXX-HOST left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
(Not trying to keep issue alive but) I have the same issue with an old Symfony2 PHP app when trying to download specifically PDFs. The PDFs aren't big, only ~13kb. The issue is occurring between the client and ingress. If I bypass the ingress or send the request over http/1.1 it works fine.
But the long-term workaround is seemingly to disable HTTP/2. Any solution, @shmykyta?
Having the same issue with downloads over HTTP/2. I'd love to provide some additional logging, but AFAIK there is none.
It would be great if someone could provide a self-contained example reproducing the issue.
After some research this weekend (read: pulling my hair out), I'm not yet fully convinced it actually has anything to do with the nginx-ingress controller or nginx in general. I'm suspecting an incorrect (too short) Content-Length header that previously went unnoticed because of how forgiving HTTP/1.1 is.
The endpoint where we experience this issue is bound to our monolith, so providing a self-contained example would mean shipping half our software stack; that's going to be very hard if not impossible. I've done some stracing in debug mode, and so far I don't see anything in the strace output indicating that the nginx ingress controller experiences the network error the client sees, other than it closing the client connection after the data has been sent. I'll look into it some more, but for now, as far as I can tell, this is an unrelated error caused by how the different HTTP protocols respond to protocol errors.
(Edit: typos)
I am having the same issue. Whenever I run a POST request with curl with a large JSON body (300 KB), the request seems to time out and NGINX writes a 408 response status in the access log. Requests with a smaller JSON body (10 KB) against the same target work completely fine. This is the verbose curl output:
```
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Bearer $api_key" -d @d.json https://redacted-domain.com/api/dashboards/db -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 23.151.xxx.xxx...
* TCP_NODELAY set
* Connected to redacted-domain.com (23.151.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=redacted-domain.com
* start date: Feb 18 08:31:58 2020 GMT
* expire date: May 18 08:31:58 2020 GMT
* subjectAltName: host "redacted-domain.com" matched cert's "redacted-domain.com"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fd5bd807400)
> POST /api/dashboards/db HTTP/2
> Host: redacted-domain.com
> User-Agent: curl/7.64.1
> Content-Type: application/json
> Accept: application/json
> Authorization: Bearer redacted
> Content-Length: 278482
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* We are completely uploaded and fine
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
* Connection #0 to host redacted-domain.com left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
After `* We are completely uploaded and fine` is printed, it takes 60 s (the default client request timeout in the nginx ingress) until the rest of the message is printed. Do you have any ideas I could try to resolve the issue? Any config options?
Additional context:
I tried port-forwarding the upstream pod (the Grafana pod) and sent the curl request directly to that pod, and the command works completely fine (success response within a few ms).
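For reference, this kind of bypass test looks roughly like the following; the namespace, pod name, and port are placeholders:

```sh
# Sketch: bypass the ingress entirely and talk to the upstream pod directly.
kubectl -n monitoring port-forward pod/grafana-0 3000:3000 &
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" \
     -H "Authorization: Bearer $api_key" -d @d.json \
     http://localhost:3000/api/dashboards/db -v
```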
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Anyone made progress on the cause here?
For me, the issue was in the load balancer upstream, which closed the connection once the idle timeout was reached, but it can be in several parts of your stack.
If you suspect the ingress controller (which for me wasn't the culprit), try turning the debug logging up very loud; you'd definitely find your answer there :)
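Two ways to make the controller loud, as a rough sketch; the namespace, ConfigMap, and Deployment names are assumptions, adjust to your install:

```sh
# Raise nginx's own error log level via the controller ConfigMap...
kubectl -n ingress-nginx patch configmap nginx-ingress-controller \
  --type merge -p '{"data":{"error-log-level":"debug"}}'

# ...and/or raise the controller's klog verbosity (this restarts the pods).
kubectl -n ingress-nginx patch deployment nginx-ingress-controller --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=5"}]'
```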
For the record, in my case it was a client bug in curl limited to macOS builds.
With curl built from https://github.com/curl/curl/commit/34a5400de113f2badfaa338b00cc361b4ab8355a the issue no longer occurs.
curl never sent the whole request body but kept hanging, consuming CPU time, likely looping.
This caused 408 and 499 (timeout / Client Closed Request) on the NGINX side.
After NGINX's proxy timeout was reached, I got `curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)`.
See https://github.com/curl/curl/issues/1410